📊 REPLICATION PACKAGE

Pythonic vs Refactorable Pythonic: On the Relationship between Pythonic Idioms and Code Quality in Machine Learning Projects

Authors

Gerardo Festa
University of Salerno, Salerno, Italy
g.festa22@studenti.unisa.it
Giammaria Giordano
University of Salerno, Salerno, Italy
ggiordano@unisa.it
Valeria Pontillo
Gran Sasso Science Institute (GSSI), L’Aquila, Italy
valeria.pontillo@gssi.it
Max Di Penta
University of Sannio, Benevento, Italy
dipenta@unisannio.it
Damian A. Tamburri
University of Sannio, Benevento, Italy
datamburri@unisannio.it
Fabio Palomba
University of Salerno, Salerno, Italy
fpalomba@unisa.it

Abstract

Context: Python is increasingly becoming the lingua franca for developing Machine Learning (ML) systems, thanks to a rich ecosystem of libraries and an emphasis on readability. In this context, Pythonic idioms are seen as stylistic conventions that support maintainable and efficient code. Conversely, Refactorable-Pythonic idioms refer to code patterns that can be refactored into more idiomatic Python to improve maintainability, performance, and clarity.
Objective: While these assumptions about idiomaticity are widely accepted in practice, whether Pythonic or Refactorable-Pythonic idioms actually relate to software quality in ML projects has not been systematically validated. To address this lack of empirical evidence, we conduct a large-scale study assessing how Pythonic and Refactorable-Pythonic idioms relate to code quality in ML systems.
Method: We analyze 303 open-source Python projects from the NICHE dataset, distinguishing between “well-engineered” projects (i.e., projects that adopt structured development practices such as testing, CI, documentation, and packaging) and “non-engineered” projects (i.e., projects that lack such characteristics). Our analysis proceeds in two main phases: (i) idiom detection, where we extract Pythonic and Refactorable-Pythonic code patterns using a combination of existing and custom detectors; and (ii) quality assessment, where we detect Python-specific code smells and relate them to code metrics and other quality indicators.
Result: Truth Value Test and Assign Multiple Targets are the most common Pythonic and Refactorable-Pythonic idioms, respectively. In “well-engineered” projects, both idiom types correlate positively with Python-specific code smells, suggesting that idiomatic usage does not always align with higher code quality. In contrast, in “non-engineered” projects, the presence of smells is more strongly influenced by structural factors such as the number of lines of code, complexity, and commit activity.
Conclusion: We conclude by distilling lessons learned, implications, and future research directions.
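To make the two idioms named under Result concrete, here is a minimal sketch of how they might be recognized with Python's standard `ast` module. It is an illustration under stated assumptions, not the detectors used in the study. A Pythonic Truth Value Test is approximated as an `if`/`while` condition that is a bare name or `not name`. A refactorable truth test is an `==`/`!=` comparison with an empty literal or zero (e.g. `if items == []:` instead of `if not items:`). A refactorable Assign Multiple Targets instance is a pair of consecutive single-name assignments (e.g. `lo = 0` followed by `hi = 10`) that could be merged into `lo, hi = 0, 10`. The function names and the `SAMPLE` snippet are hypothetical.

```python
import ast

# Hypothetical detectors for illustration only; not the study's tooling.
# Hypothetical snippet used only to exercise the detectors below.
SAMPLE = """
items = []
if items == []:
    print("empty")
if not items:
    print("still empty")
lo = 0
hi = 10
print(lo, hi)
total = len(items)
count = total + 1
"""


def _is_empty_literal(node: ast.AST) -> bool:
    """True for [], (), {}, "" and numeric zero literals (but not False)."""
    if isinstance(node, (ast.List, ast.Tuple)):
        return not node.elts
    if isinstance(node, ast.Dict):
        return not node.keys
    if isinstance(node, ast.Constant) and not isinstance(node.value, bool):
        return node.value in ("", 0)
    return False


def pythonic_truth_tests(tree: ast.AST) -> list[int]:
    """Lines of `if`/`while` conditions that already rely on truthiness,
    approximated here as a bare name or `not name`."""
    lines = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.If, ast.While)):
            test = node.test
            if isinstance(test, ast.UnaryOp) and isinstance(test.op, ast.Not):
                test = test.operand
            if isinstance(test, ast.Name):
                lines.append(node.lineno)
    return sorted(lines)


def refactorable_truth_tests(tree: ast.AST) -> list[int]:
    """Lines of comparisons such as `x == []` or `x != ""` that a truth
    value test (`not x` / `x`) could usually replace."""
    lines = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Compare) and len(node.ops) == 1
                and isinstance(node.ops[0], (ast.Eq, ast.NotEq))
                and (_is_empty_literal(node.left)
                     or _is_empty_literal(node.comparators[0]))):
            lines.append(node.lineno)
    return sorted(lines)


def refactorable_multiple_targets(tree: ast.AST) -> list[int]:
    """Line of the first statement in each consecutive pair like
    `a = 1` / `b = 2` that could become `a, b = 1, 2`."""
    lines = []
    for node in ast.walk(tree):
        for field in ("body", "orelse", "finalbody"):
            stmts = getattr(node, field, None)
            if not isinstance(stmts, list):
                continue  # e.g. IfExp/Lambda bodies are expressions
            for first, second in zip(stmts, stmts[1:]):
                if not all(isinstance(s, ast.Assign) and len(s.targets) == 1
                           and isinstance(s.targets[0], ast.Name)
                           for s in (first, second)):
                    continue
                name = first.targets[0].id
                reads = {n.id for n in ast.walk(second.value)
                         if isinstance(n, ast.Name)}
                # Merging is unsafe if the second value reads the first
                # target (e.g. `a = 1; b = a`) or both bind the same name.
                if name != second.targets[0].id and name not in reads:
                    lines.append(first.lineno)
    return sorted(lines)


if __name__ == "__main__":
    tree = ast.parse(SAMPLE)
    print(pythonic_truth_tests(tree))           # [5]
    print(refactorable_truth_tests(tree))       # [3]
    print(refactorable_multiple_targets(tree))  # [7]
```

The heuristics are deliberately conservative and incomplete. Rewriting `x == 0` or `x == ""` as `not x` does not preserve behavior when `x` can be `None` (`None == 0` is false, but `not None` is true) or an object with custom truthiness. Merging assignments is only safe when the second value does not depend on the first target, and that direct name read is the only dependency this sketch checks.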