Replication Package

Authors

Gerardo Festa, Giammaria Giordano, Valeria Pontillo, Max Di Penta, Damian A. Tamburri, Fabio Palomba

Abstract

Context: Python is increasingly becoming the lingua franca for developing Machine Learning (ML) systems, thanks to its rich ecosystem of libraries and its emphasis on readability. In this context, Pythonic idioms are regarded as stylistic conventions that support maintainable and efficient code. Conversely, Refactorable-Pythonic idioms are code patterns that can be refactored into more idiomatic Python, improving code quality in terms of maintainability, performance, and clarity.
Objective: Although assumptions about idiomaticity are widely accepted in practice, the extent to which Pythonic or Refactorable-Pythonic idioms relate to software quality in ML projects has not been systematically validated. To address this lack of empirical evidence, this paper reports a large-scale study of how Pythonic and Refactorable-Pythonic idioms relate to code quality in ML systems.
Method: We analyze 303 open-source Python projects from the NICHE dataset, distinguishing between “well-engineered” projects (i.e., projects that adopt structured development practices such as testing, CI, documentation, and packaging) and “non-engineered” projects (i.e., projects that lack these characteristics). The analysis proceeds in two phases: (i) idiom detection, in which we extract Pythonic and Refactorable-Pythonic code patterns using a combination of existing and custom detectors; and (ii) quality assessment, in which we detect Python-specific code smells and relate them to code metrics and other quality indicators.
Results: Truth Value Test and Assign Multiple Targets are the most common Pythonic and Refactorable-Pythonic idioms, respectively. In “well-engineered” projects, both idiom types positively correlate with Python-specific code smells, suggesting that idiomatic usage does not always align with higher code quality. In contrast, in “non-engineered” projects, the presence of smells is more strongly influenced by structural factors such as the number of lines of code, complexity, and commit activity.
Conclusion: We conclude by distilling lessons learned, implications, and future research directions.
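
To make the two idiom families concrete, the following is a minimal, hypothetical sketch of an AST-based counter. It is not the detector pipeline used in the study, and the names `count_idioms` and `IdiomCounts` are illustrative only. The matching rules are simplified assumptions: a Truth Value Test is an `if`/`while` condition that is a bare name or attribute (optionally negated with `not`); a refactorable truth test is an `==`/`!=` comparison against an empty list, tuple, dict, or string, or `len(x) == 0`; and two adjacent single-name assignments count as an Assign Multiple Targets candidate when the second does not read the first's target.

```python
import ast
from dataclasses import dataclass


@dataclass
class IdiomCounts:
    truth_value_test: int = 0           # Pythonic, e.g. `if items:` / `if not items:`
    refactorable_truth_test: int = 0    # e.g. `if items == []:` / `if len(items) == 0:`
    refactorable_assign_multi: int = 0  # e.g. `a = 1` directly followed by `b = 2`


def _is_empty_literal(node: ast.AST) -> bool:
    # Empty list/tuple/dict displays and the empty string (simplifying assumption).
    if isinstance(node, (ast.List, ast.Tuple)):
        return not node.elts
    if isinstance(node, ast.Dict):
        return not node.keys
    return isinstance(node, ast.Constant) and node.value == ""


def _is_refactorable_truth_test(test: ast.AST) -> bool:
    # A single ==/!= comparison against an empty literal, or len(x) == 0.
    if not (isinstance(test, ast.Compare) and len(test.ops) == 1
            and isinstance(test.ops[0], (ast.Eq, ast.NotEq))):
        return False
    left, right = test.left, test.comparators[0]
    if _is_empty_literal(left) or _is_empty_literal(right):
        return True
    return (isinstance(left, ast.Call) and isinstance(left.func, ast.Name)
            and left.func.id == "len" and isinstance(right, ast.Constant)
            and type(right.value) is int and right.value == 0)


def _is_truth_value_test(test: ast.AST) -> bool:
    # A bare name/attribute (or its negation) used directly as a condition.
    if isinstance(test, ast.UnaryOp) and isinstance(test.op, ast.Not):
        test = test.operand
    return isinstance(test, (ast.Name, ast.Attribute))


def _single_name_target(stmt: ast.AST):
    # Return the target name of a plain `name = value` statement, else None.
    if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)):
        return stmt.targets[0].id
    return None


def _reads(node: ast.AST, name: str) -> bool:
    return any(isinstance(n, ast.Name) and n.id == name for n in ast.walk(node))


def count_idioms(source: str) -> IdiomCounts:
    counts = IdiomCounts()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.While)):
            if _is_truth_value_test(node.test):
                counts.truth_value_test += 1
            elif _is_refactorable_truth_test(node.test):
                counts.refactorable_truth_test += 1
        # Scan every statement list; each adjacent mergeable pair counts once.
        for field in ("body", "orelse", "finalbody"):
            stmts = getattr(node, field, None)
            if not isinstance(stmts, list):
                continue
            for first, second in zip(stmts, stmts[1:]):
                name, other = _single_name_target(first), _single_name_target(second)
                # Merging into `name, other = ...` is only safe if the second
                # right-hand side does not depend on the first assignment.
                if name and other and name != other and not _reads(second.value, name):
                    counts.refactorable_assign_multi += 1
    return counts


if __name__ == "__main__":
    sample = (
        "items = []\ncount = 0\n"
        "if items == []:\n    pass\n"
        "if not items:\n    pass\n"
        "if len(items) == 0:\n    pass\n"
        "while count:\n    count = count - 1\n"
    )
    print(count_idioms(sample))
```

On this sample, the two `==`-based conditions are counted as refactorable truth tests, `not items` and `count` as Truth Value Tests, and `items = []` / `count = 0` as one mergeable pair. Such rewrites are not always behavior-preserving (for example, `x == []` and `not x` differ when `x` is `None`), which is one reason a real detector needs stricter checks than this sketch.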

Research Paper

Download Research Paper

Dataset & Code

Download the complete dataset and replication code for reproducible research