Identifying Unusual Commits on GitHub
Title | Identifying Unusual Commits on GitHub |
Publication Type | Journal Article |
Year of Publication | 2017 |
Authors | Raman Goyal, Gabriel Ferreira, Christian Kästner, James Herbsleb |
Journal | Journal of Software: Evolution and Process |
Keywords | software ecosystems; notification feeds; information overload; transparent environments; anomaly detection |
Abstract | Transparent environments and social-coding platforms such as GitHub help developers stay abreast of changes during the development and maintenance phases of a project. In particular, notification feeds can help developers learn about relevant changes in other projects. Unfortunately, transparent environments can quickly overwhelm developers with too many notifications, such that they lose the important ones in a sea of noise. Complementing existing prioritization and filtering strategies based on binary compatibility and code ownership, we develop an anomaly-detection mechanism to identify unusual commits in a repository that stand out with respect to other changes in the same repository or by the same developer. Among others, we detect exceptionally large commits, commits at unusual times, and commits touching rarely changed file types, given the characteristics of a particular repository or developer. We automatically flag unusual commits on GitHub through a browser plugin. In an interactive survey with 173 active GitHub users, rating commits in a project of their interest, we found that, though our unusual score is only a weak predictor of whether developers want to be notified about a commit, information about unusual characteristics of a commit changes how developers regard commits. Our anomaly-detection mechanism is a building block for scaling transparent environments. |
Citation Key | node-36439 |
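
The abstract names three kinds of unusual commits: exceptionally large ones, ones made at unusual times, and ones touching rarely changed file types. The following is a minimal illustrative sketch of how such flags might be computed against a repository's history. It is not the authors' mechanism: the `Commit` record, the `unusual_flags` function, and the 5% rarity threshold are all assumptions made for this example, and simple frequency counts stand in for whatever statistical model the paper actually uses.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Commit:
    # Hypothetical record; field names are assumptions for this sketch.
    lines_changed: int
    hour: int            # hour of day the commit was made, 0-23
    file_types: tuple    # file extensions touched, e.g. (".py", ".md")


def unusual_flags(commit, history, rare=0.05):
    """Return reasons why `commit` stands out against `history`.

    `rare` is an assumed cutoff: a characteristic counts as unusual when
    fewer than this fraction of past commits share it.
    """
    n = len(history)
    if n == 0:
        return []  # nothing to compare against

    flags = []

    # Size: how many past commits were at least this large?
    at_least_as_large = sum(
        1 for c in history if c.lines_changed >= commit.lines_changed
    )
    if at_least_as_large / n < rare:
        flags.append("exceptionally large")

    # Time: how often did past commits land in this hour of the day?
    hours = Counter(c.hour for c in history)
    if hours[commit.hour] / n < rare:
        flags.append("unusual time")

    # File types: in what fraction of past commits was each type touched?
    types = Counter(t for c in history for t in set(c.file_types))
    for t in sorted(set(commit.file_types)):
        if types[t] / n < rare:
            flags.append(f"rarely changed file type {t}")

    return flags


# Usage: a history of small afternoon Python commits, then an outlier.
history = [Commit(10, 14, (".py",)) for _ in range(100)]
print(unusual_flags(Commit(5000, 3, (".py", ".sql")), history))
```

Passing only one developer's commits as `history` gives the per-developer view the abstract mentions; passing all commits in the repository gives the per-repository view.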