An extensive dataset of UML models in GitHub

G. Robles; Truong Ho-Quang; Regina Hebig; Michel Chaudron; M.A. Fernandez
IEEE International Working Conference on Mining Software Repositories (2160-1852). p. 519-522. (2017)
[Conference contribution, peer-reviewed]

The Unified Modeling Language (UML) is widely taught in academia and has good acceptance in industry. However, no ample dataset of UML diagrams is publicly available. Our aim is to offer a dataset of UML files, together with metadata of the software projects to which the UML files belong. To this end, we have systematically mined over 12 million GitHub projects to find UML files in them. We present a semi-automated approach to collect UML stored in images, .xmi, and .uml files. We offer a dataset with over 93,000 UML diagrams from over 24,000 projects in GitHub.

Keywords: dataset, GitHub, mining software repositories, modeling, UML

This record was created 2017-09-11. Last modified 2017-09-14.
CPL Pubid: 251818


