Monday, March 17, 2008

Dirk Riehle: Total Growth of Open Source

Via Dana Blankenhorn's blog, I came across an excellent article, "The Total Growth of Open Source," by Amit Deshpande and Dirk Riehle of SAP Research. In it, they looked at over 5,000 active and popular Open Source projects and concluded:

"...that the total amount of source code as well as the total number of open source projects is growing at an exponential rate. Previous research showed linear and quadratic growth in lines of source code of individual open source projects. Our work shows that open source is expanding into new domains and applications at an exponential rate."

It's one thing to read that. It's quite another to actually see it in action (see graph above tracking lines of source code over time).

This is pretty heady stuff. One of my assumptions has been that Open Source, being a child of the internet, directly benefited from the sheer numbers of people who understood more about software development. My hypothesis was that, as more knowledge was distributed online, the growth in Open Source development would continue. The evidence would seem to corroborate that assumption.

Also interesting was the methodology of the study. As online tools grow ever deeper, the data at Riehle's disposal is richer than ever. The authors used data pulled from source code repositories to measure the additions and subtractions for Open Source projects. They used the number of incoming links to project home pages to determine the top projects to measure, and then tracked those projects' growth over time.
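To make that methodology concrete, here is a rough sketch in Python of the two steps as I understand them: rank projects by incoming links, then build a running size for each selected project from its additions and subtractions. The project names, link counts, and line counts are invented for illustration, and the function names are my own; this is not the authors' code.

```python
from itertools import accumulate

# Hypothetical data: (lines added, lines removed) per month for each project.
monthly_changes = {
    "alpha": [(1000, 0), (400, 100), (600, 200)],
    "beta": [(500, 0), (50, 20), (30, 10)],
    "gamma": [(200, 0), (100, 150), (80, 30)],
}

# Hypothetical count of incoming links to each project's home page.
incoming_links = {"alpha": 120, "beta": 15, "gamma": 300}


def top_projects(links, n):
    """Return the n project names with the most incoming links."""
    return sorted(links, key=links.get, reverse=True)[:n]


def size_over_time(changes):
    """Running total of lines of code from (added, removed) pairs."""
    return list(accumulate(added - removed for added, removed in changes))


selected = top_projects(incoming_links, 2)
growth = {name: size_over_time(monthly_changes[name]) for name in selected}
```

With this made-up data, "beta" falls out of the sample because it has the fewest incoming links, and the remaining projects each get a month-by-month size series that could be summed or plotted.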

One thing I would have liked to see and didn't - at least, not that I can tell - is how much of the growth was "organic" and how much was due to more projects springing up. It's great to know the total number of lines of code and the total number of projects. What we don't know is which of these projects are chiefly responsible for the growth, or what the "health" of each project is. Even better would be to divvy up the projects into general categories based on growth in lines of code: would that give an accurate representation of a project's overall "health"?
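The split I have in mind is simple arithmetic. Here is a minimal sketch in Python, using invented numbers and a hypothetical helper name, and assuming no project disappears between the two snapshots. It is my own illustration of the question, not something taken from the study.

```python
# Hypothetical lines-of-code totals per project at two points in time.
# A project missing from `before` did not exist yet.
before = {"alpha": 1000, "beta": 500}
after = {"alpha": 1700, "beta": 560, "gamma": 200}


def split_growth(before, after):
    """Split growth into growth of existing projects and code from new ones."""
    organic = sum(after[name] - before[name] for name in after if name in before)
    from_new = sum(size for name, size in after.items() if name not in before)
    return organic, from_new


organic, from_new = split_growth(before, after)

# Total growth across all projects, for comparison with the two parts.
total = sum(after.values()) - sum(before.values())
```

In this toy case the two parts add up to the total growth, so each can be reported as a share of it. The same per-project differences would also show which projects are chiefly responsible for the growth.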

Back in 2001, when it seemed that our world was imploding, I recall some folks wondering aloud whether Open Source contributions would stop. Judging from this study, at least, it seems pretty clear that the dot-com implosion had little impact on Open Source growth, if any at all.
