How open source changed everything – again

The biggest open source innovations of the decade, from Git and Docker to data science and the cloud



28 November 2019

We are about to conclude another decade of open source, and what a long, strange trip it has been. Reading back through predictions made in 2009, no one had the foggiest clue that GitHub would change software development forever (and for everyone), or that Microsoft would go from open source pariah to the world’s largest contributor, or a host of other dramatic changes that became the new normal during a decade that was anything but normal.

These are some of the most significant open source innovations that got us to where we are now.

A cloudy future

Open source was making headlines prior to 2010, of course, but much of the open source news back then was “free software” vs. “open source” religious wars and lawsuits against Linux. To run open source software, you were still calling IT to provision servers (or using a spare that just happened to be sitting under your desk). The cloud changed all that. Suddenly developers did not need to get a hall pass from IT to run their open source code. Just as open source freed developers from Purchasing/Legal approval, so too did the cloud shake developers free of the friction inherent in hardware.

The cloud, however, was just the enabler. As Corey Quinn highlights, the infrastructure has become “open source,” though not because the clouds themselves are available under an open source license: “It runs on clouds, but I can grab a Terraform plan or a Serverless config from GitHub and have a thing up and running to test it out almost instantly.” Open source licensing and swipe-and-go access to cloud hardware have opened up developer productivity in ways that might have been faintly visible in early 2010 (AWS was started in 2006, after all) but were not realised until well into the decade.

It’s Git all the way down

“The biggest thing that happened to open source in the last decade is the introduction by GitHub of the pull request,” declares Tobie Langel. Enabled by cloud, he continues, “GitHub gave open source visibility and lowered the playing field for collaboration by an order of magnitude.” That collaboration was always the heart of the open source promise, but it was not until GitHub unlocked the social aspect of coding that it became real.

As Michael Uzquiano argues: “We had version control before, but GitHub/Lab really made it easy for anyone to fork code, try things, and contribute ideas back. Comments, issues, approval: it really delivered on the promise of code being open.” Git was not born in the last decade, but like cloud, it did not really boom until the 2010s.

Docker and the container revolution

Like version control and Git, containers were not newly minted post-2010. In fact, the idea for containers first emerged back in 1979 with chroot (though seeds were planted even earlier). But it was Docker, as Steven Vaughan-Nichols asserts, that really made containers come alive: “Docker, or to be more precise… Docker tech transformed containers from an obscure technology to the mainstay of how software is consumed today. It’s changed Everything.”

Everything? Well, yes, at least for enterprise application development, and not because it is some cool new way to think about virtualisation. As Gordon Haff explains, “pre-Docker/Kubernetes containers were just another partitioning technique.” The real magic started when Docker nailed the developer experience, and from there, he goes on, “things snowballed,” leading to complete reinvention of the CI/CD pipeline and more. A decade ago, no one had heard of Docker and Kubernetes. Last month, more than 13,000 people showed up at KubeCon 2019 to explore this modern application world that Docker helped to create.

Data science becomes mainstream

Big data has always been mostly a dream for “the rest of us” (i.e., companies not Google), and prior to 2010 we saw earnest efforts to make it real. We have had data marts since the 1970s, Business Intelligence a bit later, and even saw Roger Magoulas coin the term “big data” in 2005. But none of these really anticipated just how big that data could be, and just how critical data scientists and data engineers would become, until well into the decade that saw Hadoop (first released in 2006) come into its own, and then quickly be supplanted by a wave of NoSQL databases and other open source infrastructure.

Today, the infrastructure used to store and stream large volumes of data is mostly open source. Whether modern databases like MongoDB that make it easier to work with unstructured data, or the various tools like Apache Kafka that are used to move data around, open source made modern data science possible, and nearly all of that happened in the last 10 years.

In addition, the tools we use to analyse data are increasingly open source. In fact, much of the tooling (like TensorFlow) has been open source from day one, rather than a Hail Mary pass from a proprietary vendor trying to resurrect the fading fortunes of their once-beloved data analytics tool. NumPy and scikit-learn have become “ubiquitous,” notes Python expert Matt Harrison. Neither was ubiquitous in 2010. Open source has been central to making this world grow, as Jacob Redding suggests: “Data science wouldn’t be as large as it is without Pandas, Scikit, Jupyter, and the entire world of R.” All of them are open source, of course, lowering the bar for anyone wanting to try out the new tech.

Open source programming languages

Remember when programming languages were closed? The past decade seems to have shut the door on that era forever, with even Apple eventually succumbing to the open source movement and releasing Swift as open source. At the same time, a host of JavaScript frameworks (Node.js, Angular, React, Vue, etc.) came (and sometimes went) in a furiously fecund period of language and framework innovation. Indeed, alongside this world of JavaScript frameworks (becoming bigger than the browsers they were once meant to build for, as Alberto Ruiz suggests), we also saw a bevy of new, lower-level languages arise, like Go, Rust, and WebAssembly.

While Java’s “continued march” toward open source started before 2010, Rich Sharples notes, the past decade saw that push gain momentum, with the OpenJDK implementation of Java continuing to breathe new life into the language. Looking to the future, however, the looming Supreme Court review of Google vs. Oracle, Danese Cooper points out, will have monumental repercussions, as big as the threat SCO once posed to the future of Linux. At stake is the copyrightability of APIs, of course, but also the future of Java. Stay tuned.

Curveball of the decade

Interestingly, Microsoft is involved in nearly all of these areas, yet Microsoft started the decade under Steve Ballmer, still fighting the not-so-good fight against open source. Fast forward to 2020, however, and “the Microsoft mutation from being the most fierce anti-open source advocate (‘open source is a cancer!’) to being one of [its] biggest contributors” represents a massive change, as Benoit Jacquemont says. Today many developers use open source Visual Studio Code as their code editor, open source TypeScript to build web applications, and GitHub to store their code. Microsoft owns each.

Which may be a good place to end. So much has happened in this decade (and we have not even talked about how open source Android powers much of mobile, or the impact of Let’s Encrypt on the certificate authority industry, or how open source is at the cutting edge of machine learning, etc.), and yet we still have so far to go. Microsoft’s metamorphosis reminds us that organisations and industries can change, and that it is in our self-interest to embrace open source. Given how unpredictable this past decade was, the next 10 years are impossible to forecast, except to say that they are bound to include a lot more open source.

IDG News Services
