GitHub expands open source archive effort into three key libraries
Archive Program aims to preserve pieces of open source software to allow future software developers to see how the community built and reviewed code
20 November 2020
Historians and future generations of developers will be able to unearth early lines of open source Linux, Ruby, or Python code buried 250 meters deep in the Arctic permafrost and, now, in three historic libraries in Oxford, Egypt, and California, thanks to GitHub’s expanding Archive Program.
Announced last year at the code management company’s Universe event, the GitHub Archive Program aims to preserve open source software in much the same way we do works of art, design, or literature. By printing historically relevant open source repositories onto reels of piqlFilm (digital photosensitive archival film), GitHub, which was acquired by Microsoft in 2018, hopes to preserve the open source software movement for future generations.
As part of the program, GitHub this summer deposited a code archive in the Arctic World Archive in Svalbard, Norway, just one mile away from the famous Global Seed Vault. The archive, 186 reels of piqlFilm holding 21TB of repository data, is stored in a decommissioned coal mine 250 meters deep in the permafrost.
Run in partnership with the Long Now Foundation, the Internet Archive, the Software Heritage Foundation, Arctic World Archive, and Microsoft Research, the program looks to preserve both ‘warm’ and ‘cold’ versions of the code so that multiple copies and formats of the software survive, an approach archivists call LOCKSS (Lots Of Copies Keeps Stuff Safe).
Now, the project is expanding by donating reels of hardened microfilm to the 400-year-old Bodleian Library at Oxford University in England, the Bibliotheca Alexandrina in Egypt, and the Stanford Libraries in California, as well as storing a copy in the library at GitHub’s headquarters in San Francisco.
Preserving the GitHub stars
GitHub is preserving its most popular repositories by the number of ‘stars’ given by the community, including projects like Linux and Android and programming languages like Ruby and Go. The company is also preserving 5,000 repositories picked at random.
“The idea behind that is when you go back in history we want to preserve the work of individual developers, students, and small, lesser known developers and their open source projects,” Thomas Dohmke, VP of strategic programs at GitHub, told InfoWorld.
By its very nature, open source software is not a static thing to be preserved; it is collaborative and always in flux. The intention is not to store copies that can be booted and run in the future, although that may be possible. Instead, the idea is to preserve a moment in time, when open source became the premier mode of software development, and to chart the cultural significance of that movement.
“A platform like GitHub can paint a picture of a broad spread of the software developer community across the globe at a moment in time,” Richard Ovenden, Bodley’s Librarian and president of the Digital Preservation Coalition, told InfoWorld.
“We think it is worth preserving software and how people worked together across the world to contribute and review source code. There is something culturally there which is worth preserving,” GitHub’s Dohmke added.
The archive is being built for two types of people, according to Dohmke, “historians and future software developers curious about how software was developed during this era.”
Each donation is specially encased using a combination of 3D printing and AI-generated art by the engineer and artist Alex Maki-Jokela. You can read more about his work on Medium.
All archived code will also include technical guides to QR decoding, file formats, character encodings, and other critical metadata so that future developers can decode it. “Storage is not the same thing as preservation, you have to do other things,” Ovenden said.
IDG News Service