Contributing members of the open source git project merged a code change in June 2022 that switched how git compresses tar archives by default, from invoking the external gzip program to using an internal gzip-compatible implementation. The change was made for performance reasons and to reduce dependence on the aging gzip project. Unfortunately, it also affected SaaS offerings like GitHub that use git under the hood. GitHub deployed the change in January 2023 and was forced to roll it back quickly. The scenario highlights the prominence of open source software, the nested nature of supply chains, and the impact on secure delivery. If you’re a poet, For Want of a Nail is spot on here. Put simply, small things can have very big impacts.
What is git and how does it relate to GitHub?
While GitHub is well known as a popular developer tool, git itself, and the reasons for using it, are less familiar outside practitioner circles. Git is a version control system commonly used to host source code repositories. You can store and organize all manner of files, particularly text, and track their versions more readily. For these reasons, git excels as a source code repository for modern application and systems development. Git also underpins offerings from a number of service providers, including AWS CodeCommit, Azure Repos, GitHub, and GitLab.
What happened?
This change broke a critical aspect of software supply chain security: integrity checking. The new implementation did not produce byte-for-byte identical output to the gzip program, so the checksums (hashes) of the generated archives changed, even though the files inside them did not. That breaks any integrity-checking mechanism that relies on hash comparisons to verify that a malicious party isn’t inserting unvetted or unexpected components into software, a common approach to mitigating part of the malware threat. Integrity checks are also critical to a number of IT processes, such as version control, infrastructure automation, secure continuous delivery, software updates, patches, and operating system updates.
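To make the mechanism concrete, here is a minimal Python sketch. It assumes that two compression levels of Python’s standard gzip module are a fair stand-in for “the gzip program” versus “git’s internal implementation”; it does not reproduce git’s actual output. The hypothetical `source` bytes play the role of an uncompressed tar stream. The point is that two valid gzip encoders can emit different bytes for the same input, so the archive checksum changes while the decompressed content stays identical.

```python
import gzip
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-in for an uncompressed tar stream produced by git.
source = b"".join(f"line {i}: value {i * i % 97}\n".encode() for i in range(2000))

# Two gzip-compatible encodings of the same bytes. Compression levels 9 and 1
# of Python's gzip module stand in for two different implementations (an
# assumption for illustration); mtime=0 pins the header timestamp so only
# the encoding differs.
old_archive = gzip.compress(source, compresslevel=9, mtime=0)
new_archive = gzip.compress(source, compresslevel=1, mtime=0)

# The archive checksums no longer match...
print(sha256_hex(old_archive) == sha256_hex(new_archive))  # False
# ...even though both archives decompress to exactly the same content.
print(gzip.decompress(old_archive) == gzip.decompress(new_archive) == source)  # True
```

A checker that hashes the compressed archive sees a mismatch here, even though nothing a developer wrote has changed.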
Checksums should remain consistent unless someone modifies the underlying bytes, whether source code or other files. Hash functions are deterministic by design, and that property is foundational to ensuring the integrity or authenticity of code. Any tool that uses these checksums for integrity verification would have to account for the updated hashes that resulted from the gzip change. The community seems to be split on the fallout. Almost six months of radio silence followed the original code change. After GitHub deployed the newer git code, however, objections started pouring in, as the comment history for the git commit shows.
Why should we care and why now?
The change creates headaches for teams that follow agile methodologies and DevOps practices, since git-based version control and workflows are a core part of both. More specifically, version control, continuous integration, and continuous delivery are affected. The change impacts how you verify the integrity of infrastructure-as-code, policy-as-code, container images, source code, and more.
The GitHub gzip issue is a reminder that commercial service offerings still rest on open source underpinnings, and that open source is integral to modern software supply chains. Some engineers deploy and maintain their own git repository servers, but everyone participating in version control uses git to interact with git-based services like GitHub.
Changes like this carry significant negative impacts. They can undermine the stability of an organization’s build, delivery, and release processes, and everything that moves through CI/CD build pipelines can be called into question or considered risky. The event comes at a time when many organizations already have serious concerns about software supply chain risk.
How does this keep happening?
Who’s accountable for the GitHub gzip issue? Contributors to the git open source project made the change in June 2022. GitHub adopted it much later, in January 2023, but appears not to have fully considered the impacts.
In reality, open source projects don’t typically have the resources to communicate and coordinate changes beyond creating change logs and documentation. Even large projects may have only a small number of dedicated full-time developers. More commonly, developers contribute in their off-hours as bandwidth allows. Or, if they work for an organization that is a proponent of open source, they may dedicate part of their time to a given project. Consumers of open source are on the hook for understanding code changes and any potential impacts.
These are fundamental differences between open source projects and the business of being an independent software vendor or software publisher. Should vendors take a more active role in open source software development and maintenance? Is there shared responsibility between the open source community and commercial partners? These questions need to be addressed as part of the software supply chain security dilemma. Efforts like the NSA’s Enduring Security Framework (ESF) and CISA’s Information and Communications Technology Supply Chain Risk Management (ICT SCRM) help to further the conversation.
What aspirin should IT and security leaders seek out?
Downstream consumers depend on the behavior of your systems, so organizations need to invest in validating expected behaviors as well as carefully evaluating upstream changes. This scenario could just as easily have played out with entirely internal or closed-source dependencies.
If a provider like GitHub redeploys the git code change that swaps the compression method, archive checksums will change again. Organizations should diligently review change logs and documentation. Let’s face it, though: very few people ever “read the manual.” Thorough review isn’t always an operational reality, because changes are too frequent or there aren’t enough resources to review everything. The problem quickly spirals into a discussion of continuous validation of software bills of materials (SBOMs), but not all the necessary technology pieces exist yet throughout the software supply chain. Steps you can take to prepare include:
- Identify which services or tooling depend on fetching source archives from git repositories, or compare against checksums of git-generated archives.
- Include software version update and signature update mechanisms in your inventory since they can also be impacted.
- Review or draft procedures for updating checksum comparisons in case a git provider modifies the compression method again, so that integrity checks don’t fail inappropriately.
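One way to approach the last step is to pin two digests per dependency: one for the compressed archive and one for its uncompressed contents. The Python sketch below is hypothetical; the `classify_archive` function, its returned labels, and the two-digest policy are illustrative assumptions, not an established tool or standard. It also assumes the uncompressed tar stream is itself stable over time, which git and hosting providers do not necessarily guarantee. On an archive mismatch, it decompresses and checks the content digest, so a compression-only change is routed to review instead of being treated the same as tampered content.

```python
import gzip
import hashlib
import zlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def classify_archive(archive: bytes, pinned_archive_sha256: str,
                     pinned_content_sha256: str) -> str:
    """Classify a downloaded .tar.gz against two pinned digests (hypothetical policy)."""
    # Fast path: the compressed bytes are exactly what we pinned.
    if sha256_hex(archive) == pinned_archive_sha256:
        return "ok"
    # The archive bytes differ; decompress to see whether the content changed.
    try:
        content = gzip.decompress(archive)
    except (OSError, EOFError, zlib.error):
        return "reject: invalid gzip data"
    if sha256_hex(content) == pinned_content_sha256:
        # Same content, different encoding: likely a compression change upstream.
        return "review: compression changed, content identical"
    return "reject: content changed"


# Usage with stand-in data (not a real git archive).
tar_stream = b"".join(f"file {i}: {i * 7 % 31}\n".encode() for i in range(1000))
pinned_archive = gzip.compress(tar_stream, compresslevel=9, mtime=0)
archive_pin = sha256_hex(pinned_archive)
content_pin = sha256_hex(tar_stream)

print(classify_archive(pinned_archive, archive_pin, content_pin))
# prints: ok
recompressed = gzip.compress(tar_stream, compresslevel=1, mtime=0)
print(classify_archive(recompressed, archive_pin, content_pin))
# prints: review: compression changed, content identical
tampered = gzip.compress(tar_stream + b"extra\n", compresslevel=9, mtime=0)
print(classify_archive(tampered, archive_pin, content_pin))
# prints: reject: content changed
```

Routing the compression-only case to review rather than accepting it automatically is a deliberate choice: a mismatch is still a change, and someone should confirm it before the pinned digests are updated.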
What do we do next?
Two great benefits of open source projects include visibility of source code and transparency of change history. You can freely monitor the git project and commit history. You can also follow along with the public discourse. Sysdig remains committed to open source software. Falco and Sysdig OSS are offered as open source that’s maintained by Sysdig as well as the open source community. Sysdig also uses or contributes back to other open source projects that include Open Policy Agent (OPA), Prometheus, and eBPF.
Understanding the GitHub gzip issue is fundamental to secure continuous delivery and software supply chain integrity. If you care about supply chain risk, stay plugged in. Review the ESF and ICT SCRM guidance, and consider how it can be adapted for your cybersecurity program. Federal entities and organizations that provide services to federal entities are required to adopt the relevant practices quickly. Other organizations will likely follow, particularly those with mature cybersecurity strategies.