A security scanner detects 73 vulnerabilities. If all of the CVEs correspond to dormant code, how many security issues do you have to fix: 0 or 73?
I’d argue that it’s in the middle: you now have 73 low-priority issues to fix.
This post reverse engineers the patching practices Amazon and Google apply to Nodes under Kubernetes and shows that they appear to agree with the practice of deprioritizing vulnerabilities in dormant code.
Following the same methodology can save hundreds of hours for your organization.
Using Real-time Context
EdgeBit’s security platform uses real-time context alongside build-time and run-time SBOMs to prioritize vulnerabilities in your apps and infrastructure. Our customers get a ruthlessly prioritized list of real issues to fix, based on how dependencies are actually executing. As we’ll see below, Google and Amazon follow a similar methodology.
This methodology can be applied to every base OS and every app workload, to focus engineers and avoid useless investigation.
Setting Up our Test
We recently announced expanded tracking of base OS vulnerabilities, using the same approach we apply to application workloads. We help customers run effective vulnerability management programs by intersecting 3 datasets:
- Track dependency inventory and versions from build to build with SBOMs
- Continually match that inventory to security vulnerabilities
- Use real-time context to prioritize issues to fix
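The three steps above amount to a join between an inventory, a vulnerability feed and runtime observations. The sketch below is a hypothetical illustration, not EdgeBit’s implementation: it assumes an SBOM has been reduced to name/version pairs, that each vulnerability record lists the exact affected versions, and that runtime monitoring yields the set of package names seen executing. The classes, functions, package names and CVE IDs are all made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """One SBOM entry: a dependency name and the version that shipped."""
    name: str
    version: str


@dataclass(frozen=True)
class Vuln:
    """A simplified vulnerability record (hypothetical shape, not a real feed format)."""
    cve: str
    package: str
    affected_versions: frozenset
    severity: str


def match(inventory, vulns):
    """Step 2: pair each shipped package with the CVEs that affect its exact version."""
    return [
        (pkg, vuln)
        for pkg in inventory
        for vuln in vulns
        if vuln.package == pkg.name and pkg.version in vuln.affected_versions
    ]


def classify(findings, executing):
    """Step 3: split findings by whether the package was observed executing."""
    active = [f for f in findings if f[0].name in executing]
    dormant = [f for f in findings if f[0].name not in executing]
    return active, dormant


# Hypothetical inventory, feed and runtime observation.
inventory = [Package("libfoo", "1.2.0"), Package("libbar", "3.0.1")]
vulns = [
    Vuln("CVE-EXAMPLE-1", "libfoo", frozenset({"1.2.0"}), "Critical"),
    Vuln("CVE-EXAMPLE-2", "libbar", frozenset({"3.0.1"}), "High"),
    Vuln("CVE-EXAMPLE-3", "libbar", frozenset({"2.9.0"}), "High"),  # not our version
]
executing = {"libbar"}  # e.g. libraries seen loaded by a running process

active, dormant = classify(match(inventory, vulns), executing)
print([v.cve for _, v in active])   # ['CVE-EXAMPLE-2']
print([v.cve for _, v in dormant])  # ['CVE-EXAMPLE-1']
```

Real feeds describe affected versions as ranges rather than exact sets, so a production matcher would need proper version comparison per ecosystem.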
Since we’re 30 days past the feature launch, let’s look at some real-world results from observing the GKE and EKS nodes underneath test clusters to see how vulnerability management is done on these platforms. Our control is a regular Amazon Linux machine.
Each cluster of machines in this test is configured to lower CVEs in dormant dependencies by 2 severity levels. For example, a Critical drops to a Medium and a High drops to a Low. EdgeBit customers can choose from a number of policies based on their risk profile.
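The downgrade policy used in this test can be expressed as a small function. This is a minimal sketch: the four-level ladder and the choice to clamp anything below Low at Low are assumptions for illustration, and EdgeBit’s actual policy engine may handle the bottom of the scale differently.

```python
# Severity ladder, lowest to highest. Assumption: Low is the floor, so a
# finding that would drop below it is clamped to Low.
SEVERITY_ORDER = ["Low", "Medium", "High", "Critical"]


def effective_severity(severity: str, dormant: bool, drop_levels: int = 2) -> str:
    """Return the severity used for prioritization.

    Active findings keep their scanner severity; dormant findings are
    lowered by drop_levels steps (2 in the policy described above).
    """
    if not dormant:
        return severity
    index = SEVERITY_ORDER.index(severity)
    return SEVERITY_ORDER[max(index - drop_levels, 0)]


for original in ["Critical", "High", "Medium", "Low"]:
    print(original, "->", effective_severity(original, dormant=True))
# Critical -> Medium
# High -> Low
# Medium -> Low
# Low -> Low
```

With this policy, the highest severity any dormant finding can reach is Medium, which matches the dormant rows in the tables below.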
EdgeBit is able to reduce your compliance scope to the most accurate list of real security issues, saving time per sprint and removing developer frustration around security.
Vulnerabilities in Nodes underneath GKE
Our test Google Kubernetes Engine cluster runs on spot instances with Container-Optimized OS. These machines are recycled regularly, which lets us track which vulnerabilities are fixed and, more interestingly, which vulnerabilities remain from one OS version to the next.
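Tracking what is fixed and what remains between OS versions boils down to set arithmetic over the CVE IDs reported for each node image. A minimal sketch, assuming each recycled node yields a set of CVE IDs; the function name and IDs are placeholders:

```python
def diff_cves(previous: set, current: set) -> dict:
    """Compare the CVE sets reported for two consecutive node images."""
    return {
        "fixed": previous - current,       # present before, gone now
        "remaining": previous & current,   # carried over to the new version
        "introduced": current - previous,  # new in this version
    }


old_image = {"CVE-EXAMPLE-1", "CVE-EXAMPLE-2", "CVE-EXAMPLE-3"}
new_image = {"CVE-EXAMPLE-2", "CVE-EXAMPLE-3", "CVE-EXAMPLE-4"}
result = diff_cves(old_image, new_image)
print(sorted(result["fixed"]))       # ['CVE-EXAMPLE-1']
print(sorted(result["remaining"]))   # ['CVE-EXAMPLE-2', 'CVE-EXAMPLE-3']
print(sorted(result["introduced"]))  # ['CVE-EXAMPLE-4']
```

An issue that keeps showing up in "remaining" across many image rotations is the kind of long-lived finding discussed below.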
Our test shows 100% of the 73 issues found in the GKE nodes were suppressed, which means that all CVEs are in code that is not executing.
| Dependency Type | CVE Count | Status | Highest Severity |
|---|---|---|---|
| Python | 45 | 0 active | n/a |
| | | 45 dormant | 4 Medium CVEs |
| Golang | 28 | 0 active | n/a |
| | | 28 dormant | 1 Medium CVE |
The issues found are a mix of Python and Golang. Interestingly, most of the issues have a “fix version” specified and have been seen by EdgeBit for the entire period. That points to one of three explanations: these issues are deliberately being allowed to remain, the dependent teams aren’t able to get their updates included in the OS, or the automatically rotated spot instances don’t pick up the patches.
Here’s a PyYAML issue that was originally rated Critical but suppressed by EdgeBit:
Vulnerabilities in Nodes underneath EKS
Our test Elastic Kubernetes Service cluster runs on EKS-optimized Bottlerocket instances. These machines are continually updated, which lets us track which vulnerabilities are fixed, just as with the GKE machines.
Our test shows 93% of the 14 issues (13 of 14) were suppressed, and every suppressed issue was in a Golang package. Similar to what we saw with GKE, all but one of the issues had a “fix version” specified but hadn’t been fixed yet.
| Dependency Type | CVE Count | Status | Highest Severity |
|---|---|---|---|
| Golang | 13 | 0 active | n/a |
| | | 13 dormant | 2 Medium CVEs |
| Java | 1 | 1 active | 1 High false positive |
| | | 0 dormant | n/a |
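The suppression percentages follow directly from the Status column: dormant CVEs divided by all CVEs. A small sketch using the counts from the GKE and EKS tables (the Java false positive counts as active, so EKS is 13 of 14); the function name is hypothetical:

```python
def suppression_rate(rows):
    """rows: (active, dormant) CVE counts per dependency type."""
    active = sum(a for a, _ in rows)
    dormant = sum(d for _, d in rows)
    return dormant / (active + dormant)


gke = [(0, 45), (0, 28)]  # Python, Golang
eks = [(0, 13), (1, 0)]   # Golang, Java
print(f"GKE: {suppression_rate(gke):.0%}")  # GKE: 100%
print(f"EKS: {suppression_rate(eks):.0%}")  # EKS: 93%
```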
The single High is a false positive related to Amazon’s hotpatch-for-apache-log4j2, which seems to have incorrect metadata:
Vulnerabilities in Regular EC2 Machines
How much do minimal operating systems help with vulnerability management? A massive amount!
Compared to the results above, a machine updated monthly contains almost 600 security issues. EdgeBit is still able to suppress 49% of those, but a huge pile of important issues remains to be fixed.
Unlike on the minimal OSes, new security issues kept appearing throughout the test window. Nothing was spared: system RPMs, the kernel, containerd, OpenSSL, OpenSSH, NodeJS packages, Python dependencies and much more.
| Dependency Type | CVE Count | Status | Highest Severity |
|---|---|---|---|
| RPMs | 408 | 254 active | 145 High |
| | | 154 dormant | 1 Medium |
| Python | 134 | 25 active | 1 Critical |
| | | 109 dormant | 18 Medium |
| Golang | 21 | 14 active | 1 Critical |
| | | 7 dormant | 7 Low |
| Binaries | 10 | 10 active | 6 High |
| | | 0 dormant | n/a |
| NodeJS | 3 | 3 active | 1 Critical |
| | | 0 dormant | n/a |
| Java | 4 | 4 active | 1 High |
| | | 0 dormant | n/a |
On the positive side, 486 of the issues had a “fix version”, including many of the Critical and High CVEs, so a monthly patching schedule would likely pick up these fixes. The bad news is that this test exposes how risky that approach can be, given the sheer volume of investigation it requires.
As always, there is some Perl lurking around:
Ruthless Prioritization is Essential
Preventing useless investigation and reducing compliance scope can save every company time, money and frustration.
In the screenshots above, you may have seen an estimate of the time EdgeBit can save you. That’s based on this study about triaging and patching CVEs:
It takes more than 21 minutes for organizations to detect, prioritize and remediate a single vulnerability in production.
It gets worse in development, where teams need 16 minutes to catch a vulnerability, 23 minutes to prioritize and 12 minutes to remediate one vulnerability.
In development, it takes 51 minutes on average to investigate and fix a security vulnerability.
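As a rough illustration of the arithmetic (not necessarily the exact formula behind EdgeBit’s estimate), the three development figures add up to 51 minutes per vulnerability. Applying that to the 73 GKE issues from the opening question gives about 62 hours if each one were handled as a full-priority issue. The function name below is hypothetical:

```python
# Per-vulnerability times in development, taken from the study quoted above.
DETECT_MIN = 16
PRIORITIZE_MIN = 23
REMEDIATE_MIN = 12


def investigation_hours(issue_count: int) -> float:
    """Hours spent if every issue goes through detect, prioritize and remediate."""
    per_issue = DETECT_MIN + PRIORITIZE_MIN + REMEDIATE_MIN  # 51 minutes
    return issue_count * per_issue / 60


print(f"{investigation_hours(73):.0f} hours")  # 62 hours
```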
With greater context, engineers can fix the correct problems first, and deprioritize the rest. In a backlog of thousands of issues, it’s the only way to keep up.