What can we learn about the backdooring of xz/liblzma, using OpenSSF Security Scorecards and dependency-management-data?


CVE-2024-3094

This evening, Andres Freund announced that there is backdoored code in xz and liblzma:

I accidentally found a security issue while benchmarking postgres changes.

If you run debian testing, unstable or some other more "bleeding edge" distribution, I strongly recommend upgrading ASAP.

https://www.openwall.com/lists/oss-security/2024/03/29/4

This is absolutely a bad thing, and despite it being the long Easter weekend for a large part of the world, I'm sure there will be a lot of folks looking into it.

This has been assigned CVE-2024-3094, and is rated Critical, the highest severity level.

As well as the email thread linked above, which goes into a great deal of depth on the issue, Xe Iaso has also written up some information about affected systems.

Now, I'm not here to talk about the vulnerability itself, but about what we can learn from it.

There have unfortunately been quite a few cases in recent years of backdoored code entering the supply chain, far too many to link here!

It's been suggested on GitHub that this is due to the lack of required code review on the libraries in question:

And so it begins. Always knew one day a nightmare supply chain attack would originate from GitHub.

"from github"? this wasn't a random drive-by contribution

From a GitHub repository where there are no branch protections, devs pushing to the default branches without reviews. Yes, not "from GitHub" in this case, but there are other OSS projects where someone can just exploit a build workflow and backdoor it.

This allowed the developer who committed the changes (whether compromised technically or physically) to act on their own and push the changes without anyone else in the loop.

So what can we learn from this, aside from not necessarily updating to the latest version of a library as soon as it lands?

Catching unreviewed changes upstream, using OpenSSF Security Scorecard

You may be asking "how many other libraries do I depend on that don't perform code review?", hoping that the answer is a low number... but you already know the answer to that question, don't you? 😅

To understand whether a given repository would also be susceptible to this, we can take advantage of the excellent OpenSSF Scorecard project, which can automagically give us insight into the supply chain security health of our dependencies.

For instance, when we run Scorecard against the xz repo, we can see that the Code-Review check receives a score of 0 (the lowest possible), with the reason:

found 29 unreviewed changesets out of 30 -- score normalized to 0

Additionally, the Branch-Protection check has a score of 0:

branch protection not enabled on development/release branches
Warn: branch protection not enabled for branch 'master'

This is super useful for getting an indication of the health of the repository, and is consistent with the suggested cause of this CVE.
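If you want to pull these checks out programmatically, here's a minimal Python sketch that reads a Scorecard result and applies the same thresholds used in the queries later in this post (Code-Review below 3, Branch-Protection below 10). It assumes Scorecard's JSON output shape, a top-level `checks` list whose entries have a `name`, `score` and `reason`; the `SAMPLE` payload and the `check_score`/`is_risky` helpers are hypothetical, with the reasons copied from the output above.

```python
import json
from typing import Optional

# Hypothetical payload mirroring Scorecard's JSON output shape: a top-level
# "checks" list, where each entry has a "name", "score" and "reason".
SAMPLE = """
{
  "repo": {"name": "github.com/tukaani-project/xz"},
  "checks": [
    {"name": "Code-Review", "score": 0,
     "reason": "found 29 unreviewed changesets out of 30 -- score normalized to 0"},
    {"name": "Branch-Protection", "score": 0,
     "reason": "branch protection not enabled on development/release branches"}
  ]
}
"""


def check_score(report: dict, check_name: str) -> Optional[int]:
    """Return the score of the named check, or None if it isn't in the report."""
    for check in report.get("checks", []):
        if check.get("name") == check_name:
            return check.get("score")
    return None


def is_risky(report: dict) -> bool:
    """Flag a repo whose Code-Review score is below 3 or Branch-Protection below 10.

    A missing check never flags on its own, much like a NULL score in SQL.
    """
    code_review = check_score(report, "Code-Review")
    branch_protection = check_score(report, "Branch-Protection")
    return (code_review is not None and code_review < 3) or (
        branch_protection is not None and branch_protection < 10
    )


if __name__ == "__main__":
    report = json.loads(SAMPLE)
    print(check_score(report, "Code-Review"), check_score(report, "Branch-Protection"))
    print(is_risky(report))
```

Treating a missing check as "not flagged" keeps this in line with how the SQL `where` clauses behave when a score is NULL.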

But how can we make this a little easier to query, for instance across many dependencies?

Understanding just how many of your dependencies are affected

Of course, it wouldn't be a blog post from me without tying this back to dependency-management-data, a project I've been working on to better understand dependency usage across organisation(s).

dependency-management-data has a first-class integration with Scorecard, allowing you to import Scorecard data (or generate it from the data the public Scorecard API already knows about).

From here, we can query the SQLite database with a query such as:

-- a slightly more complex query to show the full range of the data
select
  s.platform,
  s.organisation,
  s.repo,
  s.package_name,
  s.version,
  s.current_version,
  package_type as package_manager,
  -- as SBOMs don't make this available, default to an empty array
  '[]' as dep_types,
  -- as SBOMs don't make this available, default to an empty string
  '' as package_file_path,
  printf('%.2f', scorecard_codereview) as scorecard_codereview
from
  sboms s
  inner join dependency_health as h on s.package_name = h.package_name
  and s.package_type = h.package_manager
where
  -- Scoring is leveled instead of proportional to make the check more
  -- predictable. If any bot-originated changes are unreviewed, 3 points are
  -- deducted. If any human changes are unreviewed, 7 points are deducted if a
  -- single change is unreviewed, and another 3 are deducted if multiple changes
  -- are unreviewed.
  -- Via https://github.com/ossf/scorecard/blob/c1066d9ac232e835ec0c22a255cdd46ec58dd2c7/docs/checks.md#code-review
  scorecard_codereview < 3
union
select
  r.platform,
  r.organisation,
  r.repo,
  r.package_name,
  r.version,
  r.current_version,
  r.package_manager,
  r.dep_types,
  r.package_file_path,
  printf('%.2f', scorecard_codereview) as scorecard_codereview
from
  renovate r
  inner join dependency_health as h on r.package_name = h.package_name
  and r.package_manager = h.package_manager
where
  -- Scoring is leveled instead of proportional to make the check more
  -- predictable. If any bot-originated changes are unreviewed, 3 points are
  -- deducted. If any human changes are unreviewed, 7 points are deducted if a
  -- single change is unreviewed, and another 3 are deducted if multiple changes
  -- are unreviewed.
  -- Via https://github.com/ossf/scorecard/blob/c1066d9ac232e835ec0c22a255cdd46ec58dd2c7/docs/checks.md#code-review
  scorecard_codereview < 3
order by
  scorecard_codereview desc;

Running this against the example data that ships with dependency-management-data, we can see that there are quite a few results 😅
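The comment in the query above describes how the Code-Review check is leveled rather than proportional. As a rough sketch of that description only (not Scorecard's actual source), the deductions can be written out in Python. The `leveled_code_review_score` function is hypothetical, and clamping the result at 0 is an assumption based on Scorecard's 0-10 scale. Walking through the combinations shows that `scorecard_codereview < 3` effectively selects a score of 0: either multiple unreviewed human changes, or one unreviewed human change plus unreviewed bot changes.

```python
MAX_SCORE = 10
MIN_SCORE = 0


def leveled_code_review_score(unreviewed_bot: int, unreviewed_human: int) -> int:
    """Sketch of the leveled Code-Review scoring described in the query comment.

    Assumption: the final result is clamped to Scorecard's 0-10 range.
    """
    score = MAX_SCORE
    if unreviewed_bot > 0:
        score -= 3  # any unreviewed bot-originated change
    if unreviewed_human >= 1:
        score -= 7  # a single unreviewed human change
    if unreviewed_human > 1:
        score -= 3  # another 3 for multiple unreviewed human changes
    return max(MIN_SCORE, score)


if __name__ == "__main__":
    # (unreviewed bot changes, unreviewed human changes)
    for bot, human in [(0, 0), (5, 0), (0, 1), (1, 1), (0, 2), (3, 29)]:
        score = leveled_code_review_score(bot, human)
        print(f"bot={bot} human={human} score={score} flagged={score < 3}")
```

Under this description, a repository with exactly one unreviewed human change scores 3 and isn't matched by `< 3`, so a threshold such as `<= 3` would be needed to catch it too.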

Alternatively, we could look at both the Code-Review and Branch-Protection checks:

select
  s.platform,
  s.organisation,
  s.repo,
  s.package_name,
  package_type as package_manager,
  printf('%.2f', scorecard_codereview) as scorecard_codereview,
  printf('%.2f', scorecard_branchprotection) as scorecard_branchprotection
from
  sboms s
  inner join dependency_health as h on s.package_name = h.package_name
  and s.package_type = h.package_manager
where
  scorecard_codereview < 3 or scorecard_branchprotection < 10
union
select
  r.platform,
  r.organisation,
  r.repo,
  r.package_name,
  r.package_manager,
  printf('%.2f', scorecard_codereview) as scorecard_codereview,
  printf('%.2f', scorecard_branchprotection) as scorecard_branchprotection
from
  renovate r
  inner join dependency_health as h on r.package_name = h.package_name
  and r.package_manager = h.package_manager
where
  scorecard_codereview < 3 or scorecard_branchprotection < 10
order by
  scorecard_codereview, scorecard_branchprotection desc;

The results of this query can be seen against the example data here.
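To experiment with this filter without a full dependency-management-data database, here's a minimal Python sketch using the standard library's `sqlite3` module to run the `renovate` half of the query. The two tables contain only the columns that half touches; they're a hypothetical cut-down schema (not dependency-management-data's real table definitions), and the rows are made up.

```python
import sqlite3

# Hypothetical cut-down schema: only the columns the query references,
# not dependency-management-data's real table definitions.
SCHEMA = """
create table renovate (
  platform text, organisation text, repo text,
  package_name text, package_manager text
);
create table dependency_health (
  package_name text, package_manager text,
  scorecard_codereview real, scorecard_branchprotection real
);
"""

# Made-up rows: (platform, organisation, repo, package_name, package_manager)
RENOVATE_ROWS = [
    ("github", "example-org", "api", "unreviewed-lib", "npm"),
    ("github", "example-org", "api", "healthy-lib", "npm"),
    ("github", "example-org", "web", "unprotected-lib", "npm"),
    ("github", "example-org", "web", "unknown-lib", "npm"),  # no health data
]

# Made-up rows: (package_name, package_manager, codereview, branchprotection)
HEALTH_ROWS = [
    ("unreviewed-lib", "npm", 0.0, 10.0),
    ("healthy-lib", "npm", 10.0, 10.0),
    ("unprotected-lib", "npm", 8.0, 3.0),
]

# The renovate half of the Code-Review / Branch-Protection query above.
QUERY = """
select
  r.platform,
  r.organisation,
  r.repo,
  r.package_name,
  r.package_manager,
  printf('%.2f', scorecard_codereview) as scorecard_codereview,
  printf('%.2f', scorecard_branchprotection) as scorecard_branchprotection
from
  renovate r
  inner join dependency_health as h on r.package_name = h.package_name
  and r.package_manager = h.package_manager
where
  scorecard_codereview < 3 or scorecard_branchprotection < 10
order by
  scorecard_codereview, scorecard_branchprotection desc
"""


def build_example_db() -> sqlite3.Connection:
    """Create an in-memory database populated with the made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("insert into renovate values (?, ?, ?, ?, ?)", RENOVATE_ROWS)
    conn.executemany("insert into dependency_health values (?, ?, ?, ?)", HEALTH_ROWS)
    return conn


def flagged_dependencies(conn: sqlite3.Connection) -> list:
    """Run the query and return each flagged dependency as a tuple."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in flagged_dependencies(build_example_db()):
        print(row)
```

Because of the inner join, dependencies that have no `dependency_health` row (like `unknown-lib` here) silently drop out, so an empty result doesn't necessarily mean every dependency was scored.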

Or we could count the dependencies in each repo that are affected by low scores:

select
  s.platform,
  s.organisation,
  s.repo,
  count(*) as affected_dependencies
from
  sboms s
  inner join dependency_health as h on s.package_name = h.package_name
  and s.package_type = h.package_manager
where
  scorecard_codereview < 3
  or scorecard_branchprotection < 10
group by
  s.platform,
  s.organisation,
  s.repo
union
select
  r.platform,
  r.organisation,
  r.repo,
  count(*) as affected_dependencies
from
  renovate r
  inner join dependency_health as h on r.package_name = h.package_name
  and r.package_manager = h.package_manager
where
  scorecard_codereview < 3
  or scorecard_branchprotection < 10
group by
  r.platform,
  r.organisation,
  r.repo
order by
  affected_dependencies desc;

The results of this query can be seen against the example data here.

(This will be skewed towards repositories that have a large number of dependencies, such as those using npm; we could break this down further to show the percentage of dependencies affected.)
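Following on from that idea, here's a rough sketch of a per-repo percentage for the `renovate` half, again using Python's `sqlite3` with a hypothetical cut-down schema and made-up rows. It switches to a `left join`, so dependencies with no `dependency_health` row still count towards each repo's total rather than silently dropping out; the function and column names are illustrative only.

```python
import sqlite3

# Hypothetical cut-down schema: only the columns the query references,
# not dependency-management-data's real table definitions.
SCHEMA = """
create table renovate (
  platform text, organisation text, repo text,
  package_name text, package_manager text
);
create table dependency_health (
  package_name text, package_manager text,
  scorecard_codereview real, scorecard_branchprotection real
);
"""

RENOVATE_ROWS = [
    ("github", "example-org", "api", "lib-a", "npm"),
    ("github", "example-org", "api", "lib-b", "npm"),
    ("github", "example-org", "api", "lib-c", "npm"),
    ("github", "example-org", "api", "lib-d", "npm"),  # no health data
    ("github", "example-org", "web", "lib-a", "npm"),
    ("github", "example-org", "web", "lib-e", "npm"),
]

HEALTH_ROWS = [
    ("lib-a", "npm", 0.0, 10.0),  # low Code-Review
    ("lib-b", "npm", 10.0, 10.0),
    ("lib-c", "npm", 10.0, 10.0),
    ("lib-e", "npm", 7.0, 5.0),  # low Branch-Protection
]

QUERY = """
with scored as (
  select
    r.platform,
    r.organisation,
    r.repo,
    -- a missing dependency_health row leaves both scores NULL, so the
    -- condition isn't true and the dependency counts as 0
    case
      when h.scorecard_codereview < 3 or h.scorecard_branchprotection < 10 then 1
      else 0
    end as is_affected
  from
    renovate r
    left join dependency_health as h on r.package_name = h.package_name
    and r.package_manager = h.package_manager
)
select
  platform,
  organisation,
  repo,
  count(*) as total_dependencies,
  sum(is_affected) as affected_dependencies,
  round(100.0 * sum(is_affected) / count(*), 2) as percent_affected
from
  scored
group by
  platform,
  organisation,
  repo
order by
  percent_affected desc
"""


def build_example_db() -> sqlite3.Connection:
    """Create an in-memory database populated with the made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("insert into renovate values (?, ?, ?, ?, ?)", RENOVATE_ROWS)
    conn.executemany("insert into dependency_health values (?, ?, ?, ?)", HEALTH_ROWS)
    return conn


def percent_affected_by_repo(conn: sqlite3.Connection) -> list:
    """Return (platform, organisation, repo, total, affected, percent) per repo."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in percent_affected_by_repo(build_example_db()):
        print(row)
```

Whether unscored dependencies should count towards the total is a judgment call; switching back to an `inner join` would give the percentage of scored dependencies instead.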

Hopefully these give you some good insight into the different ways you could use this data to better understand the health of your dependencies, and make you a little more concerned about the state of everything 🔥

Written by Jamie Tanna.

Content for this article is shared under the terms of the Creative Commons Attribution Non Commercial Share Alike 4.0 International, and code is shared under the Apache License 2.0.

#dependency-management-data #security #open-source.

This post was filed under articles.
