Quantifying your reliance on Open Source software

Featured image for sharing metadata for article

Note this has been further updated in Quantifying your reliance on Open Source software (State of Open Con version).

This was originally a writeup of my talk at DevOpsNotts July 2023 about the dependency-management-data project. The talk abstract can be found on my talks site. It has since been updated ahead of TechMids 2023.

Why is this important?

As I wrote in the post Analysing our dependency trees to determine where we should send Open Source contributions for Hacktoberfest (CC-BY-SA-4.0):

In recent years, it has become unavoidable to build software on top of Open Source. This is absolutely a great thing, and allows developers to focus on fewer areas of domain specialisation as possible, as well as allowing a much wider range of users to pick up on defects and bring new features to our tools.

However, with events such as the Log4Shell security vulnerability, times where maintainers have removed their libraries from package and source repositories, sometimes in political protest, it's understandable that businesses are somewhat hesitant about the sustainability of projects.

Open Source projects need support, love and positive feedback from their communities, and with the increasing demands of organisations on their software supply chain, it's important to fully appreciate the depth of your dependencies.

Being able to understand how your business uses Open Source is really important for a few other key reasons (but this list is by no means exhaustive!):

  • Investigate areas you could raise Open Source contributions to
  • Investigate areas you could contribute financially to, for instance using projects like StackAid
  • Understand usages of libraries that aren't wanted (for instance if your organisation doesn't want to use any copy-left licenses)
  • Understand usage of libraries and frameworks i.e. to discover insight around what tooling to improve on
  • Understand the spread of (internal/external) libraries and their versions
  • Discovering what end-of-life or vulnerable software is in use
  • Discovering which libraries you're using which are deprecated, including internal libraries
  • Understand at a high level how many major/minor/patch versions behind you are from the releases upstream

That all sounds great, so how do we do that?

A fairly reasonable answer to that is "pay someone" like Mend, Snyk, GitHub or GitLab to provide security scanning and tooling on top of the existing repositories you already have.

But I'm not here to talk to you about that today, we're going to talk about doing it with Free and Open Source tooling.

whoami

I'm Jamie, I'm a Senior Engineer with an interest towards solving engineering-facing problems, aiming to make folks more effective in their roles, as well as being an avid blogger (on this website), and I've been thinking about the problem of understanding your Open Source dependency tree in this form as early as 2021, but also more generally since ~2019.

Timeline of events

  • 2023-10: This talk!
  • 2023-09: Ignite talk at DevOpsDays London
  • 2023-07: Talk at DevOpsNotts
  • 2023-02: Created the dependency-management-data project
  • 2022-08: First iteration with Dependabot
  • 2019: "Formally" considering it
  • 2017: Hacking around

What is dependency-management-data?

Dependency Management Data (DMD) is a set of Open Source tooling I've built from the ground up as a means to gain insights into your dependencies. It provides a means to look at the Open Source and proprietary dependencies that your organisation uses, producing an interface that allows further querying, filtering, and reporting.

DMD consists of:

  • The command-line tool dmd
  • The web application dmd-web
  • The outputted SQLite database
  • Your SQLite browser of choice

CLI (dmd)

The primary entrypoint for DMD is the command-line tool dmd.

The dmd CLI contains the functionality to build the SQLite database and optionally enrich it with things like advisories.

You can use this dependency data as-is, or you can use the command-line tool to enrich the database with additional data ("advisories"), such as being able to get insight into which dependencies are running end-of-life versions, as well as interrogate the database for specific data ("reports").

DMD takes data from a few different sources ("datasources") and converts them into an SQLite database. With both the original data discovered and the additional data provided, you can discover some pretty interesting things about your usage and answer all of the questions posed earlier in the talk, and more!

Web Application (dmd-web)

DMD also contains an inbuilt web server, dmd-web, which allows serving the database using a pre-configured integration with Datasette's excellent SQLite UI.

One of the great things about having this as a web UI is that you can share URLs to previously run queries, allowing you to easily collaborate with colleagues on the data without copy-pasting results, as well as giving you a central place for teams to access the data.

The example database can be found hosted on Fly.io.

The SQLite database

One key design decision for DMD was to utilise SQLite as the database engine. SQLite has recently seen a resurgence in popularity and for me was the perfect choice as I wanted to make it convenient to share the data between people, at least early on when I was manually updating the data and building the database.

With SQLite, there is a single file that can be shared around - for instance as part of the result of a GitHub Actions or GitLab pipeline - which would have performed any operations necessary to produce a "ready to use" dataset, and then allow folks to perform their own queries on top of it.

SQLite also works well whether you're working locally or hosting it elsewhere, as SQLite is a single-file database that can be distributed much more conveniently than other database engines.

Another key design decision was that the database should be the source of truth for all data and querying. Instead of locking you into using the dmd CLI to interact with the database, all data gets synced to the database, and can be browsed with any database browser.

Demo

Let's look at a quick demo.

We'll start by looking at Elastic's kibana project.

First we'll see what dependencies it has (example):

select
  *
from
  renovate
where
  repo = 'kibana'

Next, we can look at how many dependencies it has across the different package types (example):

select
  package_manager,
  count(*)
from
  renovate
where
  repo = 'kibana'
group by
  package_manager
order by
  count(*) desc

Finally, we can look at the advisories report (example) and notice that the Node dependency is coming up to end-of-life:

github 	elastic 	kibana 	node 	16.20.1 / 16.20.1 	["engines"] DEPRECATED 	nodejs 16 will be End-of-Life in 47 days
github 	elastic 	kibana 	node 	16.20.1 / 16.20.1 	["final"] 	DEPRECATED 	nodejs 16 will be End-of-Life in 47 days
github 	elastic 	kibana 	node 	16.20.1 / 16.20.1 	[] 			DEPRECATED 	nodejs 16 will be End-of-Life in 47 days

How did it come to be?

This project has been something that's been bubbling away in the back of my mind for a few years.

As written about in the post Idea for Open Source/Startup: monetising the supply chain I discussed how having access to dependency trees may be handy for a multitude of reasons, including financially supporting your supply chain:

While at Capital One, one of my colleagues was working on a side project to look at dependencies we were using, as a means to better understand our dependency trees, and lead to easier determining of when we needed to do dependency upgrades.

It'd got to a pretty great place, just as we'd started to adopt WhiteSource Renovate, so we were discussing other options for it, as it was now redundant for that original purpose.

Among other options raised, I suggested using it as a way to understand what libraries we were using, across our software estate, and use it to more appropriately distribute (financial) support to our projects.

Before this post, I'd worked on something similar at Capital One to gauge the usage and spread of libraries across repositories in my team or around our shared libraries community, which required awkward scripts of grep and sed to achieve the same, as there wasn't an easier way.

Fast forward a few months to from that post to Analysing our dependency trees to determine where we should send Open Source contributions for Hacktoberfest:

Coming up to Hacktoberfest [in 2022] - my first Hacktoberfest since joining Deliveroo - I wanted to spread the love and see if I could give a similar experience to other folks, as well as to try and get us to contribute to some of the projects that power the business.

A few months ago, I wrote about an idea on my personal blog about programmatically determining how (Open Source) libraries are used and, in that case, contributing financially, but the concept still works for contributing in other ways. I decided that I wanted to use the same dependency analysis approach, using the dependency tracking functionality we have available through GitHub Advanced security. Deliveroo is a data-driven company, so being able to bring some data to teams, to highlight commonly used libraries that may be good candidates for contributions, was really important.

As part of this, I had the opportunity to really dig into the data and find out how to use the data to determine our most used direct/transitive packages.

As we had recently got GitHub Advanced Security's Dependabot APIs enabled across Deliveroo, this gave me a great starting point for this data. Although Dependabot APIs only supported a subset of the languages and tools that we used, it supported much more than my hacky shell scripts could have in the past.

At the time, this was purely looking at the names of dependencies to understand the usage, but as time went on, I started using it more and more for understanding of our ecosystem.

This fed into some work in early January around our Production Engineering teams' need to understand the usage of DataDog client versions, and started off proving the value of this data being available.

This was a little awkward, hampered by the way that GitHub's Dependabot APIs were structured, as we were missing information about the current discovered version of the dependency. In most cases, GitHub's data would provide the version constraint specified in the Gemfile or go.mod, and would need further sanitisation to discover the exact version, or if you were lucky, a separate JSON object in the response may exist if there's a lockfile understood by Dependabot.

Update 2023-10-14 - as noted in Prefer using the GitHub Software Bill of Materials (SBOMs) API over the Dependency Graph GraphQL API, these issues mostly disappear, which is available since dependabot-graph v0.2.0.

As we were starting to use Renovate more, I discovered that Renovate had some pretty great data as well as supporting a much wider set of package ecosystems that we could use to our advantage. It wasn't immediately straightforward to get the dependency data out of Renovate, so I created a slim Open Source package called renovate-graph which would wrap around Renovate and allow outputting the full dependency tree in a JSON format. In hindsight, the "graph" is a bit of a misnomer, as it doesn't provide the full graph.

Using Renovate as the datasource for dependency data opened us up to more of the ecosystems we used like Scala Build Tool (sbt) and CircleCI, as well as including the exact version number a dependency was resolved to. With this available, I was able to start building some internal tooling for checking end-of-life details using endoflife.date, which provides an API to query the dates at which certain types of software becomes end-of-life, such as Node.JS, Go, Apache Tomcat, etc.

While doing this, I realised that my cobbled together database schema would probably be best to be thought about in a more structured way. Up until now, all the code was internal to Deliveroo, but I found that it didn't need to be, as this was a problem others could benefit from having a solution for, especially as I'd proved some value of this inside the org.

I decided to set about working on a clean-room implementation from the ground up which would make it more generic than Deliveroo's internal setup, and as it was an evenings and weekends project, it naturally fit in my personal organisation rather than my employer's.

How does it work?

DMD is first and foremost a command-line tool, dmd, which aims to pull dependency data from different datasources and construct an SQLite database for further processing.

To start using DMD, a user needs to have run three fairly straightforward commands - one to retrieve some data, and two to ingest it:

# produce some data that DMD can import, for instance via renovate-graph
npx @jamietanna/renovate-graph@latest --token $GITHUB_TOKEN your-org/repo
# set up the database
dmd db init --db dmd.db
# import renovate-graph data
dmd import renovate --db dmd.db 'out/*.json'
# optionally, generate advisories
dmd db generate advisories --db dmd.db
# then you can start querying it
sqlite3 dmd.db 'select count(*) from renovate'

Datasources

As mentioned above, DMD doesn't know how to get the dependency data, so it requires you provide data through the following tools:

DMD has an underlying data model that it translates each of the above datasources into, which is imported into the database schema.

From there, DMD then uses its own understanding of those data formats for performing reporting or enriching the data.

Reports

As well as having raw access to the data and being able to query it yourself, there are some common queries that folks may be interested in.

As of writing, there are several reports available:

$ dmd report --help

  advisories                 Report advisories that are available for packages or dependencies in use
  golangCILint               Query usages of golangci-lint, tracked as a source-based dependency
  mostPopularDockerImages    Query the most popular Docker namespaces and images in use
  mostPopularPackageManagers Query the most popular package managers in use

An example of these reports can be found on the example web app.

Some of these operate on the raw data, but some of them require pre-enriching the data with advisories data.

As we'll see in the case studies later, a few of these are based off the back of events happening in the Open Source ecosystems.

Advisories

Being able to query the dependency data for your projects is really powerful, and makes it possible to start answering questions like "what Terraform modules and versions are being used across the org" and "which teams are using the Gin web framework".

These questions are quite specific to your organisation to be able to make generic in the form of a report, but what if you wanted to ask questions like "which software am I running that needs an upgrade soon"?

This concept is know as "advisories", and it provides a means to surface other information about your dependencies, such as whether a dependency is deprecated/end-of-life or unmaintained, has a security issue, or is something else.

As mentioned before, to start with I found that it was useful to have end-of-life checking through endoflife.date, which gave us visibility over which of our libraries were running end-of-life versions. Over time, I've also added integrations with osv.dev for vulnerability data and deps.dev for vulnerability and licensing data.

This end-of-life checking doesn't just work for package data, but also includes AWS infrastructure checking through endoflife-checker, making it possible to answer questions like "how much time should my team(s) be planning in the next quarter to upgrade their AWS infrastructure".

These are useful, but sometimes you will want to be able to define your own rules or advisories, which can be done by creating custom advisories. I find this to be a particularly useful feature, as it allows you to really teach the tooling how to make it work best for your organisation.

To do this, you can add an advisory to the advisories table, which allows you to define your own rules about packages. This lends itself well to being able to define i.e. a security or maintenance issue with your own internal libraries, or flag up cases where you're using libraries you would prefer not to.

An example of what advisories data looks like can be found on the example web app. We'll also see an example of an advisory in a case study later.

Additionally, there are community-sourced advisories through the "contrib" project, which provides a means to share common advisories for the good of the community. For instance:

INSERT INTO custom_advisories (
  package_pattern,
  package_manager,
  version,
  version_match_strategy,
  advisory_type,
  description
) VALUES (
  'github.com/golang/mock',
  'gomod',
  NULL,
  NULL,
  'UNMAINTAINED',
  'golang/mock is no longer maintained, and active development been moved to github.com/uber/mock'
);

If there are any other additional sources you'd find useful for advisories, please contribute them! If you're unable to - for instance if it takes information from an internal database - then you could create a new table and provide a means to sync the data into it, so you can add it to custom queries.

Ownership

An additional opt-in feature is the ability to manage ownership for repositories, which can be really great for trying to work out who you need to get in touch with about an advisory.

For instance, let's say that we've found which of our projects are using a Go library that we're no longer recommending. How would we let the owners know that this is deprecated? Do we know who the owner even is?

DMD contains a dmd owners subcommand that allows us to manage the ownership through a separate table owners which allows JOINing in queries.

This could for instance be synced with some internal tooling for managing ownership of services and projects.

Once the ownership data is present, you can then perform a query such as:

select
  distinct
  renovate.platform,
  renovate.organisation,
  renovate.repo,
  owner
from
  renovate
  left join owners
  on  renovate.platform = owners.platform
  and renovate.organisation = owners.organisation
  and renovate.repo = owners.repo

This would allow you to see all repos and their respective ownership, and works well when performing other queries against this data.

Example project

Another key piece of functionality available in the DMD ecosystem is having a separate example project, which pulls from various real-world public repositories.

Although not a core part of the DMD project itself, it's an important offering to provide prospective users an idea of what the data could be used for, as well as being part of the integration tests that run as part of contributions to DMD, to ensure that there aren't any regressions introduced.

Contrib project

As mentioned before, there is the "contrib" project which provides a space to manage community-sourced contributions.

Right now, we only have support for advisories, but it's been set up in a way to be extensible and allow sharing other community-sourced data that doesn't make sense to sit in DMD's repo.

Case studies

To give more of an indication of some of the things that can be done with DMD, let's take a look at some practical applications of this tooling, based on areas this data has previously been used.

Which other services may be affected by this production bug?

I was recently supporting an incident which had the root cause being due to the version of an internal library being somewhat out-of-date. This library had an issue which would lead to degraded performance, but it was resolved in the last couple of releases. Because it was a highly-used library, we needed to determine exactly which other services were affected.

Without DMD, the result would have been to search in the GitHub UI, trying to craft very specific searches for YAML files that had several ways of defining this, which wasn't super straightforward. But as we did have DMD set up with all our data, we could craft a straightforward SQL query to highlight all the services and their associated teams.

For example, let's say that we want to find all cases where we've got the usage of the aws-lambda-go package, and associated team ownership information to more easily page engineers into the incident call.

We could look this up by crafting a query such as (example):

select
  distinct renovate.platform,
  renovate.organisation,
  renovate.repo,
  version,
  owner
from
  renovate
  left join owners on renovate.platform = owners.platform
  and renovate.organisation = owners.organisation
  and renovate.repo = owners.repo
where
  package_name = 'github.com/aws/aws-lambda-go'
order by
  version desc

NOTE: because the version / current_version fields are treated as strings, sorting or filtering may not quite follow what you'd expect. I'll look at documenting some common means to better handle the version numbers in the future.

Consider the impact of the Log4Shell security vulnerability and how different it could've been if you'd have had this data readily available?

The Gorilla Toolkit archiving

In the Go ecosystem, the Gorilla Toolkit was a heavily used set of libraries for building web applications and RESTful APIs. Unfortunately in December last year, the maintainers announced that they were archiving the project due to a continued lack of support from the community.

When the news broke, there was a lot of discussion around the Go community about whether it meant moving to different packages or looking at (a bit too late) forking and maintaining it.

Although most notably known for gorilla/mux, the HTTP router, there are a number of other packages like gorilla/csrf, gorilla/securecookie and gorilla/sessions that are much more risky when unmaintained, compared to the router which had not needed an update in over 2 years.

For projects and organisations that relied on the Gorilla Toolkit, understanding the impact was important.

Using the dependency-management-data project at Deliveroo, I was able to understand the impact, discover which repos relied on the different packages in the Gorilla toolkit, directly or indirectly, and could appropriately understand our impact.

For instance, the below report is based on the dependency-management-data-example project (using dmd v0.26.0):

$ dmd report gorillaToolkit --db dmd.db
Renovate
Direct dependencies
+------------------------------+---+
| PACKAGE                      | # |
+------------------------------+---+
| github.com/gorilla/mux       | 4 |
| github.com/gorilla/websocket | 1 |
| github.com/gorilla/handlers  | 1 |
+------------------------------+---+
Indirect dependencies
+---------------------------------+---+
| PACKAGE                         | # |
+---------------------------------+---+
| github.com/gorilla/websocket    | 6 |
| github.com/gorilla/securecookie | 2 |
| github.com/gorilla/sessions     | 1 |
| github.com/gorilla/schema       | 1 |
| github.com/gorilla/mux          | 1 |
| github.com/gorilla/context      | 1 |
+---------------------------------+---+

Using this high-level information, as well as being able to query the data further, we were able to understand the spread of HTTP routers in use across the org.

Note that as of writing, it has been unarchived again, but the point still stands for this being a great case study.

Docker free tier sunset

When Docker announced that Free Team organisations would be sunset earlier this year, there was a good deal of concern from the Software Engineering community, many of whom were unclear how much of an impact this would have on them.

This change had an impact to not only teams running their own Docker hub organisations (also known as "namespaces"), but pulling from other peoples' organisations and if a well-used organisation, such as goreleaser did not either apply for a Docker-Sponsored Open Source Project, or pay up, then any downstream users would be impacted by more frequent rate limits or even being unable to pull the images until the next billing cycle.

As mentioned in my blog post Working out which Docker namespaces and images you most depend on, using the dmd CLI, we can receive a report of the most used Docker namespaces and images in the organisation, which allows us to determine the impact of the change.

For instance, the below report indicates our most used namespaces and images:

$ dmd report mostPopularDockerImages --db dmd.db
Renovate
+---------------------------------------+-----+
| NAMESPACE                             |   # |
+---------------------------------------+-----+
| _                                     | 346 |
| dockersamples                         |  12 |
| registry1.dsop.io/ironbank/redhat/ubi |  11 |
| docker.elastic.co/elasticsearch       |   6 |
| gcr.io/distroless                     |   6 |
| cimg                                  |   5 |
| docker.elastic.co/kibana              |   4 |
| amazon                                |   4 |
| gcr.io/kaniko-project                 |   3 |
| goreleaser                            |   3 |
| quay.io/something                     |   3 |
| circleci                              |   3 |
+---------------------------------------+-----+
+--------------------------------------------+----+
| IMAGE                                      |  # |
+--------------------------------------------+----+
| golang                                     | 44 |
| alpine                                     | 36 |
| node                                       | 33 |
| docker                                     | 25 |
| nginx                                      | 21 |
| ubuntu                                     | 19 |
| python                                     | 16 |
| ruby                                       | 15 |
| redis                                      | 14 |
| busybox                                    | 11 |
| registry1.dsop.io/ironbank/redhat/ubi/ubi8 | 11 |
| openjdk                                    | 11 |
+--------------------------------------------+----+

Although Docker have since cancelled this change, understanding the usage of Docker images across your organisation can be super helpful.

For instance, if you have a policy to only allow internally-hosted Docker images, you may want to find cases where there are images that are being pulled from the public DockerHub. This could also be achieved using custom advisories to flag any Docker images that aren't hosted internally.

What are the most-used transitive dependencies

Although we most often think about our direct dependencies, there's a whole other set of transitive dependencies that are indirectly pulled in.

It can be very interesting to understand which dependencies you most depend on without realising, especially considering the xkcd titled Dependency:

xkcd comic showing a tower of various layers of boulders and stones, labelled "all modern digital infrastructure", which looks a little precarious. Towards the bottom there is a slim load-bearing stone which is labelled "a project some random person in Nebraska has been thanklessly maintaining since 2003"

For instance, with Go Modules (example query):

select
  distinct package_name,
  count(*)
from
  renovate,
  json_each(dep_types) as dep_type
where
  package_manager = 'gomod'
  and dep_type.value = 'indirect'
group by
  package_name
order by
  count(*) DESC;

Note that this can be a little more difficult based on the package manager in use, due to the way that the Renovate doesn't always surface the lockfile's data, which is something I'm looking at further improving.

How far behind on updates am I?

Whether you're using Renovate to manage your dependency updates or not, it can be super handy to have a breakdown of your pending software updates as a means to roughly gauge what updates are available.

For instance, to get a view of how many updates are pending, per package manager (example):

select
  package_manager,
  count(*)
from
  renovate_updates
group by
  package_manager
order by
  count(*) desc

Alternatively, how many updates (and whether they're i.e. major bumps) per package manager (example):

select
  package_manager,
  update_type,
  count(*)
from
  renovate_updates
group by
  package_manager,
  update_type
order by
  count(*) desc

Using custom advisories

Let's say that as an organisation you prefer teams don't use Spring (Boot) in their Java web applications as you prefer using Dropwizard.

You could look up this data just by querying where Spring (Boot) is used:

select
  *
from
  renovate
where
  datasource = 'maven'
  and package_name like 'org.springframework%'

However, we could instead wrap this in an advisory:

insert into
  custom_advisories (
    package_pattern,
    package_manager,
    version,
    version_match_strategy,
    advisory_type,
    description
  )
VALUES
  (
    'org.springframework*',
    'gradle',
    NULL,
    'ANY',
    'OTHER',
    'Spring (Boot) is not supported in our organisation. Please see https://internal.docs-site/...'
  );

Then, when running reports on your data, you'll see this alongside each reference to the Spring (Boot) dependencies as an advisory:

+----------+------------------------+---------------------------------------------+----------------------------------------------------+-----------------+------------------+---------------+----------------------------------------------------------------------------------------------+
| PLATFORM | ORGANISATION           | REPO                                        | PACKAGE                                            | VERSION         | DEPENDENCY TYPES | ADVISORY TYPE | DESCRIPTION                                                                                  |
+----------+------------------------+---------------------------------------------+----------------------------------------------------+-----------------+------------------+---------------+----------------------------------------------------------------------------------------------+
| github   | co-cddo                | federated-api-model                         | org.springframework.boot:spring-boot-gradle-plugin | 2.7.5 / 2.7.5   | ["dependencies"] | OTHER         | Spring (Boot) is not supported in our organisation. Please see https://internal.docs-site/...|
| github   | cucumber               | cucumber-jvm                                | org.springframework.boot:spring-boot-dependencies  | 3.0.5 / 3.0.5   | ["import"]       | OTHER         | Spring (Boot) is not supported in our organisation. Please see https://internal.docs-site/...|
| github   | cucumber               | cucumber-jvm                                | org.springframework.boot:spring-boot-maven-plugin  | 3.0.1 / 3.0.1   | ["build"]        | OTHER         | Spring (Boot) is not supported in our organisation. Please see https://internal.docs-site/...|
| github   | cucumber               | cucumber-jvm                                | org.springframework:spring-context-support         | 6.0.11 / 6.0.11 | ["provided"]     | OTHER         | Spring (Boot) is not supported in our organisation. Please see https://internal.docs-site/...|
| github   | cucumber               | cucumber-jvm                                | org.springframework:spring-test                    | 6.0.11 / 6.0.11 | ["provided"]     | OTHER         | Spring (Boot) is not supported in our organisation. Please see https://internal.docs-site/...|
| gitlab   | jamietanna             | spring-boot-http-client-integration-testing | org.springframework.boot                           | 2.6.0 / 2.6.0   | ["plugin"]       | OTHER         | Spring (Boot) is not supported in our organisation. Please see https://internal.docs-site/...|
| gitlab   | jamietanna             | spring-boot-onion-architecture-example      | org.springframework.boot                           | 2.6.2 / 2.6.2   | ["plugin"]       | OTHER         | Spring (Boot) is not supported in our organisation. Please see https://internal.docs-site/...|
| gitlab   | jamietanna             | spring-content-negotiator                   | org.springframework.boot:spring-boot-starter-test  | 2.6.3 / 2.6.3   | ["dependencies"] | OTHER         | Spring (Boot) is not supported in our organisation. Please see https://internal.docs-site/...|
| gitlab   | jamietanna             | spring-content-negotiator                   | org.springframework.boot:spring-boot-starter-web   | 2.6.3 / 2.6.3   | ["dependencies"] | OTHER         | Spring (Boot) is not supported in our organisation. Please see https://internal.docs-site/...|
| gitlab   | jamietanna             | starling-take-home-test                     | org.springframework.boot                           | 2.6.3 / 2.6.3   | ["plugin"]       | OTHER         | Spring (Boot) is not supported in our organisation. Please see https://internal.docs-site/...|
+----------+------------------------+---------------------------------------------+----------------------------------------------------+-----------------+------------------+---------------+----------------------------------------------------------------------------------------------+

An example of all findings for custom advisories can be found here.

Getting started

To get more of a feel for some real-world example data, you can see the example project which has powered the case studies and pulls data from various Open Source repositories across GitHub and GitLab.com.

There are also screencasts of various portions of functionality of the DMD tooling using the example project which can be found on the dmd website.

I'd recommend trying it with your own organisation's data and see what insights you can get - there will definitely be some interesting things in there!

You can do so using the following three-command setup to get started:

# produce some data that DMD can import, for instance via renovate-graph
npx @jamietanna/renovate-graph@latest --token $GITHUB_TOKEN your-org/repo another-org/repo
# or for GitLab
env RENOVATE_PLATFORM=gitlab npx @jamietanna/renovate-graph@latest --token $GITLAB_TOKEN your-org/repo another-org/nested/repo

# set up the database
dmd db init --db dmd.db
# import renovate-graph data
dmd import renovate --db dmd.db 'out/*.json'
# then you can start querying it
sqlite3 dmd.db 'select count(*) from renovate'

I have also written a companion post for this which is a concise getting started post that may be more convenient to share with colleagues.

What's next?

I've got a number of pending features including new datasources, new advisories sources, and improving the web application.

I'm also hoping to get some more folks using it and sharing their own use-cases and functionality they'd like to make this more effective.

I'm super passionate about this, and it's been arguably a bit of a game changer with the way I can approach problems as an engineer working on shared tooling, as well as at a team-level considering what work is required to do to close off advisories.

Written by Jamie Tanna's profile image Jamie Tanna on , and last updated on .

Content for this article is shared under the terms of the Creative Commons Attribution Non Commercial Share Alike 4.0 International, and code is shared under the Apache License 2.0.

#dependency-management-data #public-speaking #devops-notts #open-source #free-software.

This post was filed under articles.

Interactions with this post

Interactions with this post

Below you can find the interactions that this page has had using WebMention.

Have you written a response to this post? Let me know the URL:

Do you not have a website set up with WebMention capabilities? You can use Comment Parade.