Getting started with Dependency Management Data
This post's featured URL for sharing metadata is https://www.jvt.me/img/profile.jpg.
This is a companion post to go alongside my talk writeup of my talk at DevOpsNotts July 2023 about the dependency-management-data (DMD) project
This is intended as a quick setup guide, rather than an exhaustive jump into what it is and how it works - if you'd like that, check out the talk writeup 👆
Want to know a bit more in-depth what it is and how it works? I've recently spoken at DevOpsNotts about it and I [have done a talk writeup] which digs into it a little further.
TL;DR extraordinaire
At a minimum, you need to:
- retrieve some data, for instance via renovate-graph
- note that you do not need to be already using Renovate to use this!
- create the SQLite database for dependency-management-data
- import the data
We can do this by running:
go install dmd.tanna.dev/cmd/dmd@latest
# produce some data that DMD can import, for instance via renovate-graph
npx @jamietanna/renovate-graph@latest --token $GITHUB_TOKEN your-org/repo another-org/repo
# or for GitLab
env RENOVATE_PLATFORM=gitlab npx @jamietanna/renovate-graph@latest --token $GITLAB_TOKEN your-org/repo another-org/nested/repo
# set up the database
dmd db init --db dmd.db
# import renovate-graph data
dmd import renovate --db dmd.db 'out/*.json'
# then you can start querying it
sqlite3 dmd.db 'select count(*) from renovate'
Retrieving the data
As noted above, we need to retrieve data to be imported into DMD. For dependencies, I'd recommend using renovate-graph, which uses Renovate as the engine for retrieving package data.
We can run the following:
# optional, allows renovate-graph to retrieve the `current_version` column, as well as populate the `renovate_updates` table
export RG_INCLUDE_UPDATES='true'
# produce some data that DMD can import, for instance via renovate-graph
npx @jamietanna/renovate-graph@latest --token $GITHUB_TOKEN jamietanna/jamietanna deepmap/oapi-codegen
# or for GitLab
env RENOVATE_PLATFORM=gitlab npx @jamietanna/renovate-graph@latest --token $GITLAB_TOKEN tanna.dev/serve jamietanna/tidied
If you are looking at AWS infrastructure, check out the README for endoflife-checker which explains in more details how to pull AWS data.
Creating the database and importing the data
Once renovate-graph
has executed, you'll see an out
directory with one file per repo.
First, we'll create the database:
# or any name, really
dmd db init --db dmd.db
Then, we need to import the data. Notice the quotes around the argument to avoid shell globbing
dmd import renovate --db dmd.db 'out/*.json'
Now our database is ready to go 👏
Generating advisories (optional)
This is an optional step, but allows us to get some more meaningful information about our dependencies.
We can run the following to set up our advisories:
# optionally fetch community-sourced custom advisories
dmd contrib download
# then generate advisories for all our packages
# note that this can take several minutes depending on how many dependencies you have!
dmd db generate advisories --db dmd.db
Running some queries
Now we've got the data available, we can ??.
It's recommended you find your SQLite browser of choice and try the following queries:
-- how many packages have been ingested via renovate-graph
select count(*) from renovate
-- how many pending package updates have been ingested via renovate-graph
select count(*) from renovate_updates
-- how many packages have been ingested via dependabot-graph
select count(*) from dependabot
-- what are your most popular 10 transitive Go dependencies?
select
distinct package_name,
count(*)
from
renovate,
json_each(dep_types) as dep_type
where
package_manager = 'gomod'
and dep_type.value = 'indirect'
group by
package_name
order by
count(*) DESC
limit 10;
And from the dmd
CLI, we can also run the following:
# if you've generated the advisories data
dmd report advisories --db dmd.db
dmd report mostPopularDockerImages --db dmd.db
dmd report mostPopularPackageManagers --db dmd.db
Example
Interested in seeing what it's like with some pre-baked data? The example project has a web app hosted on Fly.io that contains a lot of public repositories from GitHub and GitLab which can give you an idea based on some pre-seeded data.