Analysing GitHub Pull Request review times with SQLite and Go

In Improving Team Efficiency By Measuring and Improving Code Review Cycle Time, I mentioned that one thing we can do to understand if code review is causing delays is to measure it.

Since then, I've also worked on building this at Deliveroo with one of my colleagues, just before we started using PluralSight Flow, which didn't quite give us some of the metrics we wanted.

With a bit of free time between jobs, I thought I'd at least blogument the data fetching and parsing approach that I've found works, so that anyone else who goes to do this has something to start with.

Unlike previous attempts I wanted to:

  • build it as an Open Source project
  • write it with Go
  • use SQLite as the underlying datasource
  • try building a CLI using urfave/cli

The project can be found at, which has some docs on how to get started with it.

I'm not sure I'm quite happy with how I've implemented it internally, but I'm happy it's done and that I can iterate on it.

How it works

At its core, we are using the following four APIs from GitHub to retrieve the data about a given Pull Request:

These are then used to look at who is interacting with the changes, as well as using the timeline events to determine the state(s) the PR is in over time.

We fetch the data up-front and then sync it to the SQLite database, after which it can be queried or more easily distributed.
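The timeline-event handling can be sketched roughly like this in Go: given a PR's draft-related events in chronological order, we can infer whether it started life as a draft. This is a minimal, stdlib-only sketch; the `TimelineEvent` type and field names are illustrative, not the project's actual types:

```go
package main

import (
	"fmt"
	"time"
)

// TimelineEvent is a simplified view of a GitHub timeline event;
// the real API returns far more fields than this.
type TimelineEvent struct {
	Event     string // e.g. "ready_for_review", "convert_to_draft"
	CreatedAt time.Time
}

// initialState infers whether a PR was opened as a draft: if the
// first draft-related event marks it ready for review, it must have
// started as a draft; if the first such event converts it to a
// draft, it must have started as ready.
func initialState(events []TimelineEvent) string {
	for _, e := range events {
		switch e.Event {
		case "ready_for_review":
			return "draft"
		case "convert_to_draft":
			return "ready"
		}
	}
	// no draft-related events at all: the PR was opened ready
	return "ready"
}

func main() {
	events := []TimelineEvent{
		{Event: "ready_for_review", CreatedAt: time.Date(2022, 9, 1, 10, 0, 0, 0, time.UTC)},
	}
	fmt.Println(initialState(events))
}
```

The same walk over the events, tracking each transition's timestamp, gives the state(s) the PR was in over time.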

What it looks like

For a few PRs' worth of data, running ghprstats report cycle-time results in the following (converted to HTML table for ease of viewing):

| PR | PR Title | Changes | Initial State | # Reviews | # Comments | First Commenter | Time to first comment | First Reviewer | Time to first review | Second Commenter | Time to second comment | Second Reviewer | Time to second review | Time to close | Merged? | PR link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| snarfed/oauth-dropins#284 | Migrate IndieAuth to full authorization_code grant | +82,-36 lines changed across 3 files | ready | 2 | 13 | | | snarfed | 4768 | | | | | 7150 | true | |
| cucumber/common#2024 | Add further examples for valid Cucumber files to pretty-print | +108,-0 lines changed across 1 files | ready | 0 | 6 | | | | | | | | | | false | |
| deepmap/oapi-codegen#648 | Generate anonymous objects referenced in schemas | +1303,-50 lines changed across 35 files | draft | 8 | 14 | | | | | | | | | | false | |
| endoflife-date/ | Gorilla toolkit archiving | +4,-3 lines changed across 1 files | ready | 1 | 4 | | | marcwrobel | 187 | | | | | 188 | true | |

With SQL access to the raw data, it may also be easier to run queries like the ones below to better understand the data.

For instance, how can we work out the most frequent approvers?

-- note: this query has been partially reconstructed; the table and
-- column names (reviews, users.login) may differ slightly from the
-- project's actual schema
select
  users.login,
  count(author_id) as num_reviews
from reviews
  inner join users on author_id = users.id
where state = 'APPROVED'
group by author_id
order by
  num_reviews desc
limit 10
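If you'd rather stay in Go, the same "most frequent approvers" aggregation is straightforward once the reviews are loaded from the database; here's a stdlib-only sketch, where the `Review` struct and its fields are illustrative rather than the project's actual types:

```go
package main

import (
	"fmt"
	"sort"
)

// Review is a simplified review record, as might be read out of the
// SQLite database.
type Review struct {
	Author string
	State  string // e.g. "APPROVED", "CHANGES_REQUESTED"
}

// topApprovers counts APPROVED reviews per author and returns up to
// limit authors, most approvals first.
func topApprovers(reviews []Review, limit int) []string {
	counts := map[string]int{}
	for _, r := range reviews {
		if r.State == "APPROVED" {
			counts[r.Author]++
		}
	}
	authors := make([]string, 0, len(counts))
	for a := range counts {
		authors = append(authors, a)
	}
	sort.Slice(authors, func(i, j int) bool {
		if counts[authors[i]] != counts[authors[j]] {
			return counts[authors[i]] > counts[authors[j]]
		}
		return authors[i] < authors[j] // deterministic tiebreak
	})
	if len(authors) > limit {
		authors = authors[:limit]
	}
	return authors
}

func main() {
	reviews := []Review{
		{Author: "alice", State: "APPROVED"},
		{Author: "bob", State: "APPROVED"},
		{Author: "alice", State: "APPROVED"},
		{Author: "carol", State: "CHANGES_REQUESTED"},
	}
	fmt.Println(topApprovers(reviews, 10))
}
```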

Or how can we find out which PRs started as a draft vs ready for review?

-- note: this query has been partially reconstructed; the table and
-- column names (pulls, timeline_events, pulls.number) may differ
-- slightly from the project's actual schema
select
  pulls.*,
  (
    case
      when i.initial_state is null then 'ready'
      else i.initial_state
    end
  ) as initial_state
from pulls
  left join (
    select
      owner,
      repo,
      pull_id,
      (
        case
          when event = 'ready_for_review' then 'draft'
          else 'ready'
        end
      ) as initial_state,
      created_at
    from timeline_events
    where
      event = 'ready_for_review'
      or event = 'convert_to_draft'
    order by
      created_at asc
  ) i on pulls.owner = i.owner
  and pulls.repo = i.repo
  and pulls.number = i.pull_id;

Written by Jamie Tanna on , and last updated on .

Content for this article is shared under the terms of the Creative Commons Attribution Non Commercial Share Alike 4.0 International, and code is shared under the Apache License 2.0.

#blogumentation #sqlite #go.

This post was filed under articles.

Interactions with this post


Below you can find the interactions that this page has had using WebMention.
