Analysing GitHub Pull Request review times with SQLite and Go

In Improving Team Efficiency By Measuring and Improving Code Review Cycle Time, I mentioned that one thing we can do to understand if code review is causing delays is to measure it.

Since then, I've also worked on building this at Deliveroo with one of my colleagues, just before we started using PluralSight Flow, which didn't quite give some of the metrics we wanted out of it.

With a bit of free time in my time between jobs, I thought I'd at least blogument the data fetching and parsing that I've found works, and so if anyone else goes to do this, they've got something to start with.

Unlike previous attempts I wanted to:

  • build it as an Open Source project
  • write it with Go
  • use SQLite as the underlying datasource
  • try building a CLI using urfave/cli/

The project can be found at gitlab.com/tanna.dev/ghprstats, which has some docs on how to get started with it.

I'm not sure I'm quite happy with how I've internally implemented it, but happy it's done and I can iterate over it.

How it works

At its core, we are using the following four APIs from GitHub to retrieve the data about a given Pull Request:

These are then used to look at who is interacting with the changes, as well as using the timeline events to determine the state(s) the PR is in over time.

We fetch the data up-front and then sync it to the SQLite database, after which it can be queried or more easily distributed.

What it looks like

For a few PRs worth of data, running ghprstats report cycle-time results in the following (converted to HTML table for ease of viewing):

PRPR TitleChangesInitial State# Reviews# CommentsFirst CommenterTime to first commentFirst ReviewerTime to first reviewSecond CommenterTime to second commentSecond ReviewerTime to second reviewTime to closeMerged?PR link
snarfed/oauth-dropins#284Migrate IndieAuth to full authorization_code grant+82,-36 lines changed across 3 filesready213  snarfed4768    7150truehttps://github.com/snarfed/oauth-dropins/pull/284
cucumber/common#2024Add further examples for valid Cucumber files to pretty-print+108,-0 lines changed across 1 filesready06         falsehttps://github.com/cucumber/common/pull/2024
deepmap/oapi-codegen#648Generate anonymous objects referenced in schemas+1303,-50 lines changed across 35 filesdraft814         falsehttps://github.com/deepmap/oapi-codegen/pull/648
endoflife-date/endoflife.date#3330Remove Gorilla toolkit archiving+4,-3 lines changed across 1 filesready14  marcwrobel187    188truehttps://github.com/endoflife-date/endoflife.date/pull/3330

With SQL access to the raw data, it may also be easier to run queries like the below to better understand the data.

For instance, how can we work out the most frequent approvers?

select
  login,
  count(author_id) as num_reviews
from
  reviews
  inner join users on author_id = users.id
where state = 'APPROVED'
group by
  author_id
order by
  num_reviews desc
limit 10

Or how can we find out which PRs started as a draft vs ready for review?

select
  pulls.owner,
  pulls.repo,
  pulls.id,
  title,
  (
    case
      when i.initial_state is null then 'ready'
      else i.initial_state
    end
  ) as initial_state
from
  pulls
  left join (
    select
      (
        case
          when event = 'ready_for_review' then 'draft'
          else 'ready'
        end
      ) as initial_state,
      timeline.created_at,
      timeline.owner,
      timeline.repo,
      pull_id
    from
      timeline
    where
      (
        event = 'ready_for_review'
        or event = 'convert_to_draft'
      )
    order by
      created_at asc
    LIMIT
      1
  ) i on pulls.owner = i.owner
  and pulls.repo = i.repo
  and pulls.id = i.pull_id;

Written by Jamie Tanna's profile image Jamie Tanna on , and last updated on .

Content for this article is shared under the terms of the Creative Commons Attribution Non Commercial Share Alike 4.0 International, and code is shared under the Apache License 2.0.

#blogumentation #sqlite #go.

This post was filed under articles.

Interactions with this post

Interactions with this post

Below you can find the interactions that this page has had using WebMention.

Have you written a response to this post? Let me know the URL:

Do you not have a website set up with WebMention capabilities? You can use Comment Parade.