Using Ledger, plain text accounting and a touch of AI to fill in my UK tax return
A few weeks ago, I was reading Siddhant Goel's post about 10 years using plain text accounting, and I found it a really interesting read, but decided that this level of tracking was unfortunately not for me (and my ADHD brain).
It wasn't until a few weeks later, when I was filling in my first UK self-assessment tax return, that I wondered if maybe this is something I should be doing, at least to manage the process more easily.
(I want to clarify that I was very sensible and did not start investing in the tooling until I'd done my first draft of the return, so I didn't spend precious time trying to perfect an unnecessary system!)
Although I could use Beancount as Siddhant did, I didn't particularly fancy having to use a Python-based tool, especially if I was going to need to write + maintain code to manage parsing/importing different file formats.
(I don't have a super strong dislike of Python, but after many years using it as my primary language, I found Ruby and fell in love with it instead, and I find it always a little jarring coming back to Python)
Although I could vibe-code integrations or importers, not really needing to touch the Python code, I didn't want to if I could avoid it.
I searched around, and found that there were a few alternatives, the most promising being Ledger.
Why plain text accounting?
The Plain Text Accounting model of tracking financial data in plain text is something I very much agree with. If you want to avoid lock-in to a vendor and keep some control over the raw data yourself, storing it in a set of plain text files is a great idea, and allows you to e.g. store them in source control and get handy diffs.
Ledger
When I found ledger, I also noticed there was a Haskell version, hledger, which seemed to have some good active development, but neither would be codebases I'd be able to contribute to without some help.
Although I was considering something like creating a lightweight SQLite database for this, it turns out I don't even need to - the ledger file format is lightweight, the ledger and hledger tools have query interfaces that LLMs are already trained on, and so I don't need to try and create anything special for it.
Additionally, it's a straightforward enough format that I could e.g. share it with an accountant more easily than a custom SQLite database.
Next, I needed to take CSV data from things like Stripe and Stripe Connect accounts or Hyperwallet and convert it to ledger entries.
Instead of doing it by hand, I decided to delegate it to Claude Sonnet 4.5, which was given the CSV headers (but not the raw data!) and created some scripts that built out the basis of my journal.
It took a few attempts to perfect the resulting entries, but it saved me having to do that work, and instead let me focus on reading through the documentation of the tax return process!
Fun fact: as I'd ended up completely vibe-coding these scripts, maybe I could've used Python in the end 😅
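As a rough sketch of what such a conversion script might look like (this is not the generated code itself, and the CSV column names `date`, `description`, `gross`, `fee` and `net` are made up for illustration, so a real Stripe export would need them adjusted):

```python
import csv
import io
from decimal import Decimal

# Hypothetical column names; a real export will differ, so adjust these.
DATE_COL, DESC_COL = "date", "description"
GROSS_COL, FEE_COL, NET_COL = "gross", "fee", "net"


def row_to_entry(row, income_account,
                 asset_account="assets:bank:sponsorship",
                 fee_account="expenses:fees:stripe",
                 commodity="gbp"):
    """Render one CSV row as a three-posting ledger transaction."""
    gross = Decimal(row[GROSS_COL])
    fee = Decimal(row[FEE_COL])
    net = Decimal(row[NET_COL])
    # Refuse to write an entry whose postings would not sum to zero.
    if gross != net + fee:
        raise ValueError(f"row does not balance: {row}")
    lines = [
        f"{row[DATE_COL]} {row[DESC_COL]}",
        # Postings are indented, with two spaces between account and amount.
        f"    {asset_account}  {net:.2f} {commodity}",
        f"    {income_account}  {-gross:.2f} {commodity}",
        f"    {fee_account}  {fee:.2f} {commodity}",
    ]
    return "\n".join(lines)


def csv_to_journal(csv_text, income_account):
    """Convert a whole CSV export (as text) into journal text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    entries = [row_to_entry(row, income_account) for row in reader]
    return "\n\n".join(entries) + "\n"


# Fake data, matching the shape of the example entries in this post.
SAMPLE = """date,description,gross,fee,net
2023-06-16,One time support for Jamie Tanna,5.00,0.61,4.39
"""

if __name__ == "__main__":
    print(csv_to_journal(SAMPLE, "income:sponsorship:buymeacoffee"))
```

Using `Decimal` rather than floats keeps the pennies exact, and failing loudly on a row that doesn't balance means a bad export shows up at import time rather than in the tax return.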
What does it look like?
I may be public with my salary but I don't think y'all need to see more than that 😅 So below I'll share some examples with some fake data, but using the taxonomy I've been using so far.
On the note of taxonomy, I've taken a fair bit of inspiration from gpt-4.1 and Claude Sonnet 4.5 for how to categorise each of the numbers, but I'm still not quite sure I'm happy with what I've got - I'll see where it goes.
I've got two sets of data that I need to track for my tax return:
- my full-time employment, and interest on savings accounts (if any)
- self-employment for my Open Source work
For this, I have the following example data, for my own personal income:
# personal.journal

2024-04-04 Deliveroo P60
    income:salary:deliveroo  -£ 76543.21
    expenses:tax:income:PAYE:deliveroo  £ 10203.54
    expenses:loan:student  £ 2030.40
    assets:bank:current  £ 64309.27

2024-04-04 Deliveroo P11D
    income:salary:benefits:private-medical:deliveroo  -£ 123.45
    equity:HMRC:taxable-benefits:deliveroo  £ 123.45

2024-04-05 Bank-account-1 savings statement
    income:savings-interest  123.45
    income:bank-account-1  -123.45
And for my self-employed income:
# self-employment.journal

; Statement Descriptor: buymeacoffee
2023-06-16 One time support for Jamie Tanna
    assets:bank:sponsorship  4.39 gbp
    income:sponsorship:buymeacoffee  -5.00 gbp
    expenses:fees:stripe  0.61 gbp

; Statement Descriptor: JAMIE TANNA
2023-12-10 Invoice ABC123DE-0001
    assets:bank:consulting:oapi-codegen  184.80 gbp
    income:consulting:oapi-codegen:company-x  -200.00 gbp
    expenses:fees:stripe  15.20 gbp

; Statement Descriptor: Kofi Donation
2024-03-17 Ko-fi Donation
    assets:bank:sponsorship  11.20 gbp
    income:sponsorship:kofi  -11.54 gbp
    expenses:fees:stripe  0.34 gbp

2024-01-01 Stripe Connect py_abcdef
    assets:bank:sponsorship  115.11 gbp
    income:sponsorship:github-sponsors  -115.11 gbp
    expenses:fees:stripe  0.00 gbp
If you have any recommendations on improvements, reach out!
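One property worth checking in entries like the ones above: in double-entry accounting, the postings of each transaction should sum to zero, and ledger and hledger will reject a transaction that doesn't balance. A minimal sketch of that arithmetic in Python (the `is_balanced` helper is made up for illustration, and the figures are the fake ones from the examples above):

```python
from decimal import Decimal


def is_balanced(postings):
    """Return True if the posting amounts of one transaction sum to zero.

    `postings` is a list of (account, amount) pairs, with amounts given
    as strings so they can be parsed exactly as Decimals.
    """
    total = sum((Decimal(amount) for _, amount in postings), Decimal("0"))
    return total == 0


# The fake P60 transaction from personal.journal.
P60 = [
    ("income:salary:deliveroo", "-76543.21"),
    ("expenses:tax:income:PAYE:deliveroo", "10203.54"),
    ("expenses:loan:student", "2030.40"),
    ("assets:bank:current", "64309.27"),
]

# The fake consulting invoice from self-employment.journal.
INVOICE = [
    ("assets:bank:consulting:oapi-codegen", "184.80"),
    ("income:consulting:oapi-codegen:company-x", "-200.00"),
    ("expenses:fees:stripe", "15.20"),
]

if __name__ == "__main__":
    for name, txn in [("P60", P60), ("Invoice", INVOICE)]:
        print(name, "balanced" if is_balanced(txn) else "NOT balanced")
```

This is why income shows up as a negative number in the journal: it is the source the money came from, and the positive postings (bank account, tax, fees) are where it went.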
How does it work in practice?
Once I'd collated all the data from the various pieces of documentation, I could then start querying them to fill in each respective section of my tax return.
For instance, to indicate the relevant numbers for a given company:
# NOTE that this relies on the presence of the company name in every related line item
$ hledger balance deliveroo --period "2023-04-06 to 2024-04-06" -f personal.journal
£ 123.45 equity:HMRC:taxable-benefits:deliveroo
£ 10203.54 expenses:tax:income:PAYE:deliveroo
£ -123.45 income:salary:benefits:private-medical:deliveroo
£ -76543.21 income:salary:deliveroo
--------------------
£ -66339.67
And for my self-employment, I could look at my income via:
$ hledger income --period "2023-04-06 to 2024-04-06" -f self-employment.journal
Income Statement 2023-04-06..2024-04-05
                                          || 2023-04-06..2024-04-05
==========================================++========================
 Revenues                                 ||
------------------------------------------++------------------------
 income:consulting:oapi-codegen:company-x ||             200.00 gbp
 income:sponsorship:buymeacoffee          ||               5.00 gbp
 income:sponsorship:github-sponsors       ||             115.11 gbp
 income:sponsorship:kofi                  ||              11.54 gbp
------------------------------------------++------------------------
                                          ||             331.65 gbp
==========================================++========================
 Expenses                                 ||
------------------------------------------++------------------------
 expenses:fees:stripe                     ||              16.15 gbp
------------------------------------------++------------------------
                                          ||              16.15 gbp
==========================================++========================
 Net:                                     ||             315.50 gbp
To look at interest on savings, I would search for:
$ hledger balance income:savings-interest --period "2023-04-06 to 2024-04-06" -f personal.journal
123.45 income:savings-interest
--------------------
123.45
To determine student loans:
$ hledger balance expenses:loan:student --period "2023-04-06 to 2024-04-06" -f personal.journal
£ 2030.40 expenses:loan:student
--------------------
£ 2030.40
This is super handy to get the resulting data out, and allows me to see this sort of data over the years, too.
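Since the UK tax year runs from 6 April to 5 April, the period string is the fiddly bit to get right each year. A small sketch of a helper for building it, assuming hledger treats the date after `to` as exclusive (which is how its period expressions are documented), so the end needs to be 6 April to include transactions dated 5 April. The function names here are hypothetical, and only the `balance`, `--period` and `-f` arguments already used above are assumed:

```python
from datetime import date


def uk_tax_year_period(start_year):
    """Build an hledger period string covering one UK tax year.

    The tax year runs 6 April to 5 April inclusive. The end date is
    written as 6 April on the assumption that hledger excludes the
    date given after `to`.
    """
    start = date(start_year, 4, 6)
    end_exclusive = date(start_year + 1, 4, 6)
    return f"{start.isoformat()} to {end_exclusive.isoformat()}"


def hledger_balance_args(query, start_year, journal):
    """Assemble the argument list for an `hledger balance` query."""
    return [
        "hledger", "balance", query,
        "--period", uk_tax_year_period(start_year),
        "-f", journal,
    ]


if __name__ == "__main__":
    # Prints the command for the 2023-2024 tax year; it does not run hledger.
    print(" ".join(hledger_balance_args(
        "expenses:loan:student", 2023, "personal.journal")))
```

The argument list could be handed to `subprocess.run` as-is, which avoids any shell quoting issues with the period string.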
Looking ahead to next year
I'm looking forward to filling in my 2025-2026 tax return, so I can see how well this actually holds up.
As well as my scripts that take a Stripe (Connect) transactions export and convert it to the ledger format, I'm going to see if I can extract relevant information out of PDFs, e.g. my P45/P60 or bank account statements.
Although it's not a huge amount of work to take the different documents and get the right information out of them, it may be useful to see if this can simplify a little bit of the process, and allow me to instead review the resulting entries.
Given some preliminary tests, gpt-oss:20b seems to cope fairly well with some limited prompting and some fairly straightforward PDFs, so I have hope for next year!
(Aside: I'd only ever want to use a local LLM with some of these documents - I wouldn't recommend letting a remote LLM have access to some of these things!)
Maybe I'll do a "lessons learned" post after I've done my 2025-2026 tax return 👀