Coverage you'll actually read — Klemens Lukaszczyk

Per-PR coverage comment posted on a GitHub pull request

Coverage is one of those tools I genuinely like. It tells me whether the business logic is actually tested - not “do we have tests”, but “do the tests walk through the branches that matter”.

I’m not an advocate for chasing 100% across the whole repo. Plenty of code isn’t worth the ceremony. But the important stuff - business logic, calculations, algorithms - I do want at 100%, because that’s where every if, every condition, every edge case can silently be wrong. There I treat tests as documentation: this is how the code is supposed to behave under these circumstances.

Two things kept getting in the way, though.

Problem 1: coverage output is too chatty to be useful

On a large project - thousands of modules - a coverage run prints a wall of percentages. I changed three files. I care about three files. Instead I get a screen I scroll past, and in practice I just… stop checking. The signal I want is buried under everything I didn’t touch.

Problem 2: coverage is fine until it isn’t

Nobody watches coverage day to day. It quietly drifts down a little each week, and nobody notices - until one PR finally trips the CI threshold and fails. Now the person who happened to push that PR is on the hook for weeks of accumulated slippage that wasn’t theirs. The slow decline was invisible the whole way down.

Both problems share a root cause: there’s no cheap, per-PR feedback that answers “did this change make coverage better or worse on the code it touches?”

The fix: a baseline branch + a per-module diff in the PR

I solved this in al_check PR #3. Two pieces:

1. Keep the main-branch coverage somewhere stable. Every time CI runs on main, it commits the coverage numbers to a dedicated coverage_do_not_delete branch. That branch is the baseline - the last known-good coverage for every module. It never gets deleted, so there’s always something to compare a PR against.

2. On every PR, compare only the modules you changed against that baseline, and post it as a comment. No threshold gymnastics, no scrolling. The PR comment looks like this:

Test Coverage Summary Statistics - SUCCESS

Coverage: 87.5: better for 1.2% Congratulations! Good job!

Modules related to this PR - coverage

Module            Baseline   Current    Delta
Check.Summary     80.00%     100.00%    +20.00% ✅
Check.PrComment   N/A        95.00%     🆕 new

You see the overall direction at the top (better / worse / no change, with the delta), and below it a table scoped strictly to the modules in this PR - with each one’s baseline, current, and delta. New modules are flagged 🆕. That’s the whole point: feedback on your code, and a trend you can’t accidentally ignore.
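Both parts of that comment come down to a little arithmetic. Below is a minimal Python sketch of it. It is not the author's module_coverage_diff.py. The function names, the exact wording of the top line, and the ❌ marker for a drop are assumptions. Only the columns, the N/A baseline and the 🆕 flag come from the example comment.

```python
from __future__ import annotations

from typing import Optional


def overall_line(baseline_total: Optional[float], current_total: float) -> str:
    """Top line of the comment: overall direction plus the delta (wording is hypothetical)."""
    if baseline_total is None:
        return f"Coverage: {current_total:.2f}% (no baseline yet)"
    # Round to 2 places so float noise (e.g. 1.2000000000000028) can't flip the direction.
    delta = round(current_total - baseline_total, 2)
    if delta > 0:
        direction = f"better by {delta:.2f}%"
    elif delta < 0:
        direction = f"worse by {-delta:.2f}%"
    else:
        direction = "no change"
    return f"Coverage: {current_total:.2f}%, {direction}"


def delta_cell(baseline: Optional[float], current: float) -> str:
    """Delta column for one module."""
    if baseline is None:
        return "🆕 new"  # module has no entry in the baseline yet
    delta = round(current - baseline, 2)
    if delta > 0:
        return f"+{delta:.2f}% ✅"
    if delta < 0:
        return f"{delta:.2f}% ❌"  # assumption: the post doesn't show how drops are marked
    return "0.00%"


def module_table(baseline: dict[str, float], current: dict[str, float], changed: list[str]) -> str:
    """Markdown table restricted to the modules this PR touched."""
    rows = ["| Module | Baseline | Current | Delta |", "|---|---|---|---|"]
    for module in changed:
        if module not in current:
            continue  # not in the coverage report, so there is nothing to compare
        base = baseline.get(module)
        base_cell = "N/A" if base is None else f"{base:.2f}%"
        cur = current[module]
        rows.append(f"| `{module}` | {base_cell} | {cur:.2f}% | {delta_cell(base, cur)} |")
    return "\n".join(rows)
```

Iterating over `changed` rather than over the whole report is what keeps the comment short. The thousands of untouched modules never reach the table.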

You can get the same per-module coverage for your changed files locally, with a single command of the check tool:

check showing coverage of changed modules locally

How to set CI up yourself

The whole thing is one CI job plus two small scripts. Rather than paste it all here, I’ll walk through the flow and link each step in ci.yml.

0. Create the baseline branch once. Make an orphan branch that only ever holds coverage artifacts:

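# new branch with no history shared with main
git checkout --orphan coverage_do_not_delete
# drop the files inherited from the previous branch (index and working tree)
git rm -rf .
echo '{}' > best_coverage.json
git add best_coverage.json && git commit -m "init coverage baseline"
git push origin coverage_do_not_delete
# back to normal work
git checkout main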

You’ll also need a GH_TOKEN secret holding a token with contents: write permission, so the job can push back to that branch.

The rest lives in the test job:

1. Pull the baseline first. The job checks out coverage_do_not_delete, reads the previous per-module numbers, and stashes them in /tmp before checking out the PR code. It then checks out again with full history so it can diff against main. → ci.yml#L84-L98
2. Run the tests, then emit a coverage report. Tests run through the check escript in partitions (check --only test --partitions 3), and a second call (check --full-coverage-output) writes the full per-module coverage table to a file. → ci.yml#L143-L147
3. Parse the report into per-module JSON. A ~20-line script scrapes that coverage table into {module => percentage}. → parse_coverage.py
4. Find which modules the PR touched, then diff them against the baseline. The job runs git diff --name-only origin/main...HEAD on lib/**/*.ex, greps the defmodule names out of those files, and feeds them to a script that renders the markdown table you saw above: baseline vs. current, only for the changed modules. → ci.yml#L177-L201 and module_coverage_diff.py
5. Post (or update) the PR comment. peter-evans/find-comment locates an existing comment so reruns edit it in place instead of spamming the thread; peter-evans/create-or-update-comment writes the body. → ci.yml#L203-L225
6. Save the new baseline, but only on main. After a main run, the job writes the fresh numbers back to coverage_do_not_delete, so the next PR compares against an up-to-date baseline. → ci.yml#L241-L256
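The parsing, the changed-module lookup and the main-only save above are mostly plumbing. Here is a minimal Python sketch of that plumbing. It is not the author's parse_coverage.py or ci.yml logic, and all function names are hypothetical. It assumes the report uses mix-style `Percentage | Module` rows (the real check --full-coverage-output format may differ) and that the job runs on GitHub Actions, which sets GITHUB_REF.

```python
from __future__ import annotations

import json
import os
import re
from typing import Callable, Optional

# Assumed report format: mix-style rows such as "    80.00% | Check.Summary".
COVERAGE_ROW = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s*\|\s*(\S+)\s*$")
# Every `defmodule X do` line. A nested module comes back by its short name,
# not its full Elixir name (a limitation of any grep-based approach).
DEFMODULE = re.compile(r"^\s*defmodule\s+([A-Z][\w.]*)\s+do\b", re.MULTILINE)


def parse_coverage(report: str) -> dict[str, float]:
    """Turn the coverage table into {module: percentage}; a Total row is kept as-is."""
    coverage: dict[str, float] = {}
    for line in report.splitlines():
        match = COVERAGE_ROW.match(line)
        if match:
            pct, module = match.groups()
            coverage[module] = float(pct)
    return coverage


def changed_modules(diff_names: str, read_file: Callable[[str], str]) -> list[str]:
    """Map `git diff --name-only` output to modules defined in changed lib/**/*.ex files."""
    modules: list[str] = []
    for path in diff_names.splitlines():
        path = path.strip()
        if not (path.startswith("lib/") and path.endswith(".ex")):
            continue  # tests, docs, config: not measured per module
        try:
            source = read_file(path)
        except FileNotFoundError:
            continue  # file deleted by the PR: nothing left to measure
        for module in DEFMODULE.findall(source):
            if module not in modules:
                modules.append(module)
    return modules


def save_baseline(coverage: dict[str, float], path: str, ref: Optional[str] = None) -> bool:
    """Overwrite the baseline file, but only for a run on main."""
    ref = ref if ref is not None else os.environ.get("GITHUB_REF")
    if ref != "refs/heads/main":
        return False  # PR runs (refs/pull/N/merge) never touch the baseline
    with open(path, "w", encoding="utf-8") as f:
        json.dump(coverage, f, indent=2, sort_keys=True)
        f.write("\n")
    return True
```

In CI you would pass the output of git diff --name-only origin/main...HEAD and a plain file reader to changed_modules. save_baseline replaces the whole file rather than merging into it, so modules deleted on main also drop out of the baseline.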

Why this works

You only read what you changed. The comment is scoped to your PR’s modules, so checking coverage is a glance, not a chore.
The trend can’t hide. Every PR shows a delta against main. A slow leak shows up as a string of small Xs instead of one surprise CI failure months later.
No hard threshold to game. It’s directional feedback, not a gate that punishes whoever happens to cross the line.

The Elixir specifics (the check escript, the defmodule grep) are easy to swap for whatever your stack emits - the shape is the same: parse current → diff modified modules vs. a committed baseline → comment.

Cheers!