CPAN vs Hackage releases, Part 1

Back in mid-November 2020, I noticed that Hackage (the Haskell package repository) probably has roughly the same daily upload rate as CPAN, or even higher.

Since the Hackage API does not provide a way to list releases (uploads), I had to download the recent additions page periodically, parse each page, and merge the results into a single large list. Because I have only been collecting the recent additions page since mid-November, I'm looking at just the December 2020 period.

% http-tiny-plugin-every --every 3h --dir . --trace
% for f in 2*.log; do parse-hackage-page "$f" --format ltsv > "$f.ltsv"; done
% combine-overlap 2*.ltsv > hackage_release_202012.ltsv
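The merge step matters because consecutive downloads of the recent additions page overlap: an upload that is still on the page at the next poll shows up twice. Below is a minimal Python sketch of that idea, not the actual implementation of combine-overlap. It assumes each parsed page is a list of LTSV lines (tab-separated "label:value" fields) and that two identical records are the same upload. The field names package and version and the sample values are hypothetical.

```python
def parse_ltsv_line(line):
    """Parse one LTSV line ("label:value" fields separated by tabs) into a dict."""
    fields = line.rstrip("\n").split("\t")
    # Split on the first colon only, so values may themselves contain colons.
    return dict(field.split(":", 1) for field in fields if field)


def merge_snapshots(snapshots):
    """Merge lists of records, keeping the first copy of each distinct record.

    Records that appear in more than one snapshot (the overlap) are kept once.
    """
    seen = set()
    merged = []
    for snapshot in snapshots:
        for record in snapshot:
            key = tuple(sorted(record.items()))
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged


# Hypothetical sample data: two polls whose contents overlap by one entry.
poll_1 = [
    "package:foo\tversion:1.0\tauthor:alice",
    "package:bar\tversion:0.2\tauthor:bob",
]
poll_2 = [
    "package:bar\tversion:0.2\tauthor:bob",
    "package:baz\tversion:2.1\tauthor:alice",
]
merged = merge_snapshots(
    [[parse_ltsv_line(line) for line in poll] for poll in (poll_1, poll_2)]
)
print(len(merged))
```

Note that this only removes duplicates. If more uploads arrive between two polls than fit on one page, the missing entries cannot be recovered afterwards, so the polling interval has to be short enough for consecutive pages to overlap.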

For CPAN, the MetaCPAN API lets us query various things in many ways so the simple task of listing recent releases is not a problem at all. I'm using a CLI to do this:

% list-metacpan-releases --from-date 2020-12-01 --to-date 2020-12-31 --json > cpan_release_202012.json

With these two pieces of data, I just need to run some SQL (again, using a CLI for this) to get what I want.

So for December 2020, CPAN had 957 releases:

% fsql -a cpan_release_202012.json:t 'SELECT COUNT(*) FROM t' -f tsv

while for Hackage there are 629:

% fsql -a hackage_release_202012.ltsv:t 'SELECT COUNT(*) FROM t' -f tsv

As for the number of authors who made releases in this period, the two are more similar:

% fsql -a cpan_release_202012.json:t 'SELECT COUNT(DISTINCT author) FROM t' -f tsv

while for Hackage there are 191:

% fsql -a hackage_release_202012.ltsv:t 'SELECT COUNT(DISTINCT author) FROM t' -f tsv
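For readers without fsql, the same two queries can be tried against any SQL engine. Here is a minimal sketch using Python's built-in sqlite3 module, with a tiny in-memory table standing in for the real data. Only the table name t and the author column come from the commands above. The package column and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows: (package, author).
rows = [
    ("foo", "alice"),
    ("bar", "bob"),
    ("baz", "alice"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (package TEXT, author TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# Number of releases: one row per release.
release_count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
# Number of distinct authors who made at least one release.
author_count = conn.execute("SELECT COUNT(DISTINCT author) FROM t").fetchone()[0]

print(release_count, author_count)
conn.close()
```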

So this confirms my guess that the upload activity of both repositories is currently of the same order of magnitude, but does not confirm the suspicion that Hackage is more active than CPAN, at least in December 2020. I plan to do a follow-up next year in January, after I have collected all the 2021 data.
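To put the December counts on a per-day footing, here is the arithmetic as a small Python sketch. It uses the totals reported above (957 for CPAN, 629 for Hackage) and assumes both datasets cover all 31 days of December 2020.

```python
DAYS_IN_DECEMBER = 31  # assumption: both datasets cover the whole month

cpan_releases = 957
hackage_releases = 629

cpan_per_day = cpan_releases / DAYS_IN_DECEMBER
hackage_per_day = hackage_releases / DAYS_IN_DECEMBER
ratio = cpan_releases / hackage_releases

print(f"CPAN: {cpan_per_day:.1f}/day, Hackage: {hackage_per_day:.1f}/day, ratio: {ratio:.2f}")
```

That works out to roughly 31 releases per day for CPAN against roughly 20 for Hackage, a factor of about 1.5.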



