Disk space usage of CPAN authors (May 16, 2021 edition)

Prompted by this post on PAUSE running out of disk space, I thought I'd refresh my own post "Top 50 authors by disk space on BackPAN" (Dec 26, 2019) and add a stat for CPAN too. Hopefully this will be a regular post, ish.

On CPAN

Source: https://www.cpan.org/indices/find-ls.gz (~14MB as of this writing), but more readily used: https://www.cpan.org/indices/du-k.gz (~80KB as of this writing).

Total size of CPAN:

% zcat du-k.gz | perl -lne'@F=split /\s+/, $_, 2; $F[1] eq "." and printf "%.1fGB\n", $F[0]/1024/1024 and last'
29.0GB

So, CPAN is not that big by today's standards and I would guess that it's growing at a relatively regular pace. Also the first post does not give details on the spec of PAUSE's server nor the exact thing that caused the disk to be full, but anyway.

Total size of authors/id/ directory:

% zcat du-k.gz | perl -lne'@F=split /\s+/, $_, 2; $F[1] eq "./authors/id" and printf "%.1fGB\n", $F[0]/1024/1024 and last'
28.0GB

Not much surprise here. The authors/id/ directory is where CPAN authors upload to. Outside of this directory, it's mostly just indices (modules/, indices/) or stuffs like old scripts (scripts/) and ancient Perl sources/binaries (src/, ports/).

Top 50 authors by disk space usage:

% zcat du-k.gz | perl -lne'@F=split /\s+/, $_, 2; $F[1] eq "./authors/id" and $total = $F[0]; $F[1] =~ m!authors/id/./../(\w+)! or next; $sizes{$1}=$F[0]; END { printf "%9s %6.1fMB (%4.1f%%)\n", $_, $sizes{$_}/1024, $sizes{$_}/$total*100 for sort { $sizes{$b}<=>$sizes{$a} } keys %sizes }' | head -n50
     SHAY 2397.8MB ( 8.4%)
   LSKATZ 1144.4MB ( 4.0%)
     RJBS  938.6MB ( 3.3%)
 XSAWYERX  778.9MB ( 2.7%)
TIEDEMANN  612.7MB ( 2.1%)
      LDS  535.6MB ( 1.9%)
    GIBUS  460.2MB ( 1.6%)
  TMILLER  362.7MB ( 1.3%)
   BINGOS  322.3MB ( 1.1%)
  ABIGAIL  321.8MB ( 1.1%)
   OLIVER  311.3MB ( 1.1%)
PERLANCAR  296.2MB ( 1.0%)
 WOLFSAGE  268.9MB ( 0.9%)
    JESSE  259.6MB ( 0.9%)
    FLORA  239.3MB ( 0.8%)
  VANSTYN  208.2MB ( 0.7%)
SUNDQUIST  192.7MB ( 0.7%)
   STEVEB  191.4MB ( 0.7%)
   DGINEV  189.7MB ( 0.7%)
 CJFIELDS  183.1MB ( 0.6%)
 PAWAPAWA  179.3MB ( 0.6%)
  NWCLARK  177.0MB ( 0.6%)
      SRI  176.7MB ( 0.6%)
      ARC  169.4MB ( 0.6%)
    ETHER  158.5MB ( 0.6%)
 JDDPAUSE  146.0MB ( 0.5%)
 DAGOLDEN  137.3MB ( 0.5%)
      KAL  133.3MB ( 0.5%)
   RENEEB  132.1MB ( 0.5%)
   ABRETT  131.8MB ( 0.5%)
     TVDW  129.2MB ( 0.5%)
   ZEFRAM  127.1MB ( 0.4%)
 GRIBUSER  124.4MB ( 0.4%)
    MOTIF  122.9MB ( 0.4%)
   STEVAN  116.0MB ( 0.4%)
 MIYAGAWA  109.7MB ( 0.4%)
  DROLSKY  109.1MB ( 0.4%)
     DAPM  109.0MB ( 0.4%)
  ATOOMIC  105.9MB ( 0.4%)
   CORION   94.1MB ( 0.3%)
 LBROCARD   93.3MB ( 0.3%)
      MGV   92.4MB ( 0.3%)
    ADAMK   88.5MB ( 0.3%)
  RGARCIA   88.1MB ( 0.3%)
     LETO   86.6MB ( 0.3%)
 BRMILLER   79.6MB ( 0.3%)
  ASLEWIS   78.7MB ( 0.3%)
      JWB   77.1MB ( 0.3%)
 GENEHACK   77.0MB ( 0.3%)
 AUTRIJUS   76.8MB ( 0.3%)

On BackPAN

Source: http://backpan.cpantesters.org/backpan-full-index.txt.gz (~14MB as of this writing)

Total size of files on BackPAN:

% zcat backpan-full-index.txt.gz | perl -lnE'/.+ \d+ (\d+)$/ or next; $size+=$1; END { printf "%.1fGB\n", $size/1024/1024/1024 }'
78.5GB

This is a 9.50% increase from Dec 26, 2019.

Top 50 authors by disk space usage on BackPAN:

% zcat backpan-full-index.txt.gz | perl -lnE'm!authors/id/./../(\w+)/.+ (\d+) (\d+)$! or next; $size{$1}+=$3; END { printf "%9s %6.1fMB (%4.1f%%)\n", $_, $size{$_}/1024/1024, $size{$_}/1024/1024/73445.3*100 for sort { $size{$b}<=>$size{$a} } keys %size }' | head -n50
 REEDFISH 9607.0MB (13.1%)
     SHAY 2423.3MB ( 3.3%)
   LSKATZ 1771.1MB ( 2.4%)
     RJBS 1761.0MB ( 2.4%)
      ZDM 1724.6MB ( 2.3%)
PERLANCAR 1322.5MB ( 1.8%)
   AJPAGE 1218.4MB ( 1.7%)
 XSAWYERX 1185.4MB ( 1.6%)
TIEDEMANN 1090.9MB ( 1.5%)
      KAL  957.1MB ( 1.3%)
DCANTRELL  873.4MB ( 1.2%)
      LDS  767.1MB ( 1.0%)
   BINGOS  747.7MB ( 1.0%)
    JKEGL  692.0MB ( 0.9%)
      INA  639.6MB ( 0.9%)
    JESSE  625.7MB ( 0.9%)
BTMCINNES  615.8MB ( 0.8%)
   DGINEV  575.2MB ( 0.8%)
  DROLSKY  546.8MB ( 0.7%)
      SRI  539.1MB ( 0.7%)
 JDDPAUSE  520.5MB ( 0.7%)
      CHM  495.5MB ( 0.7%)
 PAWAPAWA  474.2MB ( 0.6%)
 AREIBENS  470.2MB ( 0.6%)
    GIBUS  459.7MB ( 0.6%)
  RKELSCH  452.3MB ( 0.6%)
  NWCLARK  435.6MB ( 0.6%)
  TMILLER  433.5MB ( 0.6%)
   OLIVER  428.6MB ( 0.6%)
 CJFIELDS  427.6MB ( 0.6%)
   STEVEB  426.6MB ( 0.6%)
     AMBS  397.5MB ( 0.5%)
    ADAMK  374.4MB ( 0.5%)
EARONESTY  370.6MB ( 0.5%)
 MLEHMANN  355.6MB ( 0.5%)
     JGNI  353.6MB ( 0.5%)
 DANKOGAI  349.0MB ( 0.5%)
    ETHER  339.8MB ( 0.5%)
   NHORNE  334.9MB ( 0.5%)
  ABIGAIL  327.2MB ( 0.4%)
  ASLEWIS  315.2MB ( 0.4%)
  GRAHAMC  295.0MB ( 0.4%)
  MARTIMM  293.3MB ( 0.4%)
 MIYAGAWA  290.8MB ( 0.4%)
    HISSO  276.6MB ( 0.4%)
   CORION  271.8MB ( 0.4%)
  VANSTYN  271.1MB ( 0.4%)
 WOLFSAGE  269.2MB ( 0.4%)
      ETJ  268.6MB ( 0.4%)
 DBAURAIN  267.5MB ( 0.4%)

Some authors have mentioned that they would like to have their old releases purged from BackPAN as well, but from what I see so far this does not seem to have happened yet.

Also if you see, some authors have not done much purging. I remember someone produced percentage number of an individual author's CPAN/BackPAN usage as a measure of "cleanup", but can't remember who and where.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s