Having your own queryable CPAN mirror using lcpan

minicpan

Many of you already know that you can easily download a mini version of CPAN for offline use with minicpan (installable via “cpanm -n CPAN::Mini”). “Mini” means only the latest versions of modules, that is, the ones currently indexed by PAUSE in 02packages.details.txt.gz, without the older versions residing in each author’s directory. The mirror is currently about 4.5GB, so it fits on a laptop. It is useful when you develop without a (reliable) internet connection, e.g. in a remote vacation spot or during a flight. To install CPAN modules from this offline mirror, you can simply do:

% cpanm --mirror /path/to/your/cpan --mirror-only -n Foo::Bar

You can set up a shell alias for that to make it more convenient. To update all installed modules on your system using this mirror, you can do:

% cpan-outdated --mirror file:/path/to/your/cpan | cpanm --mirror /path/to/your/cpan --mirror-only -n
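
If you would rather not retype the mirror options, the two commands above can be wrapped in small shell functions. This is only a sketch: the names CPAN_MIRROR, cpanm_offline and cpan_update_offline are made up for illustration, and functions are used instead of aliases so that module names pass through cleanly as arguments.

```shell
# Hypothetical helpers; CPAN_MIRROR and both function names are assumptions.
CPAN_MIRROR="${CPAN_MIRROR:-/path/to/your/cpan}"

# Install one or more modules from the local mirror only.
cpanm_offline() {
    cpanm --mirror "$CPAN_MIRROR" --mirror-only -n "$@"
}

# Upgrade every outdated module using the local mirror only.
cpan_update_offline() {
    cpan-outdated --mirror "file:$CPAN_MIRROR" |
        cpanm --mirror "$CPAN_MIRROR" --mirror-only -n
}
```

With these in your shell startup file, “cpanm_offline Foo::Bar” runs the same cpanm command as shown earlier.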

lcpan

What I’m introducing in this blog post is another tool called lcpan (short for “local cpan”), which uses minicpan to download the mirror, then extracts META.yml/META.json from each release file and indexes the information from those meta files into a local SQLite database. The result is that, in addition to being able to install modules locally, you can also query various things about modules/distributions/releases. To install lcpan, simply do:

% cpanm -n App::lcpan

After that, download and index your CPAN mirror using:

% lcpan update

By default, the CPAN mirror will be created at ~/cpan. If you want a different location, create a ~/lcpan.conf containing:

cpan=/path/to/your/cpan

Run “lcpan update” regularly (e.g. once a day) if you want to keep the mirror and its index up to date.
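
One way to automate the daily refresh is a small wrapper that can be run from cron. The sketch below assumes lcpan is on your PATH; the function name lcpan_daily_update, the LCPAN_LOG variable and the log file location are all hypothetical and not part of lcpan itself.

```shell
# Hypothetical wrapper: refresh the mirror and index, appending output to a log.
# LCPAN_LOG is an assumed variable name, not an lcpan setting.
lcpan_daily_update() {
    log="${LCPAN_LOG:-$HOME/lcpan-update.log}"
    {
        date            # timestamp each run
        lcpan update    # download new releases and reindex
    } >>"$log" 2>&1
}
```

Put into a script that calls the function, this could then be scheduled with a crontab entry (the schedule and script path are up to you).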

Installing modules

lcpan comes with a thin cpanm wrapper called lcpanm that makes it more convenient for you to install modules from your local CPAN mirror. Instead of:

% cpanm --mirror ~/cpan --mirror-only -n Foo::Bar

now you can simply type:

% lcpanm -n Foo::Bar

as lcpanm will read the lcpan configuration to find out where the local CPAN mirror is.

Querying your local CPAN mirror

Now for the fun part. As mentioned above, lcpan also creates a SQLite database (by default it’s in ~/cpan/index.db) that you can query, either using the lcpan tool itself (which already provides quite a few subcommands for querying various things) or, if needed, by querying it directly yourself using DBI. First, the basics. To list authors:

% lcpan authors ;# just the CPAN IDs
% lcpan authors --detail ;# along with other details
% lcpan authors BING ;# search
% lcpan authors --detail @yahoo ;# search

To list modules:

% lcpan modules ;# just the names
% lcpan modules --detail ;# along with version, dist, etc
% lcpan modules Foo ;# search
% lcpan modules --author PERLANCAR --dist App-lcpan ;# add some filters

To list distributions:

% lcpan dists ;# just the names
% lcpan dists --detail ;# along with version, dist, etc
% lcpan dists Foo ;# search
% lcpan dists --author PERLANCAR --latest ;# add some filters
% lcpan dists --author PERLANCAR --detail --nolatest ;# list old versions of distribution

To list releases:

% lcpan releases ;# just the names
% lcpan releases --detail ;# along with version, dist, etc
% lcpan releases Foo ;# search
% lcpan releases --author PERLANCAR --has-buildpl --has-metajson ;# add some filters
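
For queries that the subcommands do not cover, the index can also be opened directly. The post mentions DBI; the sketch below swaps in the sqlite3 command-line shell instead, and only lists the table names so that it makes no assumptions about lcpan’s schema. The function name lcpan_db_tables is hypothetical, and the default path is the ~/cpan/index.db location mentioned earlier.

```shell
# Hypothetical helper: list the tables in the lcpan index database.
# Pass a different path as the first argument if your mirror is not in ~/cpan.
lcpan_db_tables() {
    db="${1:-$HOME/cpan/index.db}"
    sqlite3 "$db" "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
}
```

Once you know the table names, the .schema command inside the sqlite3 shell shows their columns, which is a reasonable starting point before writing your own queries.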

Dependencies information

One of the most important kinds of information I want to query is dependencies. This is the reason I created lcpan in the first place, and it is the aspect of lcpan that is already being used by other distributions. Instead of having to browse metacpan.org or call its API, if you have a fairly recent index, you can just query your local CPAN mirror index for dependency information. To list what modules are required by a module (to be exact, what modules are required by the distribution that a module is in):

% lcpan deps Text::ANSITable
+------------------------------+-----------+----------+
| module                       | author    | version  |
+------------------------------+-----------+----------+
| Border::Style::Role          | PERLANCAR | 0        |
| Color::RGB::Util             | PERLANCAR | 0        |
| Color::Theme::Role::ANSI     | PERLANCAR | 0        |
| Data::Unixish::ANSI          | SHARYANTO | 0.02     |
| Data::Unixish::Apply         | PERLANCAR | 1.33     |
| DateTime                     | DROLSKY   | 0        |
| Function::Fallback::CoreOrPP | PERLANCAR | 0        |
| JSON                         | MAKAMAKA  | 0        |
| Log::Any                     | DAGOLDEN  | 0        |
| Module::List                 | ZEFRAM    | 0        |
| Moo                          | HAARG     | 0        |
| Package::MoreUtil            | PERLANCAR | 0        |
| Parse::VarName               | SHARYANTO | 0        |
| Term::App::Role::Attrs       | PERLANCAR | 0        |
| Text::ANSI::Util             | PERLANCAR | 0.08     |
| experimental                 | LEONT     | 0        |
| namespace::clean             | RIBASUSHI | 0        |
| perl                         |           | 5.010001 |
+------------------------------+-----------+----------+
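
The bordered table is convenient to read but awkward to pipe into other tools. As a rough sketch, an awk filter can strip the borders and the header row and print only the first column. The function name lcpan_first_column is made up here, and the filter assumes the table layout shown above.

```shell
# Hypothetical helper: print the first column of a bordered table read from stdin.
lcpan_first_column() {
    awk -F'|' '
        /^[|]/ {                      # header and data rows start with a pipe
            if (rows++ == 0) next     # the first such row is the header
            gsub(/^ +| +$/, "", $2)   # trim the padding around the cell
            print $2
        }
    '
}
```

For example, “lcpan deps Text::ANSITable | lcpan_first_column” would print one module name per line. Note that the trimming also removes any leading indentation inside the cell.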

To view recursive dependencies, add -R:

% lcpan deps -R Text::ANSITable
+--------------------------------+-----------+----------+
| module                         | author    | version  |
+--------------------------------+-----------+----------+
| Border::Style::Role            | PERLANCAR | 0        |
| Color::RGB::Util               | PERLANCAR | 0        |
| Color::Theme::Role::ANSI       | PERLANCAR | 0        |
|   Color::ANSI::Util            | PERLANCAR | 0        |
| Data::Unixish::ANSI            | SHARYANTO | 0.02     |
|   Data::Unixish::Util          | PERLANCAR | 1.43     |
| Data::Unixish::Apply           | PERLANCAR | 1.33     |
|   Number::Format               | WRW       | 0        |
|   Number::Format::Metric       | PERLANCAR | 0        |
|   Rinci                        | PERLANCAR | v1.1.67  |
|     DefHash                    | PERLANCAR | v1.0.6   |
|   String::Pad                  | PERLANCAR | 0        |
|   Syntax::Feature::EachOnArray | SHARYANTO | 0        |
|     Hash::FieldHash            | GFUJI     | 0        |
|     syntax                     | PHAYLON   | 0        |
|       Data::OptList            | RJBS      | 0.104    |
|         Params::Util           | ADAMK     | 0        |
|         Sub::Install           | RJBS      | 0.921    |
|   Text::sprintfn               | PERLANCAR | 0        |
|   Tie::Simple                  | HANENKAMP | 0        |
|   Unixish                      | SHARYANTO | v1.0.1   |
| DateTime                       | DROLSKY   | 0        |
|   DateTime::Locale             | DROLSKY   | 0.41     |
|   DateTime::TimeZone           | DROLSKY   | 1.74     |
|     Class::Singleton           | SHAY      | 1.03     |
|     List::AllUtils             | DROLSKY   | 0        |
|       List::MoreUtils          | REHSACK   | 0.28     |
|         Exporter::Tiny         | TOBYINK   | 0.038    |
|       List::Util               | PEVANS    | 1.31     |
|   Params::Validate             | DROLSKY   | 0.76     |
| Function::Fallback::CoreOrPP   | PERLANCAR | 0        |
|   Clone::PP                    | NEILB     | 0        |
| JSON                           | MAKAMAKA  | 0        |
| Log::Any                       | DAGOLDEN  | 0        |
| Module::List                   | ZEFRAM    | 0        |
| Moo                            | HAARG     | 0        |
| Package::MoreUtil              | PERLANCAR | 0        |
| Parse::VarName                 | SHARYANTO | 0        |
|   Exporter::Lite               | NEILB     | 0        |
| Term::App::Role::Attrs         | PERLANCAR | 0        |
|   Moo::Role                    | HAARG     | 0        |
|     Class::Method::Modifiers   | ETHER     | 1.1      |
|     Devel::GlobalDestruction   | HAARG     | 0.11     |
|     Role::Tiny                 | HAARG     | 2        |
|   Term::Detect::Software       | PERLANCAR | 0        |
|     File::Which                | PLICEASE  | 0        |
| Text::ANSI::Util               | PERLANCAR | 0.08     |
|   Text::WideChar::Util         | PERLANCAR | 0.10     |
|     Unicode::GCString          | NEZUMI    | 0        |
|       MIME::Charset            | NEZUMI    | v1.6.2   |
| experimental                   | LEONT     | 0        |
| namespace::clean               | RIBASUSHI | 0        |
|   B::Hooks::EndOfScope         | ETHER     | 0.12     |
|     Sub::Exporter::Progressive | FREW      | 0.001006 |
|   Package::Stash               | DOY       | 0.23     |
|     Dist::CheckConflicts       | DOY       | 0.02     |
|     Module::Implementation     | DROLSKY   | 0.06     |
|       Module::Runtime          | ZEFRAM    | 0.012    |
|       Try::Tiny                | DOY       | 0        |
| perl                           |           | 5.010001 |
+--------------------------------+-----------+----------+

There are several options provided by the deps subcommand, e.g. only listing dependencies for a certain relationship (e.g. recommends) or phase (e.g. configure instead of runtime), filtering by author, and so on. Reverse dependency information is also available, because that’s just the other side of the same coin:

% lcpan rdeps Text::ANSITable
+-----------+-------------------------------------------+---------+
| author    | dist                                      | version |
+-----------+-------------------------------------------+---------+
| SHARYANTO | Data-Format-Pretty-Console                | 0.33    |
| PERLANCAR | Perinci-CmdLine-Classic                   | 1.49    |
| PERLANCAR | Perinci-CmdLine-Classic                   | 1.50    |
| PERLANCAR | Pod-Weaver-Section-BorderStyles-ANSITable | 0.03    |
| PERLANCAR | Text-ANSITable-ColorTheme-Extra           | 0.14    |
+-----------+-------------------------------------------+---------+

(Hm, rather embarrassing, isn’t it? Nobody but me is using it.) The rdeps subcommand also has several options, which you can see using lcpan rdeps --help or by consulting the manpage.

Other stuff

Lots of other stuff is also provided. From the documentation:

% lcpan mod2dist Text::ANSITable::ColorTheme::Default ;# -> Text-ANSITable

% lcpan mod2rel  Text::ANSITable::ColorTheme::Default ;# -> Text-ANSITable-0.39.tar.gz
% lcpan mod2rel  Text::ANSITable --full-path          ;# -> /cpan/authors/id/P/PE/PERLANCAR/Text-ANSITable-0.39.tar.gz

% lcpan dist2rel Text-ANSITable             ;# -> Text-ANSITable-0.39.tar.gz
% lcpan dist2rel Text-ANSITable --full-path ;# -> /cpan/authors/id/P/PE/PERLANCAR/Text-ANSITable-0.39.tar.gz

% lcpan distmods Text-ANSITable ;# list modules in a distribution
Text::ANSITable
Text::ANSITable::BorderStyle::Default
Text::ANSITable::ColorTheme::Default
Text::ANSITable::StyleSet::AltRow

% lcpan authormods PERLANCAR   ;# list an author's modules
% lcpan authordists PERLANCAR  ;# list an author's dists
% lcpan authorrels PERLANCAR   ;# list an author's releases

# which authors have the most releases?
% lcpan authors-by-rel-count

# which authors have the most distributions?
% lcpan authors-by-dist-count

# which authors have the most registered modules/packages?
% lcpan authors-by-mod-count

# show all other authors' distributions using one of your modules
% lcpan authorrdeps PERLANCAR --user-author-isnt PERLANCAR

# show your old releases (which you should probably delete from CPAN?)
% lcpan releases --author PERLANCAR --nolatest

# which modules are used the most by other distributions?
% lcpan mods-by-rdep-count
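
Because “mod2rel --full-path” prints the location of the release file, these subcommands combine well with ordinary shell tools. Here is a minimal sketch that unpacks the release containing a given module, assuming the release is a .tar.gz as in the examples above. The function name lcpan_unpack is hypothetical.

```shell
# Hypothetical helper: unpack the release tarball that contains a module.
# Usage: lcpan_unpack Module::Name [destination-dir]
lcpan_unpack() {
    module="$1"
    dest="${2:-.}"
    # Ask lcpan where the release file lives in the local mirror.
    tarball=$(lcpan mod2rel "$module" --full-path) || return 1
    [ -f "$tarball" ] || { echo "no release file found for $module" >&2; return 1; }
    tar -xzf "$tarball" -C "$dest"
}
```

This is handy for reading a module’s source or tests while offline.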

Other/prior work

CPAN::SQLite is a module which parses the three CPAN indexes 01mailrc.txt.gz, 02packages.details.txt.gz, and 03modlist.data.gz (which is now empty) into a SQLite database. However, it does not index any dependency information, which I need.

CPANDB (and its companion generator CPANDB::Generator) also indexes information into a SQLite database, but aside from CPAN it also downloads and indexes additional sources like PAUSE upload data and CPAN ratings. The downloads are quite huge (multiple gigabytes) and not incremental, making it less convenient to update daily.

Pinto has a different goal, namely creating and managing a CPAN-like repository, but it can surely be used to mirror CPAN and show you dependency information. However, the Pinto documentation warns about Pinto “not indexing exactly like PAUSE does”, so there might be minor/subtle differences.

Closing remarks

I’ll be adding more queries and subcommands as I see fit. If you have ideas, please send them my way. Or, if you want to add something yourself, that’s welcome too. The code is on GitHub, and adding a new subcommand should be easy and obvious.
