lcpan tips 012: Querying the SQLite database directly

About this series: a collection of short, daily blog posts about lcpan tips/recipes. Some posts will also end up in the upcoming App::lcpan::Manual::Cookbook POD to be included in the App-lcpan distribution.

About lcpan: an application to download and index a mini CPAN mirror on your local filesystem, so in effect you will have something like your own CPAN with a command-line tool (or perl API) to query and extract information from your mirror. I find it perfect for my own personal use when working offline.

As you might have notices, lcpan indexes the CPAN mirror into a SQLite database. By default the database is the index.db at the top-level directory of the mirror (so, by default it’s ~/cpan/index.db.

You can see the schema for the database in the App::lcpan source code. Copy-pasted here:

    install => [
        'CREATE TABLE author (
             cpanid VARCHAR(20) NOT NULL PRIMARY KEY,
             fullname VARCHAR(255) NOT NULL,
             email TEXT
         )',
 
        'CREATE TABLE file (
             id INTEGER NOT NULL PRIMARY KEY,
             name TEXT NOT NULL,
             cpanid VARCHAR(20) NOT NULL REFERENCES author(cpanid),
 
             mtime INT,
             size INT,
 
             -- file status: ok (archive type is known, content can be listed,
             -- and at least some files can be extracted), nofile (file does not
             -- exist in mirror), unsupported (archive type is not supported,
             -- e.g. rar, pm.gz), err (cannot be opened/extracted for some
             -- reason)
 
             file_status TEXT,
             file_error TEXT,
 
             -- META.* processing status: ok (meta has been extracted and
             -- parsed), err (META.json/META.yml has some error), nometa (no
             -- META.json/META.yml found).
 
             meta_status TEXT,
             meta_error TEXT,
 
             -- POD processing status: ok (POD has been extracted and
             -- parsed/indexed).
 
             pod_status TEXT,
 
             -- sub processing status: ok (sub names have been parsed/indexed)
 
             sub_status TEXT,
 
             has_metajson INTEGER,
             has_metayml INTEGER,
             has_makefilepl INTEGER,
             has_buildpl INTEGER
        )',
        'CREATE UNIQUE INDEX ix_file__cpanid__name ON file(cpanid,name)',
 
        # files inside the release archive file
        'CREATE TABLE content (
             id INTEGER NOT NULL PRIMARY KEY,
             file_id INTEGER NOT NULL REFERENCES file(id),
             path TEXT NOT NULL,
             package TEXT, -- only the first package declaration will be recorded
             mtime INT,
             size INT -- uncompressed size
        )',
        'CREATE UNIQUE INDEX ix_content__file_id__path ON content(file_id, path)',
        'CREATE INDEX ix_content__package ON content(package)',
 
        'CREATE TABLE module (
             id INTEGER NOT NULL PRIMARY KEY,
             name VARCHAR(255) NOT NULL,
             cpanid VARCHAR(20) NOT NULL REFERENCES author(cpanid), -- [cache]
             file_id INTEGER NOT NULL,
             version VARCHAR(20),
             version_numified DECIMAL,
             content_id INTEGER REFERENCES content(id),
             abstract TEXT
         )',
        'CREATE UNIQUE INDEX ix_module__name ON module(name)',
        'CREATE INDEX ix_module__file_id ON module(file_id)',
        'CREATE INDEX ix_module__cpanid ON module(cpanid)',
 
        'CREATE TABLE script (
             id INTEGER NOT NULL PRIMARY KEY,
             file_id INTEGER NOT NULL REFERENCES file(id), -- [cache]
             cpanid VARCHAR(20) NOT NULL REFERENCES author(cpanid), -- [cache]
             name TEXT NOT NULL,
             content_id INT REFERENCES content(id),
             abstract TEXT
        )',
        'CREATE UNIQUE INDEX ix_script__file_id__name ON script(file_id, name)',
        'CREATE INDEX ix_script__name ON script(name)',
 
        'CREATE TABLE mention (
             id INTEGER NOT NULL PRIMARY KEY,
             source_file_id INT NOT NULL REFERENCES file(id), -- [cache]
             source_content_id INT NOT NULL REFERENCES content(id),
             module_id INTEGER, -- if mention module and module is known (listed in module table), only its id will be recorded here
             module_name TEXT,  -- if mention module and module is unknown (unlisted in module table), only the name will be recorded here
             script_name TEXT   -- if mention script
        )',
        'CREATE UNIQUE INDEX ix_mention__module_id__source_content_id   ON mention(module_id, source_content_id)',
        'CREATE UNIQUE INDEX ix_mention__module_name__source_content_id ON mention(module_name, source_content_id)',
        'CREATE UNIQUE INDEX ix_mention__script_name__source_content_id ON mention(script_name, source_content_id)',
 
        'CREATE TABLE namespace (
            name VARCHAR(255) NOT NULL,
            num_sep INT NOT NULL,
            has_child BOOL NOT NULL,
            num_modules INT NOT NULL
        )',
        'CREATE UNIQUE INDEX ix_namespace__name ON namespace(name)',
 
        'CREATE TABLE dist (
             id INTEGER NOT NULL PRIMARY KEY,
             name VARCHAR(90) NOT NULL,
             cpanid VARCHAR(20) NOT NULL REFERENCES author(cpanid), -- [cache]
             abstract TEXT,
             file_id INTEGER NOT NULL,
             version VARCHAR(20),
             version_numified DECIMAL,
             is_latest BOOLEAN -- [cache]
         )',
        'CREATE INDEX ix_dist__name ON dist(name)',
        'CREATE UNIQUE INDEX ix_dist__file_id ON dist(file_id)',
        'CREATE INDEX ix_dist__cpanid ON dist(cpanid)',
 
        'CREATE TABLE dep (
             file_id INTEGER,
             dist_id INTEGER, -- [cache]
             module_id INTEGER, -- if module is known (listed in module table), only its id will be recorded here
             module_name TEXT,  -- if module is unknown (unlisted in module table), only the name will be recorded here
             rel TEXT, -- relationship: requires, ...
             phase TEXT, -- runtime, ...
             version VARCHAR(20),
             version_numified DECIMAL,
             FOREIGN KEY (file_id) REFERENCES file(id),
             FOREIGN KEY (dist_id) REFERENCES dist(id),
             FOREIGN KEY (module_id) REFERENCES module(id)
         )',
        'CREATE INDEX ix_dep__module_name ON dep(module_name)',
        # 'CREATE UNIQUE INDEX ix_dep__file_id__module_id ON dep(file_id,module_id)', # not all module have module_id anyway, and ones with module_id should already be correct because dep is a hash with module name as key
 
        'CREATE TABLE sub (
             id INTEGER NOT NULL PRIMARY KEY,
             file_id INTEGER NOT NULL REFERENCES file(id), --[cache]
             content_id INTEGER NOT NULL REFERENCES content(id),
             name TEXT NOT NULL,
             linum INTEGER NOT NULL
         )',
        'CREATE UNIQUE INDEX ix_sub__name__content_id ON sub(name, content_id)',
 
    ], # install
...

You can also peek at what SQL queries each subcommand is doing by installing Log::Any::DBI::Query and then, for example:

% PERL5OPT=-MLog::Any::DBI::Query TRACE=1 lcpan stats
lcpan: [pericmd] Running hook_before_action ...
lcpan: [pericmd] Running action_call() ...
lcpan: [pericmd] Riap request: action=call, url=/App/lcpan/Cmd/stats/handle_cmd
lcpan: Connecting to SQLite database at /home/s1/cpan/index.db ...
lcpan: SQL query (prepare): {{PRAGMA database_list}}
lcpan: SQL query (execute): {{[]}}
lcpan: SQL query (prepare): {{SELECT *
FROM
(
SELECT NULL         TABLE_CAT
     ,              TABLE_SCHEM
     , tbl_name     TABLE_NAME
     ,              TABLE_TYPE
     , NULL         REMARKS
     , sql          sqlite_sql
FROM (
    SELECT 'main' TABLE_SCHEM, tbl_name, upper(type) TABLE_TYPE, sql
    FROM sqlite_master
UNION ALL
    SELECT 'temp' TABLE_SCHEM, tbl_name, 'LOCAL TEMPORARY' TABLE_TYPE, sql
    FROM sqlite_temp_master
UNION ALL
    SELECT 'main' TABLE_SCHEM, 'sqlite_master'      tbl_name, 'SYSTEM TABLE' TABLE_TYPE, NULL sql
UNION ALL
    SELECT 'temp' TABLE_SCHEM, 'sqlite_temp_master' tbl_name, 'SYSTEM TABLE' TABLE_TYPE, NULL sql
)
)
 WHERE TABLE_NAME LIKE 'meta'
 ORDER BY TABLE_TYPE, TABLE_SCHEM, TABLE_NAME
}}
lcpan: SQL query (execute): {{[]}}
...

In the future I might make query-peeking easier, e.g. adding a --db-trace option or something.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s