Removing passwords from PDF files

Electronic billing statements are great: paperless, easier to archive, and you don't need to worry if you leave the house for a month or move to a new address. But what's the most irritating thing about credit card e-statements? That the banks decide to encrypt the PDF with some easy-to-guess password like the YYYYMMDD of your birth date, or FirstNameDDMMYY, or some other variation. But each bank picks a different password scheme, naturally.

So the first thing I do after downloading an e-statement is to run it through my remove-pdf-password script. It's basically a wrapper for qpdf: it tries a series of passwords, then decrypts the PDF in-place. You put the passwords in ~/.config/remove-pdf-password.conf, e.g.:

passwords = PASS1  ; anz
passwords = PASS2  ; hsbc
passwords = PASS2  ; uob
...
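
For illustration, the core of such a wrapper can be quite small. Here's a rough sketch of the idea (not the actual App::PDFUtils code); it assumes qpdf is in PATH and, for brevity, hardcodes the password list instead of reading the config file:

#!/usr/bin/env perl
# Rough sketch only, not the actual remove-pdf-password implementation.
use strict;
use warnings;

my $file = shift or die "Usage: $0 <file.pdf>\n";
my @passwords = ('PASS1', 'PASS2');   # the real script reads these from its config file

for my $pass (@passwords) {
    # qpdf exits with 0 on success; --decrypt writes an unencrypted copy
    if (system("qpdf", "--password=$pass", "--decrypt", $file, "$file.tmp") == 0) {
        rename "$file.tmp", $file or die "Can't replace $file: $!";
        print "Decrypted $file\n";
        exit 0;
    }
    unlink "$file.tmp";
}
die "None of the passwords worked for $file\n";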

The script is distributed in the App::PDFUtils distribution. There's a counterpart script in the distribution, add-pdf-password, in case you want to return the favor. The next time some bank staff requests your documents, you can send them encrypted in an email. In the email you write, "the password is the name of the bank followed by the phone number of the bank, but replace every f with f*ckyou …"


Quickly generate CSV or TSV from SQL query

Sometimes to produce CSV or TSV from information in a MySQL database, all you have to do is:

echo 'some SELECT query ...' | mysql mydb >data.tsv

but for more complex things, you'll often need a scripting language like Perl to do some munging first. When you're ready to output the data with:

$sth->fetchrow_arrayref
$sth->fetchrow_hashref

the data is in a Perl data structure. How do you quickly output CSV? I've looked at DBI::Format and DBIx::CSVDumper, and they are cumbersome, especially if you want something brief for one-liners: you need to instantiate a separate object with some parameters, then call one or more additional methods. You might as well just do it with the standard Text::CSV (or Text::CSV_XS):

use Text::CSV;
my $csv = Text::CSV->new;

...
$csv->print(\*STDOUT, $sth->{NAME});
while (my $row = $sth->fetchrow_arrayref) {
     $csv->print(\*STDOUT, $row);
}

For TSV it's even shorter (note: this fails spectacularly when the data contains a Tab character, but that is relatively rare):

say join("\t", @{$sth->{NAME}});
while (my $row = $sth->fetchrow_arrayref) {
     say join("\t", @$row);
}

My modules DBIx::CSV and DBIx::TSV attempt to offer something even briefer (for CSV, at least). Instead of fetchrow_arrayref or fetchall_arrayref (or selectrow_arrayref, selectall_arrayref), you use fetchrow_csv and friends:

print $sth->fetchall_csv;

This automatically prints the header row too. If you don't want the header:

print $sth->fetchall_csv_noheader;

For even more text table formats, like ASCII table, Org table, Markdown table, and so on, I've also written DBIx::TextTableAny.

I've also just written DBIx::Conn::MySQL after getting tired of writing the gazillionth one-liner to connect to a MySQL database. Now, instead of:

% perl -MDBI -E'my $dbh = DBI->connect("dbi:mysql:database=mydb", "username", "password"); $sth = $dbh->prepare(...)'

you can write this:

% perl -MDBI::Conn::MySQL=mydb -E'$sth = $dbh->prepare(...)'
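
Combining the two, dumping a query result as CSV from the command line might look something like this (illustrative only; the table name is made up):

% perl -MDBIx::Conn::MySQL=mydb -MDBIx::CSV -E'$sth = $dbh->prepare("SELECT * FROM users"); $sth->execute; print $sth->fetchall_csv'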

Generating random passwords according to patterns with genpw

There are several modules on CPAN for generating random passwords, but the last time I checked, none of them was flexible enough. Different websites or applications have different requirements (sometimes ridiculous ones) for passwords. True, most modules allow setting a minimum/maximum length, and some of them allow customizing the number of special characters or digits, but what I want is some sort of template/pattern that the generator would follow. So I created genpw.

Basic functionality

genpw is your usual random password generator CLI. When run without any arguments, it returns a single random password (8-20 characters long, comprising letters/digits):

% genpw
J9K3ZjBVR

To return several passwords:

% genpw 5
wAYftKsS
knaY7MOBbcvFFS3L1wyW
oQGz62aF
sG1A9reVOe
Zo8GoFEq

To set an exact length, or a minimum/maximum length:

% genpw -l 4
1CN4
% genpw --min-len 12 --max-len 14
nqBKX5lyyx7B

Patterns

In addition to the basic customization above, genpw lets you specify a pattern (or several patterns, one of which will be picked at random). A pattern is a printf-like string in which conversion sequences like %d are replaced with actual random characters. Here are the available conversions:

%l   Random Latin letter (A-Z, a-z)
%d   Random digit (0-9)
%h   Random hexdigit (0-9a-f)
%a   Random letter/digit (Alphanum) (A-Z, a-z, 0-9; combination of %l and %d)
%s   Random ASCII symbol, e.g. "-" (dash), "_" (underscore), etc.
%x   Random letter/digit/ASCII symbol (combination of %a and %s)
%m   Base64 character (A-Z, a-z, 0-9, +, /)
%b   Base58 character (A-Z, a-z, 0-9 minus IOl0)
%B   Base56 character (A-Z, a-z, 0-9 minus IOol01)
%%   A literal percent sign
%w   Random word

You can specify %NC (where N is a positive integer, and C is the conversion) to mean a sequence of random characters (or a random word) with the exact length of N. Or %N$MC (where N and M are positive integers) to mean random characters (or word) with length between N and M.

Unsupported conversions are left unchanged, as in printf.
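
To make the semantics concrete, here is a simplified sketch of how such pattern expansion could be implemented. This is not the actual App::genpw code; it handles only a few of the conversions and uses plain rand(), which is not a secure random source:

use strict;
use warnings;

# character sets for a few of the conversions
my %charset = (
    l => ['A'..'Z', 'a'..'z'],
    d => ['0'..'9'],
    h => ['0'..'9', 'a'..'f'],
    a => ['A'..'Z', 'a'..'z', '0'..'9'],
);

sub expand_pattern {
    my $pat = shift;
    $pat =~ s{%(?:(\d+)(?:\$(\d+))?)?([ldha%])}{
        my ($n, $m, $conv) = ($1, $2, $3);
        if ($conv eq '%') {
            '%';
        } else {
            $n //= 1;                                # %C means a single character
            $m //= $n;                               # %NC means exactly N characters
            my $len = $n + int(rand($m - $n + 1));   # %N$MC means between N and M
            my $set = $charset{$conv};
            join '', map { $set->[rand @$set] } 1 .. $len;
        }
    }ge;
    $pat;
}

print expand_pattern('%8h-%4h-%4h-%4h-%12h'), "\n";  # a UUID-looking string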

Some examples:

Generate random digits between 10 and 12 characters long:

% genpw -p '%10$12d'
55597085674

Generate a random UUID:

% genpw -p '%8h-%4h-%4h-%4h-%12h'
ff26d142-37a8-ecdf-c7f6-8b6ae7b27695

Like the above, but in uppercase:

% genpw -p '%8h-%4h-%4h-%4h-%12h' -U
22E13D9E-1187-CD95-1D05-2B92A09E740D

Words

The %w conversion in a pattern means: replace with a random word. Words are fetched from STDIN (and are shuffled before use). For example, the command below generates a password in the form of a random word followed by 4 random digits:

% genpw -p '%w%4d' < /usr/share/dict/words
shafted0412

Instead of reading from STDIN, you can also fetch words from the various WordList::* modules available on CPAN (and installed locally). A separate CLI provides this functionality: genpw-wordlist, which is basically just a wrapper that extracts the words from the wordlist module(s) and feeds them to App::genpw. By default, if you don't specify -w option(s) to select wordlist modules, the CLI uses WordList::EN::Enable:

% genpw-wordlist -p '%w%4d'
sedimentologists8542

Generate 5 passwords, each comprising an 8-character word followed by four to six random digits:

% genpw-wordlist 5 -p '%8w-%4$6d'
alcazars-218050
pathogen-4818
hogmanes-212244
pollened-3206
latinity-21053

Configuration file

To save you from having to type patterns again and again, you can use a configuration file. For example, put this in $HOME/genpw.conf:

[profile=uuid]
patterns = %8h-%4h-%4h-%4h-%12h
case = upper

then you can just say:

% genpw -P uuid
7B8FB8C6-0D40-47A3-78B7-848A114E5C6D

Speed

The speed is not great, around 2000 passwords/sec on my laptop, though I believe this should not matter for most use cases.

Closing

genpw is a flexible random password generator that lets you specify patterns or templates for the passwords. I'm still working on a few things, like how to enable a secure random source in the most painless way. There are also various CLI variants that generate specific kinds of passwords as straightforwardly as possible: genpw-base56, genpw-base64, genpw-id, and the aforementioned genpw-wordlist.

Introducing Log::ger

Yesterday I more or less completed the migration of my CPAN modules from Log::Any to Log::ger. Sorry for the noise in the release news channels due to the high number of my CPAN modules (and particular apologies to Chase Whitener).

I did not have anything against Log::Any, to be honest. It was lightweight and easy to use in modules, and it encourages separation of log producers and consumers. Sure, I wished Log::Any had some features that I wanted, but I was okay with using it for all my logging needs.

Until Log::Any 1.00 came along in late 2014, when its startup overhead jumped from ~2ms (version 0.15) to ~15ms. The new version ballooned from loading just strict, warnings, and Log::Any::Adapter::Null to loading over a dozen modules. Later versions of Log::Any improved somewhat on the startup-overhead front, but after the introduction of Log::Any::Proxy I thought it would probably never get back to its previous lightweight level. So I planned to write a more lightweight alternative, along with probably implementing my wishlist. But that would require some time, and in the meantime I wrote a hackish workaround called Log::Any::IfLOG that loads Log::Any only if an environment variable like LOG or DEBUG or TRACE is set to true. I hated that workaround and regretted having created it.

By the way, why do I fuss over a few milliseconds? My major interest in Perl is building CLI applications. I am simply annoyed when my CLIs show a noticeable (~50-100ms or more) delay before responding with output (major offenders include Moose-based CLIs like dzil), when the fact is that Perl can be much more responsive than that. Log::Any is but one of several (sometimes many) modules I must load in a CLI, so if a few ms are added here and a few more there, it quickly adds up. Also, my CLIs feature shell tab completion, and this is implemented by running the CLIs themselves to get the completion answers, so I always prefer responsive CLIs.

The recent Eid al-Fitr holiday finally made it possible for me to write a replacement for Log::Any: Log::ger, written over the course of a couple of weeks along with all the log outputs and plugins needed to match the features I need. So in what ways is Log::ger different from Log::Any or the other logging libraries?

First of all, toward the low-startup-overhead goal, I've managed to keep the overhead of use Log::ger at just 0.5-1ms (without loading any extra modules). This is even less than use warnings and certainly less than use Log::Any (8-10ms) or the much heavier use Log::Log4perl ':easy' (35ms). This means that adding logging with Log::ger to your modules now incurs a negligible startup overhead, so you can add logging to most of your modules without worrying about it.

What about null-logging (stealth-logging) overhead? Log::ger also manages to be the fastest here. It's about 1.5x faster than Log::Any, 3x faster than Log4perl, and 5x faster than Log::Fast (included here because its name claims something about speed). This is thanks to using procedural-style logging (log_warn("foo")) instead of OO ($log->warn("foo")) and using an empty subroutine, sub {0}, as the null default. If you don't want even that tiny runtime overhead, you can eliminate it with Log::ger::Plugin::OptAway. This plugin uses some B magic to turn your logging statements into constants so they are optimized away.
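
For illustration, here is roughly what the procedural style looks like in practice (a minimal sketch; in a real setup the module would of course live in its own file, and see the Log::ger documentation for the full interface):

package My::Module;
use Log::ger;                     # installs log_trace(), log_debug(), log_warn(), ...

sub do_stuff {
    log_debug("starting do_stuff");
    log_warn("something looks fishy");
}

package main;
use Log::ger::Output 'Screen';    # without an output, all logging is a no-op
My::Module::do_stuff();           # the log_warn() line shows up on the screen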

As a bonus, due to the modular and flexible design of Log::ger, you can also log using OO style, use Log::Any style (method names and formatting rules), use Log::Log4perl style (method names and formatting rules), use Log::Contextual style (block style), or mimic some other interface that you want. You can even mix different styles in different modules of your application. And as another bonus, writing a Log::ger output is simpler and significantly shorter than writing a Log::Any adapter. Compare Log::Any::Adapter::Callback with Log::ger::Output::Callback, or Log::Any::Adapter::Syslog with Log::ger::Output::Syslog.

To keep this post short, instead of explaining how Log::ger works or detailing its features here, I welcome you to look at the documentation.

csv-grep (and App::CSVUtils)

Today I decided to add csv-grep to App::CSVUtils, as an alternative to NEILB's csvgrep (which Neil announced about a week ago). I find csvgrep too simplistic for my taste or future needs. It's basically equivalent to:

% ( head -n1 FILE.CSV; grep PATTERN FILE.CSV ) | csv2asciitable

I also think csvgrep's -d option does not belong: it's not relevant to grepping, and it's too case-specific. What if the user wants the oldest file in the directory? Or the biggest? The find or ls command should be able to do that for you:

% csvgrep PATTERN "`ls *.csv --sort=t | head -n1`"

In csv-grep, you specify Perl code instead of a regex pattern. Your Perl code receives each CSV row in $_ as an arrayref (or a hashref, if you specify -H). So you can filter based on particular fields and use the full expressive power of Perl. csv-grep outputs CSV, but you can convert it to other formats with the abovementioned csv2asciitable, to JSON with csv2json, to a Perl data structure with csv2dd, or what have you.
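
For example, to select rows by the value of a particular field, something along these lines should work (the field name is made up, and check the script's --help for the exact option names, which I may be misremembering here):

% csv-grep -H -e '$_->{age} >= 25' FILE.CSV | csv2asciitable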

Aside from csv-grep, App::CSVUtils also includes a bunch of other CSV utilities which I wrote when I needed to munge CSV files a few months back. Check it out.

pericmd 048: Showing table data in browser as sortable/searchable HTML table

The latest release of Perinci::CmdLine (1.68) supports viewing a program's output in an external program. A new output format has also been introduced: html+datatables. This shows your program's output in a browser, with table data rendered as an HTML table using jQuery and the DataTables plugin, allowing you to filter rows and sort columns. Here's a video demonstration:

[Video demonstration of the html+datatables output format.]

If the video doesn’t show, here’s the direct file link.
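
If you have a Perinci::CmdLine-based CLI, you select the new format through the usual --format option, e.g. (the program name below is just an example):

% list-countries --format=html+datatables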

Getopt modules: Epilogue

About this mini-article series. For each of the past 23 days, I have reviewed a module that parses command-line options (such modules usually live under the Getopt::* namespace). The first article is here.

This series was born out of my experimentation with option parsing and tab completion, and more broadly out of my interest in doing CLI programming with Perl. Aside from writing this series, I've also released numerous modules related to option parsing, some of them purely experimental in nature and some already used in production.

It has been interesting evaluating the various modules: the sometimes unconventional or seemingly odd approaches they take, or the specific features they offer. Not all of them are worth using, but at least they provide perspectives and some lessons for us to learn.

Of course, not all modules got reviewed. There are simply far more than 24 of them (lcpan tells me there are 180 packages in the Getopt:: namespace alone, with 94 distributions named Getopt-*). I tried to cover at least the must-know ones, the core ones, and the popular ones. Other than that, frankly, the selection is pretty much random. I picked what was interesting to me or what I could make some points about, whether negative or positive.

I have skipped many modules that are just yet another Getopt::Long wrapper adding per-option usage or some other feature found in Getopt::Long::Descriptive (GLD). It's not that they are worse than GLD; for one reason or another they just didn't get adopted widely, or at all. A couple of examples: Getopt::Helpful, Getopt::Fancy.

Modules which use Moose, except MooseX::Getopt, automatically get skipped by me because their applicability is severely limited by the high number of dependencies and high startup overhead (200-500ms or even more on slower computers). These include: Getopt::Flex, Getopt::Alt, Getopt::Chain.

Some others are simply too weird or high in "WTF number", but I won't name names here.

Except for App::Cmd and App::Spec, I haven't really touched CLI frameworks in general. There is no shortage of CLI frameworks on CPAN either; perhaps that's a topic for another series?

I've avoided reviewing my own modules, which include Getopt::Long::Complete (Getopt::Long wrapper which adds tab completion), Getopt::Long::Subcommand (Getopt::Long wrapper, with support for subcommands), Getopt::Long::More (my most recent Getopt::Long wrapper which adds tab completion and other features), Getopt::Long::Less & Getopt::Long::EvenLess (two leaner versions of Getopt::Long for the specific goal of reducing startup overhead), Getopt::Panjang (a break from Getopt::Long interface compatibility to explore new possibilities), and a CLI framework Perinci::CmdLine (which currently uses Getopt::Long but plans to switch backend in the long run; I've written a whole series of tutorial posts for this module).

In general, I'd say that you should probably try to stick with Getopt::Long first. As far as option parsing is concerned, it's already packed with features, and it has the advantage of being a core module. But as soon as you want automatic help/usage message generation, subcommands, or tab completion, you should begin looking elsewhere.
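
For reference, basic Getopt::Long usage looks like this (a minimal example):

use strict;
use warnings;
use Getopt::Long;

GetOptions(
    'verbose!' => \my $verbose,
    'name=s'   => \my $name,
    'num=i'    => \(my $num = 1),
) or die "Usage: $0 [--verbose] [--name NAME] [--num N]\n";

print "name=", $name // '-', " num=$num", ($verbose ? " (verbose)" : ""), "\n";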

Unfortunately, except for evaluating Perl ports of some option parsing libraries from other languages (like Smart::Options, Getopt::ArgParse, Getopt::Kingpin), I haven't had the chance to look deeply into how option parsing is done in other languages. Among those languages is Perl's own sister, Perl 6, which offers built-in command-line option parsing. Researching option parsing in other languages could potentially offer more lessons and perspectives.

I hope this series is of use to some people. Merry Christmas and happy holidays to everybody.