Introducing Log::ger

Yesterday I more or less completed the migration of my CPAN modules from Log::Any to Log::ger. Sorry for the noise in the release news channels caused by the high number of my CPAN modules (particularly to Chase Whitener).

I did not have anything against Log::Any, to be honest. It was lightweight and so easy to use in modules as it encourages separation of log producers and consumers. Sure, I wish Log::Any had some features that I want but I was okay with using it for all my logging needs.

That was until Log::Any 1.00 came along in late 2014, when its startup overhead jumped from ~2ms (version 0.15) to ~15ms. The new version ballooned from just loading strict, warnings, and Log::Any::Adapter::Null to loading over a dozen modules. Later versions of Log::Any improved somewhat on the startup overhead front, but after the introduction of Log::Any::Proxy I thought it would probably not get back to the previous lightweight level. So I planned to write a more lightweight alternative, probably implementing my wishlist along the way. But this would require some time, and in the meantime I wrote a hackish workaround called Log::Any::IfLOG that loads Log::Any only if an environment variable like LOG, DEBUG, or TRACE is set to true. I hated that workaround and regretted having created it.

By the way, why do I fuss over a few milliseconds? My major interest in using Perl is for building CLI applications. I am simply annoyed if my CLIs show a noticeable (~50-100ms or more) delay before responding with output (major offenders include Moose-based CLIs like dzil) when the fact is that Perl can be much more responsive than that. Log::Any is but one of several (sometimes many) modules I must load in a CLI so if a few ms is added here and a few more there, it could quickly add up. Also my CLIs feature shell tab completion and this is implemented by running the CLIs themselves for getting the completion answers so I always prefer responsive CLIs.

The recent Eid al-Fitr holiday finally made it possible for me to write a replacement for Log::Any: Log::ger in the course of a couple of weeks, along with all the log outputs and plugins to match all the features that I needed. So in what ways is Log::ger different than Log::Any or the other logger libraries?

First of all, for the low startup overhead goal, I've managed to keep the overhead of use Log::ger at just 0.5-1ms (without loading any extra modules). This is even less than use warnings and certainly less than use Log::Any (8-10ms) or the much heavier use Log::Log4perl ':easy' (35ms). This means that adding logging with Log::ger to your modules now incurs a negligible startup overhead, so you can add logging to most of your modules without worrying about it.

What about null-logging (stealth-logging) overhead? Log::ger also manages to be the fastest here. It's about 1.5x faster than Log::Any, 3x faster than Log4perl, and 5x faster than Log::Fast (included here because the name claims something about speed). This is thanks to using procedural style logging (log_warn("foo")) instead of OO ($log->warn("foo")) and just using an empty subroutine sub {0} as the null default. If you don't want even that tiny runtime overhead, you can eliminate it with Log::ger::Plugin::OptAway. This plugin uses some B magic to turn your logging statements into a constant, so they cost nothing at run-time.

As a bonus, due to the modular and flexible design of Log::ger, you can also: log using OO-style, use Log::Any style (method names and formatting rule), use Log::Log4perl style (method names and formatting rule), use Log::Contextual style (block style), or mimic any other interface that you want. You can also mix different styles in different modules of your application. And as another bonus, writing a Log::ger output is also simpler and significantly shorter than writing a Log::Any adapter. Compare Log::Any::Adapter::Callback with Log::ger::Output::Callback, or Log::Any::Adapter::Syslog with Log::ger::Output::Syslog.

To keep this post short, instead of explaining how Log::ger works or the details of its features here I welcome you to look at the documentation.

csv-grep (and App::CSVUtils)

Today I decided to add csv-grep to App::CSVUtils, as an alternative to NEILB's csvgrep (which Neil announced about a week ago). I find csvgrep too simplistic for my taste or future needs. It's basically equivalent to:

% ( head -n1 FILE.CSV; grep PATTERN FILE.CSV ) | csv2asciitable

I also think csvgrep's -d option does not belong. It's not relevant to grepping and is too case-specific. What if the user wants the oldest file in the directory? The biggest? The find or ls command should be able to do that for you:

% csvgrep PATTERN "`ls *.csv --sort=t | head -n1`"

In csv-grep, you specify Perl code instead of a regex pattern. Your Perl code receives the CSV row in $_ as an arrayref (or hashref, if you specify -H). So you can filter based on particular fields and use the full expressive power of Perl. csv-grep outputs CSV, but you can convert it to other formats with the abovementioned csv2asciitable, or to JSON with csv2json, or to a Perl data structure with csv2dd, or what have you.

Aside from csv-grep, App::CSVUtils also includes a bunch of other CSV utilities which I wrote when I needed to munge CSV files a few months back. Check it out.

pericmd 048: Showing table data in browser as sortable/searchable HTML table

The latest release of Perinci::CmdLine (1.68) supports viewing program’s output in an external program. And also a new output format is introduced: html+datatables. This will show your program’s output in a browser and table data is shown as HTML table using jQuery and DataTables plugin to allow you to filter rows or sort columns. Here’s a video demonstration:

If the video doesn’t show, here’s the direct file link.

Getopt modules: Epilogue

About this mini-article series. For each of the past 23 days, I have reviewed a module that parses command-line options (such modules are usually under the Getopt::* namespace). First article is here.

This series was born out of my experimentations with option parsing and tab completion, and more broadly of my interest in doing CLI with Perl. Aside from writing this series, I've also released numerous modules related to option parsing, some of them are purely experimental in nature and some already used in production.

It has been interesting evaluating the various modules: the sometimes unconventional or seemingly odd approach that they take, or the specific features that they offer. Not all of them are worth using, but at least they provide perspectives and some lessons for us to learn.

Of course, not all modules got reviewed. There are simply far more than 24 modules (lcpan tells me that there are 180 packages in the Getopt:: namespace alone, with 94 distributions having the name Getopt-*). I tried to cover at least the must-know ones, core ones, and the popular ones. Other than that, frankly the selection is pretty much random. I picked what's interesting to me or what I can make some points about, whether they are negative or positive points.

I have skipped many modules that are just yet another Getopt::Long wrapper which adds per-option usage or some other features found in Getopt::Long::Descriptive (GLD). Not that they are worse than GLD; for some reason or another they just didn't get adopted widely or at all. A couple of examples of these: Getopt::Helpful, Getopt::Fancy.

Modules which use Moose, except MooseX::Getopt, automatically get skipped by me because their applicability is severely limited by the high number of dependencies and high startup overhead (200-500ms or even more on slower computers). These include: Getopt::Flex, Getopt::Alt, Getopt::Chain.

Some others are simply too weird or high in "WTF number", but I won't name names here.

Except for App::Cmd and App::Spec, I haven't really touched CLI frameworks in general. There is no shortage of CLI frameworks on CPAN either; perhaps that is for another series?

I've avoided reviewing my own modules, which include Getopt::Long::Complete (Getopt::Long wrapper which adds tab completion), Getopt::Long::Subcommand (Getopt::Long wrapper, with support for subcommands), Getopt::Long::More (my most recent Getopt::Long wrapper which adds tab completion and other features), Getopt::Long::Less & Getopt::Long::EvenLess (two leaner versions of Getopt::Long for the specific goal of reducing startup overhead), Getopt::Panjang (a break from Getopt::Long interface compatibility to explore new possibilities), and a CLI framework Perinci::CmdLine (which currently uses Getopt::Long but plans to switch backend in the long run; I've written a whole series of tutorial posts for this module).

In general, I'd say that you should probably try to stick with Getopt::Long first. As far as option parsing is concerned, it's packed with features already, and it has the advantage of being a core module. But as soon as you want automatic help/usage message generation, subcommands, or tab completion, you should begin looking elsewhere.

Unfortunately, except for evaluating Perl ports of some option parsing libraries (like Smart::Options, Getopt::ArgParse, Getopt::Kingpin), I haven't had the chance to look deeply into how option parsing is done in other languages. Among those other languages is Perl's own sister Perl 6, which offers built-in command-line option parsing. Researching option parsing in other languages could potentially offer more lessons and perspectives.

I hope this series is of use to some people. Merry Christmas and happy holidays to everybody.

Getopt modules 23: Getopt::Complete

About this mini-article series. Each day for 24 days, I will be reviewing a module that parses command-line options (such module is usually under the Getopt::* namespace). First article is here.

Getopt::Complete (GC) is a module written by Scott Smith (SAKOHT) in 2009 and co-maintained by Nathan Nutter (NNUTTER). The last release was in 2011. So far only one CPAN distribution depends on it, and that one is written by Scott himself.

Shell tab completion is a topic which I have been interested in since around 2012. I've released numerous modules related to completion, including two option parsing modules, Getopt::Long::Complete (GLC) and Getopt::Long::More (GLM), which sport completion as (one of) their selling points, so it's natural that I want to compare them to Getopt::Complete. Throughout the article I'll be repeatedly making those comparisons, and I hope it doesn't become too annoying.

Interface

GC, like GLC and GLM, is a Getopt::Long (GL) wrapper that adds tab completion feature. To let the module detect tab completion mode and return completion answer as soon as possible, GC offers this interface:

use Getopt::Complete (
    'frog'        => ['ribbit','urp','ugh'],
    'fraggle'     => sub { return ['rock','roll'] },
    'quiet!'      => undef,
    'name'        => undef,
    'age=n'       => undef,
    'outfile=s@'  => 'files',
    'outdir'      => 'directories',
    'runthis'     => 'commands',
    'username'    => 'users',
    '<>'          => 'directories',
);

That is, it accepts the options specification as import arguments. This looks simple but presents its own inconveniences.

The second thing you'll notice is that the options specification is different from GL's. While GLC and GLM choose to use an interface that is backward-compatible with GL, GC focuses on tab completion. The value of each pair in the options specification is not a variable reference/coderef as you would expect in GL, but solely a completion specification: it's either undef (meaning the option does not require argument), a string (meaning a completion type/routine to use, e.g. files to complete from filenames, commands to complete from program names in PATH, and so on), or, as in the example above, an arrayref or coderef that supplies the candidate values. The option values themselves are collected in %ARGS.

Thus, compared to GLC and GLM, specifying completion routines is simpler in GC (but I also wrote Shell::Completer to provide the same level of convenience with more flexibility).

Activating Completion

To activate completion in bash, you need to declare this shell function first:

function _getopt_complete () {
COMPREPLY=($( COMP_CWORD=$COMP_CWORD perl `which ${COMP_WORDS[0]}` ${COMP_WORDS[@]:0} ));
}

then for each CLI application you also need to do:

% complete -F _getopt_complete myapp

This is different than the way you activate completion for GLC- or GLM-based scripts:

% complete -C myapp myapp

External programs receive raw COMP_LINE and COMP_POINT environment variables from bash when doing tab completion, while shell functions are provided with the already-parsed command-line COMP_WORDS array variable and COMP_CWORD. GC wants to avoid parsing the command-line on its own, so the _getopt_complete function is used to give the Perl program parsed command-line arguments in @ARGV, and COMP_CWORD in another environment variable.

Using command-line that is already parsed by bash in COMP_WORDS has its pros as well as cons, due to the way that bash parses command-line for COMP_WORDS. So I cannot say which way is better, but what I can say is parsing COMP_LINE ourselves is more flexible.

Completion behavior and bugs

When you press tab after the command:

% myapp <tab>

GC offers only completion from the <> specification. In the above example, it only offers a list of directories as the answer. On the other hand, GLC and GLM also show the list of available option names. With GC, to list the available options, you have to do:

% myapp -<tab>

I also cannot say that GLC's and GLM's way is better, but it certainly makes the CLI program more discoverable. By just pressing Tab, a user (especially a new user) can know more about what's possible.

GC still has a few problems. First of all, it cannot complete "--opt=" when COMP_WORDBREAKS contains "=". I have put workarounds for this issue in GLC and GLM. Second, it cannot handle filenames/directory names with spaces, or quotes, and probably other special characters too.

Third, GLC and GLM, through Complete::Util, offer some matching algorithms aside from simple prefix matching, for extra convenience. This is not offered by GC.

Getopt modules 22: Getopt::Kingpin

Getopt::Kingpin is a port of Go's kingpin library, written by Masaaki Takasago (TAKASAGO) in 2016. It offers the usual "nowadays standard" features like: short and long options with short option bundling, automatic help/usage message generation, specifying that an option is required, default value, and subcommands. Two extra features are: specifying that an option can be set via environment variable of a certain name, and built-in completion (which is a feature from the original library but doesn't seem to be implemented yet in the Perl port). The Go library also allows templating of help message, and this is not yet supported by Getopt::Kingpin.

Like Smart::Options (reviewed a couple of days ago), kingpin is using the so-called "fluent style" interface, a.k.a. chained methods, which I find annoying to type in Perl due to the method call operator in Perl being -> instead of a single dot. Although fortunately the chained methods interface is slightly less annoying than in Smart::Options.

After looking at the 3 ports of option parsing libraries (the abovementioned two plus Getopt::ArgParse reviewed yesterday) it indeed seems that subcommand support is becoming a standard thing. This makes me think about whether Getopt::Long should also add such a feature, or whether we should promote some other option parsing library as the "best practice" when one wants to do subcommands. So far, I'm not seeing any single best candidate for "Getopt::Long + subcommand support".

Getopt modules 21: Getopt::ArgParse

In Perl, the core modules Getopt::Std and Getopt::Long have stood the test of time and remain the most popular ways to parse command-line options in CLI scripts. In Python, by contrast, the recommended standard module has churned several times.

First there is getopt, a "C-style parser for command line options". To use getopt, you pass a string containing the list of short options a la Getopt::Std, e.g. "ho:v" (meaning -o takes an argument while -h and -v are flag switches), and also an array containing long options, e.g. ["help", "output="] (meaning --output takes an argument while --help does not). But instead of supplying references to variables to set, or coderefs (remember, specifying an anonymous function is inconvenient in Python) as in Getopt::Long, with getopt programmers are asked to write a manual loop with an if-then-else chain (see the linked documentation for an example). This is also quite similar in interface to the GetoptLong class in Ruby.
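
A minimal sketch of the manual loop that getopt expects, using the same "ho:v" and ["help", "output="] specification as above. The function name parse_args, the settings dictionary, and the sample arguments are made up for illustration:

```python
import getopt


def parse_args(argv):
    """Parse argv getopt-style; return (settings, leftover arguments)."""
    # "ho:v": -h and -v are flags, -o takes an argument.
    # ["help", "output="]: --help is a flag, --output takes an argument.
    opts, rest = getopt.getopt(argv, "ho:v", ["help", "output="])
    settings = {"help": False, "verbose": False, "output": None}
    # getopt only hands back (option, value) pairs; storing them is up to us.
    for opt, val in opts:
        if opt in ("-h", "--help"):
            settings["help"] = True
        elif opt == "-v":
            settings["verbose"] = True
        elif opt in ("-o", "--output"):
            settings["output"] = val
    return settings, rest


print(parse_args(["-v", "--output=out.txt", "input.txt"]))
```

Every new option means touching both the specification strings and the if/elif chain, which is the tedium the next generation of modules tried to remove.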

No doubt, this style of programming feels manual and tedious. Thus came optparse, which is more OO and supposedly more Pythonic. Instead of passing a whole list of options at once, you now add one option (object) at a time using the add_option method, along with more information for each option: usage/help message, type, whether the option is required, number of arguments expected, default value, and perhaps some callback. optparse's capability is equivalent to Getopt::Long or Getopt::Long::Descriptive, except that optparse makes some design choices: for example, it is decidedly Unix-oriented, allowing only - or -- as the option prefix (while Getopt::Long allows you to configure this). The documentation is quite probably the nicest aspect of this module: it does not assume much knowledge (like familiarity with Unix or CLI) from the readers and explains at length what an option is and how one should design a CLI program with regard to accepting options. I realized that "required option" is indeed an oxymoron from reading it!
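
For comparison, here is a small sketch of the add_option style, one option at a time with its help text, type, and default attached. The option names (--output, --count, --verbose) and the helper build_parser are invented for this example:

```python
from optparse import OptionParser


def build_parser():
    """Build a parser by adding one option object at a time."""
    parser = OptionParser(usage="%prog [options] FILE")
    parser.add_option("-o", "--output", dest="output", metavar="FILE",
                      help="write the result to FILE")
    parser.add_option("-n", "--count", dest="count", type="int", default=1,
                      help="number of repetitions (default: %default)")
    parser.add_option("-v", "--verbose", dest="verbose", action="store_true",
                      default=False, help="print progress messages")
    return parser


# parse_args returns the collected option values plus the leftover arguments.
options, args = build_parser().parse_args(["-v", "-n", "3", "input.txt"])
print(options.verbose, options.count, options.output, args)
```

The values end up as attributes on the returned options object, so the if/elif chain from the getopt style disappears, and -h/--help output is generated from the help strings.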

But, as with Getopt::Long, optparse does not have the concept of subcommands. Thus arrived argparse. It is basically like optparse in appearance, except that it has some extra features like the ability to specify positional arguments (in Getopt::Long, this is handled by the <> option specification) and support for nested subcommands through subparsers. Interestingly, argparse supports reading arguments from a file just like Getopt::ArgvFile, and this is the only form of "config file" it supports.
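
A short sketch of positional arguments and subparsers in argparse. The program name myapp and the add/remove subcommands are hypothetical, and only one level of subcommands is shown:

```python
import argparse


def build_parser():
    """Build a parser with a global flag and two subcommands."""
    parser = argparse.ArgumentParser(prog="myapp")
    parser.add_argument("-v", "--verbose", action="store_true")
    # dest="command" records which subcommand was chosen.
    sub = parser.add_subparsers(dest="command")

    add = sub.add_parser("add", help="add an item")
    add.add_argument("name")  # positional argument
    add.add_argument("--tag", action="append", default=[])

    remove = sub.add_parser("remove", help="remove an item")
    remove.add_argument("name")
    remove.add_argument("--force", action="store_true")
    return parser


ns = build_parser().parse_args(["-v", "add", "milk", "--tag", "dairy"])
print(ns.command, ns.name, ns.tag, ns.verbose)
```

Each subparser is itself a full parser, so calling add_subparsers on one of them is how nesting would be done.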

As things are right now, argparse is part of the standard library (a.k.a. core modules, in Perl parlance) while optparse is now deprecated and might be removed. However, getopt remains.

There is a Perl port of argparse on CPAN called Getopt::ArgParse, created by M ytraM (MYTRAM) in 2013 and last updated in 2015. It is not feature-by-feature equivalent to its Python original, because of language differences and because argparse still accumulates features over time. You get some basic features like autohelp/autousage message, default value, setting an option as required, setting number of expected arguments, as well as subparsers for subcommand support (although not yet nested in Getopt::ArgParse). The type/validation feature is weak or almost nonexistent; perhaps a custom validation routine should be allowed to be specified or more can be explored here.

What's rather disappointing about this port is its use of Getopt::Long (I was expecting a full port, so option parsing should be done by the module itself) and Moo, which adds significantly to the dependencies.

There is mention of configuration file in the documentation, but actually there is no explicit support of configuration file. Not even using "option file prefix" ala argparse or Getopt::ArgvFile.

All in all, I'm not seeing something to make me prefer this module. If you do not use subcommands, I recommend sticking with Getopt::Long or Getopt::Long::Descriptive. If you do use subcommands, perhaps also consider a CLI framework like App::Cmd, or Getopt::Long::Subcommand.