Getopt modules 14: MooseX::Getopt

About this mini-article series. Each day for 24 days, I will be reviewing a module that parses command-line options (such modules are usually under the Getopt::* namespace). First article is here.

After you parse command-line options, you usually end up with a hash of option names and values (or something similar). These options are then passed to other parts of the program, e.g. to functions as arguments or to object constructors. Some modules exist to let you map directly between command-line options and these other entities. I’ll review one today, MooseX::Getopt, and get to a couple of others in the following days.

MooseX::Getopt is a role that lets you set your Moose object’s attributes from command-line options. It uses Getopt::Long::Descriptive (which in turn uses Getopt::Long) as the options parser, but you don’t need to write an options specification explicitly: the role figures out the option names and various rules for you as much as possible.

It was first released in 2007 by the original author of Moose, Stevan Little (STEVAN), not too long after Moose was released in 2006. Over the years, several people have maintained it, and since 2012 Karen Etheridge (ETHER) has been the primary maintainer. Karen currently also maintains a lot of other modules in the Moose ecosystem.

About 94 CPAN distributions depend on this module, making it possibly the most popular way to quickly create a CLI script from a (Moose-based) class. Meanwhile, for creating CLI scripts with subcommands, App::Cmd (together with MooseX::App::Cmd) looks to be the most popular way.

To use MooseX::Getopt, you include this role in your class. Then, instead of using new to construct your object, you use new_with_options.

package My::App;
use Moose;

with 'MooseX::Getopt';

has 'foo' => (is => 'rw', isa => 'Str', required => 1);
has 'bar' => (is => 'rw', isa => 'Int', default => 10);
has 'baz' => (is => 'rw', isa => 'ArrayRef', documentation => 'one or more files');

Your CLI script is simply something like:

#!perl
use My::App;
my $app = My::App->new_with_options;
# perhaps do something with the $app

When you call your CLI script:

% myapp --foo blah --baz a --baz b

your object will have its foo set to "blah", bar set to the default 10, and baz set to ["a", "b"].
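
To make the idea concrete without installing Moose, here is a minimal sketch of what "derive the options from the attributes, then call the constructor" could look like in plain Perl. Everything in it is hypothetical: My::PlainApp, the %ATTRS table, and this new_with_options are stand-ins written for illustration, not MooseX::Getopt's implementation. It also takes the argument list as parameters rather than reading @ARGV, so that it is easy to try out.

```perl
use strict;
use warnings;

package My::PlainApp;    # hypothetical Moose-free stand-in for My::App
use Getopt::Long qw(GetOptionsFromArray);

# Attribute table: a Getopt::Long spec suffix plus required/default rules.
my %ATTRS = (
    foo => { spec => '=s',  required => 1 },
    bar => { spec => '=i',  default  => 10 },
    baz => { spec => '=s@' },
);

sub new {
    my ($class, %args) = @_;
    for my $name (sort keys %ATTRS) {
        my $attr = $ATTRS{$name};
        $args{$name} = $attr->{default}
            if !exists $args{$name} && exists $attr->{default};
        die "Missing required attribute '$name'\n"
            if $attr->{required} && !exists $args{$name};
    }
    return bless { %args }, $class;
}

# Build the option specs from the attribute table, parse, then construct.
sub new_with_options {
    my ($class, @argv) = @_;
    my %opt;
    my @specs = map { $_ . $ATTRS{$_}{spec} } sort keys %ATTRS;
    GetOptionsFromArray(\@argv, \%opt, @specs)
        or die "Invalid options\n";
    return $class->new(%opt);
}

package main;

my $app = My::PlainApp->new_with_options(qw(--foo blah --baz a --baz b));
printf "foo=%s bar=%d baz=%s\n",
    $app->{foo}, $app->{bar}, join(',', @{ $app->{baz} });
```

The real role does considerably more (usage generation, type constraints, coercion), which is exactly why you would use it instead of a hand-rolled table like this one.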

If you do not specify an option for a required attribute (like not specifying --foo), or if you specify the special option --usage or --help, an automatically generated usage message will be printed. The usage message uses the attributes’ documentation option:

% myapp -h
usage: myapp [-?h] [long options...]
        -h -? --usage --help  Prints this usage information.
        --foo STR
        --bar STR
        --baz STR...          one or more files

A --version handler is also provided:

% myapp --version
/home/u1/test/moosex-getopt/myapp
(Getopt::Long::GetOptions version 2.48; Perl version 5.24.0)

Some types like Str, Float, and Int map easily onto Getopt::Long option specifications: --opt=s, --opt=f, and --opt=i respectively. Arrayrefs and hashrefs are mapped to =s@ and =s%. For other types, you can provide your own mapping from a Moose type to a Getopt::Long specification (like mapping ArrayOfInts to =i@, as shown in the documentation). Since Moose also supports coercion, this also makes it possible to do something like:

% myapp --since '2016-01-01'

and your object’s attribute will become a DateTime object.
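
The type-to-spec mapping described above can be sketched with Getopt::Long alone. The %SPEC_FOR_TYPE table and the attribute names below are assumptions made for illustration (the type names are the ones used in the text); only the Getopt::Long suffixes themselves come from Getopt::Long.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical lookup table in the spirit of the mapping described above.
my %SPEC_FOR_TYPE = (
    Str      => '=s',
    Int      => '=i',
    Float    => '=f',
    ArrayRef => '=s@',
    HashRef  => '=s%',
);

# Hypothetical attributes and their declared types.
my %attr_type = (
    name   => 'Str',
    count  => 'Int',
    ratio  => 'Float',
    file   => 'ArrayRef',
    define => 'HashRef',
);

# "name" + "=s" gives the Getopt::Long spec "name=s", and so on.
my @specs = map { $_ . $SPEC_FOR_TYPE{ $attr_type{$_} } } sort keys %attr_type;

my @args = qw(--name x --count 3 --ratio 0.5 --file a --file b --define k=v);
my %opt;
GetOptionsFromArray(\@args, \%opt, @specs) or die "Invalid options\n";

print "specs: @specs\n";
print "files: @{ $opt{file} }\n";
print "define k: $opt{define}{k}\n";
```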

If you want config file support, there’s a separate role MooseX::ConfigFromFile or MooseX::SimpleConfig. The latter combines MooseX::ConfigFromFile with Config::Any so you can read any configuration format that Config::Any supports (which includes INI, JSON, YAML, Apache-style, or Perl code). MooseX::Getopt supports these kinds of roles, so all you have to do is include them in your class; then you can do:

% myapp -h --configfile /etc/myapp.yaml --other-opts=val

The default location of config files can also be set. You can also control the mapping of attributes to options: for example, there’s a variant, MooseX::Getopt::Strict, which only creates command-line options for attributes that are explicitly marked as “Getopt” attributes. The default is to create an option for every non-private attribute.

In short, this module is DRY, DWIM, and simple to use, so it’s hard to complain about. Of course, normally I wouldn’t use a startup-heavy Moose object for a CLI script; I would choose a more lightweight object system or not use objects at all. My one nitpick is that it doesn’t translate underscores to dashes, e.g. your attribute foo_bar becomes --foo_bar instead of --foo-bar, but this is a matter of personal preference.

Getopt modules 13: App::Options

In the previous article, I reviewed Getopt::ArgvFile, which enables you to read options from a config file. There's another module which used to be my favorite when it comes to reading options from the command-line as well as from config files and the environment: App::Options.

App::Options is a module by Stephen Adkins (SPADKINS). The first release was in 2004 and the latest release as of this writing is version 1.12 in 2010. Unlike Getopt::ArgvFile, which must be combined with Getopt::Long, App::Options does it all alone.

Modules like App::Options are terribly convenient. You just need a single specification of the options you want to accept, and the module will collect the values from various sources, in a specific order like you would expect in a typical Unix program. You are then presented with the final result in a hash. In the case of App::Options, the final hash is %App::options.
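
The "several sources, one final hash" idea can be sketched in a few lines. The precedence used here (command line, then environment, then config file, then defaults) is my assumption of the conventional Unix order, and merge_option_sources is a hypothetical helper, not something App::Options exports.

```perl
use strict;
use warnings;

# Earlier sources win; later ones only fill in what is still missing.
sub merge_option_sources {
    my @sources = @_;
    my %final;
    for my $source (@sources) {
        for my $name (keys %$source) {
            $final{$name} = $source->{$name} unless exists $final{$name};
        }
    }
    return %final;
}

# Hypothetical values, as if already collected from each source.
my %from_cmdline = (forum_id => 10);
my %from_env     = (username => 'foo');
my %from_config  = (username => 'other', password => 'secret');
my %defaults     = (delay => 4, log_level => 'DEBUG');

my %options = merge_option_sources(
    \%from_cmdline, \%from_env, \%from_config, \%defaults,
);

print "$_=$options{$_}\n" for sort keys %options;
```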

Here's an example of how to use App::Options, taken from one of my scripts called phpbb3-post:

use App::Options (
    option => {
        base_url => { type => 'string', required => 1, description => 'Address of phpBB3 site'},
        username => { type => 'string', required => 1, description => 'Username to login to phpBB3 site'},
        password => { type => 'string', required => 1, description => 'Password to login to phpBB3 site'},

        forum_id => { type => 'int', required => 1, },
        topic_id => { type => 'int', required => 0, },
        delay => { type => 'int', required => 0, default => 4, description => 'Number of seconds to wait between posting'},
        bbcode => { type => 'bool', required => 0, default => 0, description => 'Whether to interpret BBCode'},
        log_level => { type => 'string', required => 0, default => 'DEBUG' },
        obfuscate_link => { type => 'bool', required => 0, default => 0, },
    },
);

If you run the script:

% phpbb3-post --help
Error: "base_url" is a required option but is not defined
Error: "forum_id" is a required option but is not defined
Error: "password" is a required option but is not defined
Error: "username" is a required option but is not defined
Usage: phpbb3-post [options] [args]
--help print this message (also -?)
--base_url=<value> [undef] (string) Address of phpBB3 site
--bbcode=<value> [0] (bool) Whether to interpret BBCode
--delay=<value> [4] (int) Number of seconds to wait between posting
--forum_id=<value> [undef] (int)
--log_level=<value> [DEBUG] (string)
--obfuscate_link=<value> [0] (bool)
--password=<value> [undef] (string) Password to login to phpBB3 site
--topic_id=<value> [undef] (int)
--username=<value> [undef] (string) Username to login to phpBB3 site

The values of the options can be specified directly from the command-line, e.g.:

% phpbb3-post --base_url=https://example.com/forum/ --username=foo --password=secret \
--forum_id=10 < post.txt

or some of them can be stored in a configuration file (like the password, which is better not specified on the command-line). App::Options searches for config files in several locations, from the per-user .app directory ($HOME/.app/PROG_NAME.conf, $HOME/.app/app.conf), to the program directory ($PROG_DIR/PROG_NAME.conf, $PROG_DIR/app.conf), and finally the global /etc/app/app.conf. The location of the configuration file can be changed via the --option_file command-line option, or config file reading can be disabled via --no_option_file.

The configuration file is INI-like but with some differences. There is a concept of config profiles to let you store multiple sets of options in a single file, which can be selected via --profile. For example:

[profile=site1]
site=https://SITE1/
username=USER1
password=PASS1

[profile=site2]
site=https://SITE2/
username=USER2
password=PASS2

However, the module has its own peculiarities that over the years finally made me develop my own solution to replace it.

First of all, it encourages putting the specification in the use statement, which runs in the compile phase and therefore interferes with perl -c. You suddenly cannot check your script simply with perl -c YOURSCRIPT anymore, as options checking is still done and you'll still need to provide all the required options or supply a config file. This can be remedied by changing the code from:

use App::Options (option => { ... });

into:

use App::Options ();
App::Options->import(option => { ... });

which delays the option parsing to the runtime phase. This is not documented though (instead the documentation seems to be out-of-date and mentions the init() method which no longer exists in the code).
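
The difference between the two forms can be observed with a small stand-in module. My::Options below is hypothetical and defined inline (it is not App::Options); it only records the interpreter phase in which import() gets called, using ${^GLOBAL_PHASE}, which needs perl 5.14 or newer.

```perl
use strict;
use warnings;

our @events;

BEGIN {
    package My::Options;    # hypothetical stand-in, not App::Options

    # Record the interpreter phase in which the "parsing" happens.
    sub import {
        my ($class, %args) = @_;
        push @main::events, "parsed during ${^GLOBAL_PHASE}";
    }

    # Mark the module as loaded so that "use" does not search @INC.
    $INC{'My/Options.pm'} = __FILE__;
}

use My::Options (option => { foo => {} });     # import() runs at compile time
use My::Options ();                            # empty list: import() is skipped
My::Options->import(option => { bar => {} });  # runs only at runtime

print "$_\n" for @events;
```

Under perl -c only the first event would ever be recorded, since the runtime phase is never reached; that is the reason the second form keeps perl -c usable.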

Second, you need to use this syntax to provide an option value on the command-line:

--name=VALUE

The more common syntax:

--name VALUE

is not accepted and the error message is not clear when you do this.

The third is more minor and purely personal preference: I prefer an option like foo_bar to become --foo-bar on the command-line (that is, the underscores become dashes), or at least to have both --foo-bar and --foo_bar supported. App::Options only supports the latter.

The fourth: I don't like the order that App::Options searches configuration files. Since I deploy applications as Perl distributions, I'd much prefer the config files to be in the usual $HOME/.config/ or $HOME or finally /etc. I'd hate my applications to become "special" or "different" and need to have their configs put in $HOME/.app or /etc/app. The ordering is currently fixed and cannot be customized.

My Perinci::CmdLine steals a few ideas from App::Options, mainly the INI-like configuration format and the concept of config profiles, but with all my itches scratched. You can also try Smart::Options (to be reviewed later), which also supports config files, plus subcommands.

Getopt modules 12: Getopt::ArgvFile

Receiving options from command-line arguments is usually just one of a few ways a CLI program gets its input. Two of the other most popular ways are: environment variables (e.g. PERL_CPANM_OPT or perl's own PERL5OPT) and configuration files (e.g. /etc/wgetrc or dzil.ini). These alternative ways are used when command-line arguments are less appropriate, e.g. when there are lots of options which would make inputting them all to command-line arguments cumbersome or tedious. Another case is when the option contains something sensitive like login name/password, which when specified via command-line makes it easily peekable by other users via a simple `ps ax`.

Environment variables with names like FOO_OPT usually just extend the command-line arguments. You can put command-line options there just as you would on the command line. If you use configuration files, there's the additional issue of what format to write the configuration in, and which locations to search for the configuration files. A simple choice is to assume that the configuration files also contain just a bunch of command-line options, separable by newlines. This way, you don't have to invent a new format or create an additional mapping between configuration parameters and command-line options. This method is used by various programs large and small, such as curl, mplayer, and mpv.

If you use perl and Getopt::Long (or a Getopt::Long wrapper like Getopt::Long::Descriptive) you can use Getopt::ArgvFile to add this config file reading capability.

Getopt::ArgvFile is a module first released by Jochen Stenzel (JSTENZEL) in 1999 (but the copyright message hints that it was first written in 1993). The last update was in 2007. It currently has 20 CPAN distributions depending on it, and it looks like Barbie (BARBIE) and Kathryn Andersen (RUBYKAT) are two of its prominent users. The module seems to be well received by the community, judging from its CPAN Ratings reviews.

+---------+----------+-----------------------------------+-----------+--------------+-------------+
| phase   | rel      | dist                              | author    | dist_version | req_version |
+---------+----------+-----------------------------------+-----------+--------------+-------------+
| runtime | requires | Bencher-Scenario-GetoptModules    | PERLANCAR | 0.04         | 0           |
| runtime | requires | CPAN-Testers-Data-Generator       | BARBIE    | 1.21         | 0           |
| runtime | requires | CPAN-Testers-Data-Uploads-Mailer  | BARBIE    | 0.06         | 0           |
| runtime | requires | CPAN-Testers-WWW-Development      | BARBIE    | 2.11         | 0           |
| runtime | requires | CPAN-Testers-WWW-Reports-Mailer   | BARBIE    | 0.37         | 0           |
| runtime | requires | CPAN-Testers-WWW-Statistics       | BARBIE    | 1.21         | 0           |
| runtime | requires | CPAN-Testers-WWW-Statistics-Excel | BARBIE    | 0.06         | 0           |
| runtime | requires | Dist-Inktly-Minty                 | TOBYINK   | 0.002        | 0           |
| runtime | requires | Graphics-Colourset                | RUBYKAT   | 0.02         | 1.09        |
| runtime | requires | Module-DevAid                     | RUBYKAT   | 0.24         | 1.1         |
| runtime | requires | Module-Package-RDF                | TOBYINK   | 0.014        | 0           |
| runtime | requires | PAR-Packer                        | RSCHUPP   | 1.035        | 1.07        |
| runtime | requires | PerlPoint-Converters              | LDOMKE    | 1.0205       | 1.01        |
| runtime | requires | Pod-PerlPoint                     | JSTENZEL  | 0.06         | 1.06        |
| runtime | requires | SQLite-Work                       | RUBYKAT   | 0.1601       | 0           |
| runtime | requires | Tie-FieldVals                     | RUBYKAT   | 0.6203       | 1.08        |
| runtime | requires | X11-Muralis                       | RUBYKAT   | 0.1002       | 0           |
| runtime | requires | html2dbk                          | RUBYKAT   | 0.0301       | 1.09        |
| runtime | requires | khatgallery                       | RUBYKAT   | 0.03         | 1.09        |
| runtime | requires | pod2pdf                           | JONALLEN  | 0.42         | 0           |
+---------+----------+-----------------------------------+-----------+--------------+-------------+

As previously stated, this module is not an option parsing module or a Getopt::Long wrapper, but its companion instead. To use it, you put:

use Getopt::ArgvFile;

before:

use Getopt::Long;
GetOptions(...);

This will make Getopt::ArgvFile scan for @-prefixed arguments like @myapp.conf in the command-line arguments and put the command-line options found in the contents of the myapp.conf file into @ARGV. By the time GetOptions() is run, it sees the command-line options from the config file myapp.conf already inserted into @ARGV. For example, if myapp.conf contains:

--opt1=foo --opt2=bar
# a comment
--opt3=baz
--opt3 qux

Then these command-line arguments:

% myapp @myapp.conf --opt1 quux --no-opt4

will produce this @ARGV:

["--opt1=foo", "--opt2=bar", "--opt3=baz", "--opt3", "qux", "--opt1", "quux", "--no-opt4"]
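
To show what this expansion step amounts to, here is a simplified re-implementation in plain Perl. It is only a sketch: expand_argv_files is a hypothetical helper, not Getopt::ArgvFile's code, and it handles just shell-style comments and whitespace-separated options (no POD, no nested @file). Unlike the module, it dies when a file cannot be read.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Text::ParseWords qw(shellwords);

# Hypothetical helper: replace each "@file" argument with the options
# found in that file, leaving all other arguments untouched.
sub expand_argv_files {
    my @expanded;
    for my $arg (@_) {
        if ($arg =~ /^\@(.+)/s) {
            my $path = $1;
            open my $in, '<', $path or die "Cannot read $path: $!\n";
            while (my $line = <$in>) {
                chomp $line;
                next if $line =~ /^\s*(?:#.*)?$/;    # skip blanks and comments
                push @expanded, shellwords($line);
            }
            close $in;
        }
        else {
            push @expanded, $arg;
        }
    }
    return @expanded;
}

# Write the sample config from the text to a temporary file.
my ($fh, $conf) = tempfile(UNLINK => 1);
print {$fh} "--opt1=foo --opt2=bar\n# a comment\n--opt3=baz\n--opt3 qux\n";
close $fh;

my @argv = expand_argv_files("\@$conf", '--opt1', 'quux', '--no-opt4');
print join(' ', @argv), "\n";
```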

The config file can contain shell-style comments (# blah) or even POD. It can also contain another @file to include other files.

Since the @file syntax is not commonly used, you can also configure Getopt::ArgvFile to scan for a special option to specify the config file instead (like --config, as commonly used by many programs).

Getopt::ArgvFile can also be instructed to look for config files in some common locations like the script's path (using import option default => 1), home directory (home => 1), or current directory (current => 1). The config file name can be configured too, with multiple names if necessary.

In short, Getopt::ArgvFile is a quick and convenient way to add config file reading support to your application, but the following are some comments about the module:

First, when a specified config file cannot be read (permission denied, etc), there is no warning or error message whatsoever. This is my main complaint.

There is no equivalent of a --no-config special option to disable config file reading. Adding this option to GetOptions is also problematic, as Getopt::ArgvFile works before Getopt::Long. But if you use autohelp/autousage like Getopt::Long::Descriptive, you might want to add --config and --no-config too so they are documented.

The separate two-step approach also comes with its own problem. For example, if you specify a command-line option --foo @bar, wanting the foo option to contain the value @bar, you can't, because @bar will already have been stripped by Getopt::ArgvFile.

The default settings show the module's age. Finding config files in the script path (default => 1) or the current directory (current => 1) is not considered proper nowadays. There is also currently no way to disable parsing of the - and + option prefixes (like -config or +config) by Getopt::ArgvFile, although this can be implemented pretty trivially, e.g. by looking at Getopt::Long's configuration (but this requires Getopt::Long to be loaded first).

All these considerations make me prefer something more integrated, like Perinci::CmdLine or App::Options (although those two modules happen to use configuration files in INI/INI-like format instead of raw command-line options).

Getopt modules 11: Getopt::Lucid

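Getopt::Lucid is David Golden's (DAGOLDEN) take on option parsing. It was first released in 2005 and still sees updates from time to time. The last release was in August 2016. Around 20 CPAN distributions use it (some of them are David's), making it moderately popular (behind Getopt::Long::Descriptive or things like MooseX::Getopt and MooX::Options).

+---------+----------+--------------------------------+-----------+--------------+-------------+
| phase   | rel      | dist                           | author    | dist_version | req_version |
+---------+----------+--------------------------------+-----------+--------------+-------------+
| runtime | requires | Acme-PDF-rescale               | HERVE     | 0.2          | 0.16        |
| runtime | requires | App-CPAN-Mini-Visit            | DAGOLDEN  | 0.008        | 0.16        |
| runtime | requires | App-StaticImageGallery         | RBO       | 0.002        | 0           |
| runtime | requires | App-Taskflow                   | FARHAD    | 1.0          | 0           |
| runtime | requires | App-Ylastic-CostAgent          | DAGOLDEN  | 0.006        | 0           |
| runtime | requires | App-Zapzi                      | RUPERTL   | 0.017        | 1.05        |
| runtime | requires | App-grindperl                  | DAGOLDEN  | 0.004        | 0           |
| runtime | requires | App-mymeta_requires            | DAGOLDEN  | 0.006        | 0           |
| runtime | requires | Bencher-Scenario-GetoptModules | PERLANCAR | 0.04         | 0           |
| runtime | requires | Decl                           | MICHAEL   | 0.11         | 0           |
| runtime | requires | Paludis-UseCleaner             | KENTNL    | 0.01000307   | 0           |
| runtime | requires | Pod-ROBODoc                    | MGRIMM    | 0.3          | 0           |
| runtime | requires | Pod-WikiDoc                    | DAGOLDEN  | 0.20         | 0.14        |
| runtime | requires | String-Dump                    | PATCH     | 0.09         | 0           |
| runtime | requires | Task-BeLike-DAGOLDEN           | DAGOLDEN  | 1.010        | 0           |
| runtime | requires | Task-BeLike-RJRAY              | RJRAY     | 0.009        | 0           |
| runtime | requires | Task-MasteringPerl             | BDFOY     | 1.002        | 0           |
| runtime | requires | VJF-Emphase                    | HERVE     | 0.11         | 0.16        |
| runtime | requires | VJF-MITDT                      | HERVE     | 1.01         | 0.16        |
+---------+----------+--------------------------------+-----------+--------------+-------------+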

Getopt::Lucid implements its own parsing and does not depend on Getopt::Long. It presents a more OO interface and avoids using symbols like Getopt::Long's :s or =s@. Compared to Smart::Options, which also uses a method chaining style, I find Getopt::Lucid clearer because the method chains are done on a per-option basis instead of for the whole options set.

Compared to Getopt::Long, Getopt::Lucid supports: specifying an option's default value, specifying that an option is required, and specifying an extra validation rule for each option. It does not allow specifying a per-option summary string for nicer generation of a usage message, although with its interface, adding such a feature should be easy. It does not offer an automatic --help or --version message. It does not support auto-abbreviation.

Getopt::Lucid also allows you to express dependencies between options, a feature not usually found in other option parsing modules, although currently only one form of dependency relationship is supported: Param('foo')->needs('bar') means that when --foo is specified then --bar also needs to appear. There is no support for expressing mutual exclusiveness, for example.
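
For comparison, this is roughly what a "needs" check looks like when done by hand after parsing with Getopt::Long. The %NEEDS table and parse_with_needs are hypothetical names for this sketch; it is not how Getopt::Lucid implements needs().

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical dependency table: option => list of options it needs.
my %NEEDS = (foo => ['bar']);

sub parse_with_needs {
    my @args = @_;
    my %opt;
    GetOptionsFromArray(\@args, \%opt, 'foo=s', 'bar=s')
        or die "Invalid options\n";

    # Check every dependency of every option that was actually given.
    for my $name (sort keys %NEEDS) {
        next unless exists $opt{$name};
        for my $needed (@{ $NEEDS{$name} }) {
            die "--$name requires --$needed\n" unless exists $opt{$needed};
        }
    }
    return \%opt;
}

my $ok = parse_with_needs(qw(--foo 1 --bar 2));
print "accepted: foo=$ok->{foo} bar=$ok->{bar}\n";

my $failed = !eval { parse_with_needs(qw(--foo 1)); 1 };
print "rejected: $@" if $failed;
```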

Getopt::Lucid comes with its own peculiarities. For example, a long option can be specified as --foo but also as foo. I haven't found a Unix program that behaves like this. I also notice that it does not allow an option named -?. And case-sensitivity of options is regulated on a per-option basis.

There is no built-in support for config files, but the documentation shows how to do it. Basically, Getopt::Lucid wants to allow the user to customize how values from config files should be merged with values from the command-line options.

Conclusion: The ability to express dependency between options is useful, especially if the other dependency relationships (like mutual exclusiveness) were supported. Otherwise, I'd probably still reach for modules that allow automatic generation of usage/help/version message.

Getopt modules 10: Getopt::Auto

Getopt::Auto is a module that was first written in 2003 by Simon Cozens (SIMON), then revived in 2011 and 2014 by Geoffrey Leach (GLEACH). There is currently only one CPAN distribution depending on this module.

% lcpan rdeps Getopt::Auto
+---------+----------+--------------+--------+--------------+-------------+
| phase   | rel      | dist         | author | dist_version | req_version |
+---------+----------+--------------+--------+--------------+-------------+
| runtime | requires | Pod-HtmlEasy | GLEACH | v1.1.11      | 1.009006    |
+---------+----------+--------------+--------+--------------+-------------+

It does not require you to write an options spec (but you can, if you want to). You can just do:

use Getopt::Auto;

and your command-line options will be collected in main’s %options. You can also define an option subroutine named after the option, e.g. foo(), which will be called when the option is encountered. This is similar to supplying an option handler coderef in Getopt::Long, except in a non-global-clean way.

Of course, when not using a spec there’s the usual issue of syntax ambiguity. Getopt::Auto resolves this by assuming that all options in the form of --foo are flag options that do not take values, and options that want to take a value must use the --foo=value syntax.
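
The rule is simple enough to sketch in a few lines of plain Perl. parse_without_spec is a hypothetical helper written for illustration, not Getopt::Auto's code; storing 1 for a flag and treating a bare -- as the end of options are my own assumptions.

```perl
use strict;
use warnings;

# Hypothetical spec-less parser: "--foo" is always a flag,
# "--foo=value" always carries a value, anything else is an argument.
sub parse_without_spec {
    my @args = @_;
    my (%options, @rest);
    while (@args) {
        my $arg = shift @args;
        if ($arg eq '--') {
            push @rest, @args;    # everything after "--" is an argument
            last;
        }
        elsif ($arg =~ /^--([^=]+)=(.*)$/s) {
            $options{$1} = $2;
        }
        elsif ($arg =~ /^--(.+)$/s) {
            $options{$1} = 1;
        }
        else {
            push @rest, $arg;
        }
    }
    return (\%options, \@rest);
}

# "--force extra" is a flag followed by an argument, never "force=extra".
my ($options, $rest) = parse_without_spec(qw(--verbose --name=joe --force extra));
print "$_=$options->{$_}\n" for sort keys %$options;
print "rest: @$rest\n";
```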

If you want to specify a spec, you can, in the form of POD (like Getopt::Euclid). However, unlike Getopt::Euclid, the rules are far fewer and simpler. You just provide a header (it can be =head2, =head3, or some other level) in this format:

=head2 -a, --add - Some summary

And that’s it.

There’s no way of specifying that an option always/never takes value, or whether an option is required, or its default value.

Getopt::Auto offers another way of supplying the spec: via import argument. For example:

use Getopt::Auto ([
    ['--add', 'Add integer to count', "The integer argument is not checked.\n", \&add],
    ['-a', 'Add integer to count', "The integer argument is not checked.\n", \&add],
    ['--inc', 'Bump count by 1', undef, undef],
]);

This way, you can supply the option subroutine in a global-clean way. But for some reason, when it accepts the spec this way, you must specify each alias separately. You can’t say “-a, --add” like you can in POD. This should probably be rectified in the future.

Another peculiarity is that the option subroutine must accept its argument in $ARGV instead of $_. Another non-global-clean way.

Verdict: Getopt::Auto had some potential but is still plagued with enough eccentricities that prevent me from recommending it for general use.

Getopt modules 09: Getopt::Declare, Getopt::Euclid, Docopt

About this mini-article series. Each day for 24 days, I will be reviewing a module (but 3 modules for today) that parse command-line options (such modules are usually under the Getopt::* namespace). First article is here.

In 2012, the Python option parsing library docopt made its first appearance and took the world by storm, so to speak. It was regarded by many as a fresh approach that is quite revolutionary. Ports for other languages followed, from PHP to Ruby, Haskell to Go, C/C++ to Rust. The npm port itself boasts 500 dependents. Docopt also eventually inspired a few forks or projects that aim to extend the expressive power of the DSL. Oh, and a Perl port exists too of course, written by Tokuhiro Matsuno (TOKUHIROM). Sadly, Docopt.pm still has the “under development” label from the author and hasn’t been updated since 2013, although it already works at least for a subset of the specification.

Meanwhile, the concept is not new nor invented in the Python community. As far back as 1998 (14 years earlier), Damian Conway (DCONWAY) released Getopt::Declare on CPAN which has the same basic idea: parse options based on a documentation-like or usage-message-like specification. And I suspect the Perl folks in turn stole this concept from some other even older language.

In 2005, the Perl Best Practices book (also by Damian Conway) came out. In it, a then-yet-unwritten new module called Getopt::Clade is mentioned and is supposed to be the blessed successor to Getopt::Declare. Unfortunately, that module never got written. But Getopt::Euclid was born instead, with a similar but slightly different concept: instead of using a usage-message-like specification, the specification is read from the POD.

How popular are these modules? None of them is very popular, but at least all of them have some CPAN distributions depending on them, unlike most other Getopt:: modules. Docopt has 5 CPAN distributions depending on it, Getopt::Declare only has 4, while Getopt::Euclid is slightly better at 10.

% lcpan rdeps Docopt
+---------+----------+-----------------------+---------+--------------+-------------+
| phase   | rel      | dist                  | author  | dist_version | req_version |
+---------+----------+-----------------------+---------+--------------+-------------+
| runtime | requires | App-ReorderGoProFiles | VTI     | 0.02         | 0           |
| runtime | requires | App-WhatTimeIsIt      | BAYASHI | 0.01         | 0.03        |
| runtime | requires | App-plmetrics         | BAYASHI | 0.06         | 0           |
| runtime | requires | CLI-Dispatch-Docopt   | BAYASHI | 0.01         | 0.03        |
| runtime | requires | Devel-Mutator         | VTI     | 0.03         | 0           |
+---------+----------+-----------------------+---------+--------------+-------------+
% lcpan rdeps Getopt::Declare
+---------+----------+-----------------------+----------+--------------+-------------+
| phase   | rel      | dist                  | author   | dist_version | req_version |
+---------+----------+-----------------------+----------+--------------+-------------+
| runtime | requires | Finnigan              | SELKOVJR | 0.0206       | 1.13        |
| runtime | requires | MKDoc-Text-Structured | BPOSTLE  | 0.83         |             |
| runtime | requires | SVN-Churn             | RCLAMP   | 0.02         | 0           |
| runtime | requires | Task-MasteringPerl    | BDFOY    | 1.002        | 0           |
+---------+----------+-----------------------+----------+--------------+-------------+
% lcpan rdeps Getopt::Euclid
+---------+----------+------------------------------+-----------+--------------+-------------+
| phase   | rel      | dist                         | author    | dist_version | req_version |
+---------+----------+------------------------------+-----------+--------------+-------------+
| runtime | requires | Audio-MPD                    | JQUELIN   | 2.004        | 0           |
| runtime | requires | Games-RailRoad               | JQUELIN   | 1.101330     | 0           |
| runtime | requires | MARC-Record-Stats            | CRUSOE    | v0.0.4       | 0           |
| runtime | requires | Module-Install-PodFromEuclid | FANGLY    | 0.01         | 0.3.4       |
| runtime | requires | NetHack-PriceID              | SARTAK    | 0.05         | 0           |
| runtime | requires | Task-Cpanel-Internal         | CPANEL    | 11.36.001    | 0           |
| runtime | requires | Test-Approvals               | JRCOUNTS  | v0.0.5       | 0           |
| runtime | requires | VSGDR-StaticData             | DEDMEDVED | 0.31         | 0           |
| runtime | requires | VSGDR-TestScriptGen          | DEDMEDVED | 0.16         | 0           |
| runtime | requires | VSGDR-UnitTest-TestSet       | DEDMEDVED | 1.34         | 0           |
+---------+----------+------------------------------+-----------+--------------+-------------+

If I have to pick between Getopt::Declare and Getopt::Euclid, I choose Getopt::Euclid. Getopt::Declare uses the Tab character to separate specification fields, which is problematic in some editor configurations. The specification is not exactly a usage message. It mixes in some Perl code in { … } blocks, so you cannot directly use it as a usage message (although a single usage() will produce the final usage message for you). And, if you are writing a documentation POD for your CLI program anyway, you might as well use the POD and parse your list of options from it.

But if I have to pick between the abovementioned two and Docopt, I pick Docopt. Using these modules means learning yet another DSL and you might as well learn Docopt’s flavor, which has implementations in many languages. The docopt syntax also makes it easy to express dependencies between options in a compact way, e.g. which options must be mutually exclusive (e.g. (--verbose | --debug | --quiet)) and which options depend on the presence of other options.
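
For contrast, here is the same (--verbose | --debug | --quiet) constraint written by hand on top of Getopt::Long, which has no notion of mutually exclusive options. parse_verbosity is a hypothetical helper for this sketch; it shows how much code a single docopt expression replaces.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: accept at most one of the three flags and
# return the name of the one given (or undef if none was given).
sub parse_verbosity {
    my @args = @_;
    my %opt;
    GetOptionsFromArray(\@args, \%opt, 'verbose', 'debug', 'quiet')
        or die "Invalid options\n";

    my @given = grep { $opt{$_} } qw(verbose debug quiet);
    die 'Mutually exclusive options: '
        . join(', ', map { "--$_" } @given) . "\n"
        if @given > 1;

    return $given[0];
}

print 'selected: ', parse_verbosity('--debug'), "\n";

my $failed = !eval { parse_verbosity('--verbose', '--quiet'); 1 };
print "rejected: $@" if $failed;
```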

Buuut, and this is the final but, if I have to pick between the Docopt UI-first approach and the “normal” approach (something like Getopt::Long::Descriptive or App::Options or Getopt::Long::More where a structured specification is used to generate usage/documentation instead of the other way around), I’d pick the latter. True, with Docopt we can tune the exact formatting of the usage message. But I usually prefer my usage message to be generated automatically anyway. Using Perl data structure as the specification is better because the syntax can be checked by your usual IDE (on the other hand, I’m sure someone could create or have created a docopt Emacs mode or something.)

And, except for simpler scripts, I also usually want an option parser module to have the ability to read configuration files (and environment variables). So far, no such Docopt-style modules have been written. Anyone?

Getopt modules 08: Getopt::Tiny

Getopt::Tiny is a module written in 1999-2002 by David Muir Sharnoff (MUIR). David is a veteran Perl programmer who also programs in several other languages like Go, Python, node, Java. This module is admittedly among his earlier works, and the age shows e.g. it only recognizes “old” style long option with a single dash prefix (-foo) instead of “new” one with double dash prefix (--foo). But there are a couple of things I want to highlight.

Getopt::Tiny is not a Getopt::Long wrapper; it implements its own parsing. The code is short and simple, as the name ::Tiny would suggest, at around only 115 lines. It currently does not have any CPAN distributions depending on it.

The feeling that I get looking at this module is that it tries to be less redundant than Getopt::Long. For example, instead of option spec being something like name=s or name=s@ or name=s%, this module guesses the destination type from the type of references the option is paired with:

    opt1 => \$str,  # a scalar/string
    opt2 => \@ary,  # an array
    opt3 => \%hash, # a hash
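
The guessing itself is easy to picture: look at the reference type with ref() and derive the destination type from it. The sketch below does this on top of Getopt::Long; spec_for is a hypothetical helper, and this is not Getopt::Tiny's code (which has its own parser and only accepts single-dash options).

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: derive a Getopt::Long spec from the kind of
# reference the option is paired with.
sub spec_for {
    my ($name, $dest) = @_;
    my $type = ref $dest;
    return $name . '=s'  if $type eq 'SCALAR';
    return $name . '=s@' if $type eq 'ARRAY';
    return $name . '=s%' if $type eq 'HASH';
    die "Unsupported destination for $name\n";
}

my ($str, @ary, %hash);
my %dest = (
    opt1 => \$str,     # a scalar/string
    opt2 => \@ary,     # an array
    opt3 => \%hash,    # a hash
);

my @args = qw(--opt1 x --opt2 a --opt2 b --opt3 k=v);
GetOptionsFromArray(
    \@args,
    map { spec_for($_, $dest{$_}) => $dest{$_} } sort keys %dest,
) or die "Invalid options\n";

print "opt1=$str opt2=@ary opt3{k}=$hash{k}\n";
```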

Getting a bit quirky: when generating the usage message, instead of letting the user specify a summary string like some other modules (e.g. Getopt::Long::Descriptive, Getopt::Simple, Getopt::Long::More) or extracting it from POD like some others (e.g. Getopt::Euclid, Getopt::Declare), this module searches the comments in the source code. The comments must be formatted in a particular way, i.e.:

# begin usage info
my %flags = (
    opt1 => \$str,  # description for opt1
    opt2 => \@ary,  # description for opt2
    opt3 => \%hash, # description for opt3
);
my %switches = (
    switch1 => \$switch1,  # description for switch1
);
# end usage info
getopt(\@ARGV, \%flags, \%switches);
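
Extracting descriptions from comments like this could be done with a simple line scan. The following is only a sketch of the idea: usage_from_comments is a hypothetical helper, the output format is made up, and the source text is passed in as a string instead of being read from the running script.

```perl
use strict;
use warnings;

# Sample source text, copied from the snippet above.
my $source = <<'END_SOURCE';
# begin usage info
my %flags = (
    opt1 => \$str,  # description for opt1
    opt2 => \@ary,  # description for opt2
    opt3 => \%hash, # description for opt3
);
my %switches = (
    switch1 => \$switch1,  # description for switch1
);
# end usage info
END_SOURCE

# Hypothetical helper: collect "name => \ref, # text" lines that sit
# between the two marker comments.
sub usage_from_comments {
    my ($text) = @_;
    my $inside = 0;
    my @lines;
    for my $line (split /\n/, $text) {
        if ($line =~ /^#\s*begin usage info/) { $inside = 1; next }
        last if $line =~ /^#\s*end usage info/;
        next unless $inside;
        if ($line =~ /^\s*(\w+)\s*=>\s*\\[\$\@%]\w+\s*,?\s*#\s*(.*\S)/) {
            push @lines, sprintf('  -%-8s %s', $1, $2);
        }
    }
    return @lines;
}

print "$_\n" for usage_from_comments($source);
```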

Another rather quirky thing, also a result of trying (a bit too hard) to be compact: the fourth argument to getopt() is a string that will be used in the usage message to symbolize the arguments, e.g. “files”, which will generate a usage message like:

Usage: myprog [flags] [switches] files

This means it’s okay if @ARGV still contains arguments after the options (flags, switches) have been stripped, e.g.:

% myprog -opt1 val -opt2 val -opt2 val foo bar

But, if the fourth argument is not specified, the usage message is simply:

Usage: myprog [flags] [switches]

And the command line is not allowed to have extra arguments (foo, bar). On the other hand, there’s no way to express that command-line arguments are required, that a flag is required, or what an option’s default value is.

But at least the code is compact as well as very straightforward, which is better than some other modules that I looked at.

Getopt modules 07: Getopt::Std::Strict

Getopt::Std::Strict is a module written by Leo Charre (LEOCHARRE) in 2008. The author is using this module in several of his CPAN distributions, but no one else on CPAN seems to be using it. More "recent" option parsing modules, such as those written after the mid-2000's, even including Docopt (port of a popular Python library) do not see much adoption on CPAN. This could be caused by various factors, the speculation of which I don't intend to get into right now. But I can say that there are two exceptions: Getopt::Lucid and Getopt::Long::Descriptive, and this might have something to do with the reputation of their authors.

Back to our module of the day. There are not a lot of wrappers for or "forks" of Getopt::Std, simply due to the fact that the module is so simple already. But the author of Getopt::Std::Strict has an itch to scratch. First, when we are in strict mode, instead of this:

getopts('oif:');

or:

getopts('oif:', \%opts);

one has to do this instead:

use vars qw($opt_o $opt_i $opt_f);
getopts('oif:');

or:

my %opts;
getopts('oif:', \%opts);

which the author probably finds annoying. So he wrote a module to declare the variable(s) for you. And to do this, the option specification needs to be passed/processed at compile time. Thus:

use Getopt::Std::Strict 'oif:', 'opt'; # will declare the $opt_* for you
if ($opt_o) { ... } # so $opt_o can be used without you declaring it

This is rather nice because if you mistype $opt_o to $opt_p (an unknown option), this mistake will be caught at compile-time.
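
The compile-time trick can be sketched with core modules only. This is my own illustration, not the source of Getopt::Std::Strict: My::StrictOpts is a hypothetical package, and the BEGIN blocks stand in for what a real "use My::StrictOpts 'oif:';" line would do. It relies on the same mechanism as use vars: a variable installed into your package by code compiled in another package counts as imported, so strict does not complain about it.

```perl
use strict;
use warnings;

BEGIN {
    package My::StrictOpts;   # hypothetical stand-in, for illustration only
    use Getopt::Std ();

    sub import {
        my ($class, $spec) = @_;
        my $caller = caller;
        my %opts;
        Getopt::Std::getopts($spec, \%opts) or die "bad options\n";
        no strict 'refs';
        for my $letter ($spec =~ /(\w)/g) {
            my $value = $opts{$letter};
            # install $opt_<letter> into the calling package
            *{"${caller}::opt_$letter"} = \$value;
        }
    }
}

BEGIN { @ARGV = qw(-o -f out.txt input.txt) }   # demo command line
BEGIN { My::StrictOpts->import('oif:') }        # what "use" would do

print "-o was given\n" if $opt_o;
print "-f is $opt_f\n" if defined $opt_f;
# print $opt_p;   # would not compile under strict: -p is not in the spec
```

Because parsing happens inside import, @ARGV must already hold the command line at compile time, which it normally does.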

As a bonus, the module also provides you with %OPT and an opt() function. You can access, for example, the -o option via $OPT{o} or via opt('o'). The former won't catch a typo, e.g. when you type $OPT{p}, but the latter will (opt('p') will die).

I personally find all this rather unnecessary: I probably would just bear the cost of typing my %opts; or my ($opt_o, $opt_i, $opt_f); and be more explicit.

But I have a couple of suggestions. First, it might be nice if the module died when the user specifies an unknown option (to be more in line with the "Strict" spirit in the name). Second, the bonuses are additional cognitive burdens that I'd personally do without. Or, if they are to stay, %OPT could be made stricter by having it die too when fed an unknown option, e.g. using the tie mechanism.
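
A tied hash along those lines might look like the following sketch. StrictOptHash is a hypothetical class built on the core Tie::StdHash; nothing like it ships with Getopt::Std::Strict as far as I know.

```perl
use strict;
use warnings;

package StrictOptHash;   # hypothetical class, for illustration only

require Tie::Hash;                 # also defines Tie::StdHash
our @ISA = ('Tie::StdHash');

# Reading a key that was never stored is treated as a typo.
sub FETCH {
    my ($self, $key) = @_;
    die "unknown option '$key'\n" unless exists $self->{$key};
    return $self->{$key};
}

package main;

tie my %OPT, 'StrictOptHash';
%OPT = (o => 1, i => undef, f => 'out.txt');   # one key per declared option

print "f is $OPT{f}\n";
my $typo = eval { $OPT{p} };                   # dies inside FETCH
print "caught: $@" if $@;
```

Every declared option needs a key (even with an undef value) so that only genuinely unknown names trigger the error.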

Getopt modules 05: Getopt::Valid

Getopt::Valid is yet another module that wraps Getopt::Long to try to add some features (examples of modules like these that have been reviewed are Getopt::Long::Descriptive and Getopt::Compact). The module starts with a single specific goal (even the name already reflects that goal nicely), which is to add extended validation. Getopt::Long admittedly only allows for expressing limited validation, e.g. that something needs to be an integer (--name=i) or a floating point number (--name=f). Everything else is not validated, except by ourselves if we assign an option to an option handler (coderef) instead of a scalar reference or an array/hash reference.
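
For reference, do-it-yourself validation with a plain Getopt::Long option handler looks roughly like this. The parse_args wrapper, the option names and the rules are made up for the example. Getopt::Long catches a die() inside a handler, prints the message as a warning, and makes the parse return false.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical wrapper: parse a list of words, return (success, \%options).
sub parse_args {
    my @args = @_;
    my %opt;
    my $ok = GetOptionsFromArray(
        \@args,
        'port=i' => sub {
            my ($name, $value) = @_;   # $name stringifies to the option name
            die "--$name must be between 1 and 65535\n"
                if $value < 1 || $value > 65535;
            $opt{$name} = $value;
        },
        'name=s' => sub {
            my ($name, $value) = @_;
            die "--$name must start with 'blah'\n"
                unless $value =~ /^blah.+/;
            $opt{$name} = $value;
        },
    );
    return ($ok, \%opt);
}

my ($ok, $opt) = parse_args('--port=8080', '--name=blah-1');
print "port=$opt->{port} name=$opt->{name}\n" if $ok;
```

It works, but each handler has to repeat the store-the-value boilerplate, which is the kind of thing a validation wrapper could take off your hands.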

The actual design of Getopt::Valid, however, breaks down in places. The main thing that pops up the most to me is the inconsistency of names, reflecting the lack of design clarity. The top-level structure to be passed to the GetOptionsValid() routine is called $validation_ref, but it’s not exactly a validation specification: it’s the program’s specification with its list of options. The options are called “params” in one place, “arguments” in other places. The collect_argv method is described as a method to collect “args” (arguments). For an option parsing module, which must differentiate between options and arguments (before and after the options are parsed), the convoluted naming really turns me off.

The OO interface also leaves something to be desired. After instantiation, first you have to call collect_argv(), which again is a misleading name because it actually calls Getopt::Long::GetOptions() to do the actual parsing. Then you have to manually call validate() to do the extra validation. Then you have to manually call another method, valid_args(), to get the validated arguments, er, options.

The validators themselves can be in the form of a coderef:

    'name=s' => sub { ... }

or, for simpler cases, a regex object:

    'name=s' => qr/^blah.+/,

Fine by me. But then, a third case is allowed which is a string to mean… a description of the option instead of something related to validation as the former two cases.
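
The three cases can be told apart with ref(). The sketch below only illustrates that overloading of meaning; check_value is a hypothetical helper and does not reflect how Getopt::Valid is written internally.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when $value passes $validator, else 0.
sub check_value {
    my ($validator, $value) = @_;
    my $kind = ref $validator;
    return $validator->($value) ? 1 : 0 if $kind eq 'CODE';
    return $value =~ $validator ? 1 : 0 if $kind eq 'Regexp';
    return 1;   # a plain string is only a description: nothing to check
}

my %spec = (
    'name=s' => qr/^blah.+/,
    'age=i'  => sub { $_[0] >= 18 },
    'note=s' => 'free-form note',   # description, not a validator
);

for my $case (['name=s', 'blahblah'], ['age=i', 12], ['note=s', 'anything']) {
    my ($key, $value) = @$case;
    printf "%-7s %-9s %s\n", $key, $value,
        check_value($spec{$key}, $value) ? 'ok' : 'rejected';
}
```

The last branch shows the oddity: the same slot holds either a rule or a piece of documentation.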

Getopt modules 06: Opt::Imistic

Option parsing modules can be categorized into two groups: those that require you to write a specification and those that do not (actually there's a third category: those that allow you to choose whether to supply a spec or not). Modules that accept a usage text as input like Docopt, or POD like Getopt::Euclid, or some other form, count in the first category. Modules in the second category are usually meant for shorter and simpler scripts. (And the modules are often simple themselves with a short implementation, too simple to contain something worth babbling about. Most of them just collect anything that looks like an option in @ARGV and put it in a hash, done!)

Modules that don't accept a specification face an ambiguity problem when it comes to this syntax:

--foo bar

Is this a flag option --foo (an option that does not require a value) followed by a command-line argument bar, or an option --foo with its value bar?

Some modules resolve this by disallowing the ambiguous syntax. Option value must always be specified using:

--foo=bar

But this is inconvenient to (some) users and is not how most Unix programs behave.

Other modules resolve this by simply assuming that every --foo bar means option --foo with a value of bar. In other words, the user must be careful not to put an argument after a flag option (usually using -- to separate options and arguments).
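
The second, "greedy" strategy could look something like the following sketch. parse_greedy is a hypothetical function, not taken from any particular module. It treats the word after an option as its value unless that word is missing or itself starts with a double dash.

```perl
use strict;
use warnings;

# Hypothetical spec-less parser: returns (\%options, \@remaining_arguments).
# Option values are collected in arrays so repeated options are kept.
sub parse_greedy {
    my @argv = @_;
    my (%opts, @rest);
    while (@argv) {
        my $arg = shift @argv;
        if ($arg eq '--') {                       # explicit end of options
            push @rest, @argv;
            last;
        }
        if ($arg =~ /\A--([^=]+)=(.*)\z/s) {      # --foo=bar
            push @{ $opts{$1} }, $2;
            next;
        }
        if ($arg =~ /\A--(.+)\z/s) {              # --foo [bar]
            my $name  = $1;
            my $value = (@argv && $argv[0] !~ /\A--/) ? shift @argv : 1;
            push @{ $opts{$name} }, $value;
            next;
        }
        push @rest, $arg;                         # plain argument
    }
    return (\%opts, \@rest);
}

my ($greedy)          = parse_greedy(qw(--foo bar));
my ($careful, $files) = parse_greedy(qw(--foo -- bar));
print "greedy:  foo=$greedy->{foo}[0]\n";
print "careful: foo=$careful->{foo}[0] args=@$files\n";
```

In the first call bar is swallowed as the value of --foo; in the second, the explicit separator keeps it as an argument.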

Aside from the abovementioned two, other approaches are possible. One such approach is by looking at how the option value is used in the program. Using a pragma like overload, we can trap boolean, string, even array/hash operations. For example:

package OptionObject;

use overload
    '""'   => sub { $_[0]{type} = 'scalar'; $_[0]{values}[0] },
    'bool' => sub { $_[0]{type} = 'bool'  ; @{$_[0]{values}} ? 1:0 },
    '@{}'  => sub { $_[0]{type} = 'array' ; $_[0]{values} },
    ;

sub new {
    my $class = shift;
    bless {@_}, $class;
}

Then:

my $opt1 = OptionObject->new(values => ["a", "b"]); # e.g. after user specifies --opt1 a --opt1 b
my $opt2 = OptionObject->new(values => ...);
my $opt3 = OptionObject->new(values => ...);

if ($opt1 =~ /foo/) {
    # this is regex matching, meaning user wants opt1 to be a string/scalar option
}

if ($opt2) {
    # this is boolean testing, meaning user wants opt2 to be a flag option
}

for (@$opt3) {
    # user array-dereferences, she probably wants opt3 to be an array option
}

In the above example code, the overloading mechanism will trigger to let us know that the user wants --opt1 to be a string/scalar option, --opt2 a flag option (which does not take a value), and --opt3 an array option.

The CPAN module Opt::Imistic offers something like this approach, although it doesn't push it that far. The module was written by Alastair McGowan-Douglas (ALTREUS) in 2010, and it saw new releases in 2014 and 2015. It does not yet have any CPAN distribution depending on it.

Opt::Imistic tries to solve another ambiguity that non-spec-using modules also face. When we receive this on the command line:

--foo bar --foo baz

sometimes --foo is not meant to accept multiple values, but the user might specify the option multiple times by mistake or for some other reason. With a spec, we can detect this. Without a spec, Opt::Imistic tries to detect this by looking at how the option value is used by the program.

To use Opt::Imistic to parse command-line options, you do this:

use Opt::Imistic;

This will cause the module to parse @ARGV and put the result in %ARGV. For example, if your script is called like this:

% myapp --foo bar --foo baz --qux 1 2 3

Then you'll have $ARGV{foo} and $ARGV{qux} available for you. All options are assumed to take values. The remaining arguments in @ARGV will be [2, 3].

$ARGV{foo} and $ARGV{qux} are actually overloaded objects. If you use them in a scalar context, then you will get a scalar, for example:

open my $fh, "<", $ARGV{foo}; # here, foo is used in scalar context

then $ARGV{foo} will return the value baz (the last specified). But actually, Opt::Imistic stores all option values as arrays. If you use it as an array, you can:

for my $file (@{ $ARGV{foo} }) { ... }

If this were combined with lazy/delayed parsing, the module could even resolve the ambiguous --foo bar syntax. For example, if it didn't parse until it sees:

if ($ARGV{foo}) {
    ...
}

then it would know that --foo is meant to be a flag option and thus does not take a value. Then Opt::Imistic could parse the options and not slurp bar as the value for --foo. This can be repeated: the command line can be reparsed whenever a new hint is encountered.
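
Just to make the idea concrete, here is a very rough sketch of hint-driven reparsing. Everything in it is hypothetical (parse_with_hints, the LazyOpt class, the shared state hash); it is not how Opt::Imistic works and it ignores many details. A boolean test on an option object records a "flag" hint, and every access reparses the raw command line using the hints collected so far.

```perl
use strict;
use warnings;

# Hypothetical parser: options marked 'flag' in %$hint take no value,
# every other --option consumes the next word.
sub parse_with_hints {
    my ($raw, $hint) = @_;
    my @argv = @$raw;
    my (%opts, @rest);
    while (@argv) {
        my $arg = shift @argv;
        if ($arg =~ /\A--(.+)\z/) {
            my $name    = $1;
            my $is_flag = ($hint->{$name} // '') eq 'flag';
            push @{ $opts{$name} }, ($is_flag || !@argv) ? 1 : shift @argv;
        } else {
            push @rest, $arg;
        }
    }
    return (\%opts, \@rest);
}

package LazyOpt;   # hypothetical class, for illustration only

use overload
    # Boolean test: record the hint, then report whether the option was given.
    'bool' => sub {
        my ($self) = @_;
        $self->{state}{hint}{ $self->{name} } = 'flag';
        my ($opts) = main::parse_with_hints(@{ $self->{state} }{qw(raw hint)});
        return exists $opts->{ $self->{name} } ? 1 : 0;
    },
    # String use: reparse with the hints so far and return the last value.
    '""' => sub {
        my ($self) = @_;
        my ($opts) = main::parse_with_hints(@{ $self->{state} }{qw(raw hint)});
        my $values = $opts->{ $self->{name} } || [];
        return @$values ? $values->[-1] : '';
    };

sub new { my ($class, %args) = @_; return bless {%args}, $class }

package main;

my $state = { raw => [qw(--foo bar --out result.txt)], hint => {} };
my $foo = LazyOpt->new(name => 'foo', state => $state);
my $out = LazyOpt->new(name => 'out', state => $state);

print "foo is on\n" if $foo;    # boolean use marks --foo as a flag
my (undef, $rest) = parse_with_hints(@$state{qw(raw hint)});
print "out=$out rest=@$rest\n";
```

One obvious weakness: the result depends on the order in which the program touches its options, since a string use before the boolean test would still see bar as the value of --foo.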

I am tempted to write such a proof-of-concept. But in general, I am more interested in option parsing that uses specification. Writing specification is not that much of a pain anyway, and it offers so much more, like the ability to check for unknown options, auto-abbreviation, autogeneration of usage messages, and so on.