Getopt modules 06: Opt::Imistic

About this mini-article series. Each day for 24 days, I will be reviewing a module that parses command-line options (such module is usually under the Getopt::* namespace). First article is here.

Option parsing modules can be categorized into two: those that require you to write a specification and those that do not (actually there's a third category: those that allow you to choose to supply a spec or not). Modules that accept a usage text as input like Docopt, or POD like Getopt::Euclid, or some other form, count into the first category. Modules in the second category are usually modules in this category are meant for shorter and simpler scripts. (And the modules are often simple themselves with short implementation, too simple to contain something worth babbling about. Most of them just collect anything that looks like an option in @ARGV then put them in a hash, done!)

Modules that don't accept a specification face an ambiguity problem when it comes to this syntax:

–foo bar

Is this a flag option –foo (an option that does not require a value) followed by a command-line argument bar, or an option –foo with its value bar?

Some modules resolve this by disallowing the ambiguous syntax. Option value must always be specified using:

–foo=bar

But this is inconvenient to (some) users and is not how most Unix programs behave.

Other modules resolve this by simply assuming that all –foo bar means option –foo with value of bar. In other words, user must be careful not to put an argument after a flag option, usually using to separate options and arguments).

Aside from the abovementioned two, other approaches are possible. One such approach is by looking at how the option value is used in the program. Using a pragma like overload, we can trap boolean, string, even array/hash operations. For example:

package OptionObject;

use overload
    '""'   => sub { $_[0]{type} = 'scalar'; $_[0]{values}[0] },
    'bool' => sub { $_[0]{type} = 'bool'  ; @{$_[0]{values}} ? 1:0 },
    '@{}'  => sub { $_[0]{type} = 'array' ; $_[0]{values} },
    ;

sub new {
    my $class = shift;
    bless {@_}, $class;
}

Then:

my $opt1 = OptionObject->new(value => ["a", "b"]); # e.g. after user specifies --opt1 a --opt1 b
my $opt2 = OptionObject->new(value => ...);
my $opt3 = OptionObject->new(value => ...);

if ($opt1 =~ /foo/) {
    # this is regex matching, meaning user wants opt1 to be a string/scalar option
}

if ($opt2) {
    # this is boolean testing, meaning user wants opt2 to be a flag option
}

for (@$opt3) {
    # user array-deferences, she probably wants opt3 to be an array option
}

In the above example code, the overloading mechanism will trigger to let us know that user wants –opt1 to be a string/scalar option, –opt2 a flag option (which does not take value), and –opt3 an array option.

CPAN module Opt::Imistic offers something like this approach, although it doesn't push it that far. The module was written by Alastair McGowan-Douglas (ALTREUS) in 2010, and it sees a new release in 2014 and 2015. It does not yet have any CPAN distribution depending on it.

Opt::Imistic tries to solve this other ambiguity that non-spec-using modules also faces. When we receive this in the command-line:

–foo bar –foo baz

sometimes –foo is not meant to accept multiple values, but user might specify the option multiple times due to mistake or some other cause. With a spec, we can detect this. Without spec, Opt::Imistic tries to detect this by looking at how the option value is used by the program.

To use Opt::Imistic to parse command-line options, you do this:

use Opt::Imistic;

This will cause the module to parse @ARGV and put the result in %ARGV. For example, if your script is called like this:

% myapp –foo bar –foo baz –qux 1 2 3

Then you'll have $ARGV{foo} and $ARGV{qux} available for you. All options are assumed to take values. The remaining arguments in @ARGV will be [2, 3].

$ARGV{foo} and $ARGV{qux} are actually overloaded objects. If you use them in a scalar context, then you will get a scalar, for example:

open my $fh, "<", $ARGV{foo}; # here, foo is used in scalar context

then $ARGV{foo} will return the value baz (the last specified). But actually, Opt::Imistic stores all option values as array. If you use it as an array, you can:

for my $file (@{ $ARGV{foo} }) { ... }

If this were combined this with lazy/delayed parsing, the module could even resolve the ambiguous –foo bar syntax. For example, if it didn't parse until it sees:

if ($ARGV{foo}) {
    ...
}

then it would know that –foo is meant to be a flag option and thus does not take value. Then Opt::Imistic could parse options and not slurp bar as the option for –foo. This can be repeated, command-line can be reparsed whenever a new hint is encountered.

I am tempted to write such a proof-of-concept. But in general, I am more interested in option parsing that uses specification. Writing specification is not that much of a pain anyway, and it offers so much more, like the ability to check for unknown options, auto-abbreviation, autogeneration of usage messages, and so on.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s