What’s next for bash, completion-wise?

Having tried out the completion features of several other shells that I don’t use daily (including zsh, tcsh, and fish), I can’t help but compare them with bash.

IMHO, the last major completion feature in bash arrived in 2009-2010, when bash 4.1 introduced the -D option for the “complete” command. This enables a fallback/catch-all mechanism like the ones already found in other shells such as fish and zsh. When a user requests completion for a command that does not yet have a completion definition, the hook function specified in “complete -D” can run and look for a completion definition somewhere. The completion can then be activated right there and then, instead of having to wait for the next command (or until the user logs out and logs in again). This is a major convenience, as completion can be activated or deactivated instantly.
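
Here is a minimal sketch of such a fallback hook, adapted from the example in the bash manual. It needs bash 4.1 or later. The COMPLETION_DIR variable and the one-file-per-command layout (“<command>.sh”) are assumptions made for illustration, not something bash prescribes.

```shell
# Hypothetical location of per-command completion scripts (<command>.sh).
COMPLETION_DIR=${COMPLETION_DIR:-"$HOME/.completions"}

# Fallback hook: bash calls it with the command name in $1 when that
# command has no completion definition yet.
_completion_loader() {
    # Source the definition if it exists; exit status 124 asks bash to
    # retry completion using the definition that was just loaded.
    . "$COMPLETION_DIR/$1.sh" >/dev/null 2>&1 && return 124
}

# Register the hook as the default (-D) completion.
complete -D -F _completion_loader -o bashdefault -o default
```

With this in place, dropping a file such as “mycmd.sh” containing a “complete” command into that directory is enough for the next Tab press on “mycmd” to pick it up.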

The subsequent major bash versions didn’t introduce anything ground-breaking in terms of completion: 4.2 lets us configure the number of columns used when displaying completions (nice, but not additional core functionality) and map case so that underscore and dash are treated as the same (really convenient, but we can do that ourselves with a function or external command backend). 4.3 introduced “-o noquote” and 4.4 introduced “-o nosort”, both of which are minor.

Completion description. As many bash users who have tasted fish and zsh would agree, I think bash really needs the ability to show a description/help text next to each completion answer. This is a major boost for CLI usability. For example, a user can see or be reminded of what each command option does instead of having to run “man” or open a browser and google for it.

Menu select. The other popular feature is “menu select” as in zsh (not to be confused with the existing “menu-complete” command in bash): after the user presses Tab and is presented with the list of completions, she can use the arrow keys to pick the completion she wants instead of typing. This is nice but has less impact than the previous item; a seasoned CLI user can usually complete faster by typing anyway.

What I think would be really nifty is incremental matching, where the list of completions shrinks or grows as the user types. For example, you type “deluser t”, press Tab, and get a list of 30 usernames starting with “t”. You can now type more letters to narrow down the names until you get the one you want, and the displayed list interactively shrinks or re-expands to show only the matching items. The exact details can be tuned to be as comfortable and powerful as possible. What I just described is really only a UI (TUI?) improvement over functionality that is already present: with tab completion we often do exactly that, except that the list is not redisplayed automatically (we still need to press Tab whenever we want to see the list of completions).

Colors. Fish uses colors a lot, and to good purpose. For example, if you type “ls -” in fish you’ll get much nicer output than in bash, which lets you scan the list faster. It would be nice if bash could use more color in its completion lists.

Adding support for fish, zsh, tcsh in shcompgen

I’ve recently added support for the other three shells (fish, zsh, tcsh) in shcompgen. shcompgen is basically a utility that writes shell commands like “complete -C foo foo” or “complete -c foo -l longopt1 --description 'Add a thing to foo'” for you. It recognizes scripts written using Getopt::Long::Complete, Perinci::CmdLine, and a few others so that you can enable shell tab completion for your scripts.

fish. Enabling tab completion for a command in fish is relatively simple, if inflexible. For each short/long option of a command, you define a separate “complete” command, e.g.:

complete -c man -s k --description "Show apropos information"
complete -rc man -s C --description "Configuration file"
complete -xc man -a 1 --description "Program section"
complete -xc man -a 2 --description "Syscall section"
complete -xc man -a 3 --description "Library section"

This becomes a bit cumbersome for commands that have subcommands (like “apt-get”) or for programs that have a peculiar option syntax. It is not possible to simply say, as in bash:

complete -F somefunc cmd; # delegate completion to a function
complete -C somecmd cmd ; # delegate completion to an external command

So I basically have to generate those “complete” commands for each option. This means that if a program is updated with new/changed/removed options, we will need to update the “complete” commands too.
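
The generation step can be sketched as a small shell function that prints one fish “complete” command per long option. The function name gen_fish_complete and the “NAME=Description” argument format are hypothetical and this is not shcompgen’s actual code; only the -c, -l and --description flags of fish’s “complete” are relied on. The sketch does not escape single quotes inside descriptions.

```shell
# Print one fish "complete" command per long option of a command.
# Usage: gen_fish_complete CMD 'OPTNAME=Description' ...
gen_fish_complete() {
    cmd=$1
    shift
    for spec in "$@"; do
        opt=${spec%%=*}     # text before the first "="
        desc=${spec#*=}     # text after the first "="
        printf "complete -c %s -l %s --description '%s'\n" "$cmd" "$opt" "$desc"
    done
}
```

Redirecting the output of, say, “gen_fish_complete foo 'verbose=Print more details'” into a file that fish sources would have to be repeated every time the options of foo change.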

zsh. Completion in zsh is feature-rich but complicated, with lots and lots of options. You can, in theory, use the “complete” or “compgen” commands like in bash, because zsh has “bashcompinit” that (partially) simulates those two bash commands. This lets you reuse your bash completion definitions in zsh. I tried to do that but didn’t succeed:

#compdef pmman
autoload bashcompinit
# this is bash-style
complete -C pmman pmman

The commands I type will sometimes complete, but at other times won’t. So I use “compadd” instead, which is the standard way to add completion results in zsh. For example:

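#compdef pmman
_pmman() {
    # sketch: feed the current line to pmman the way bash's "complete -C" does
    compadd -- $(COMP_LINE=$BUFFER COMP_POINT=$CURSOR pmman)
}
_pmman "$@"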

tcsh. tcsh lacks a fallback or autoload mechanism (like “complete -D” in bash or the similar mechanisms in fish and zsh), so activating or deactivating completion for a command requires you to explicitly re-source a definition script, or to log out and log in again.

Tab completion now works in zsh, fish, and tcsh, but since I don’t use those shells daily and am not familiar enough with them, there are still known issues (documented in shcompgen’s POD), such as the escaping of special characters like whitespace. I hope that Perl programmers who use one of those shells can give input on how to resolve these issues.

Adding tab completion for perlbrew

perlbrew is a command-line utility I’ve been using quite a bit recently: while developing Bencher’s feature of benchmarking against multiple perls, for trying out cperl, or just for updating to the latest perl release. So I thought it would be nice to add a tab completion feature to perlbrew.

The obvious choice (for many people anyway) of language for writing a tab completion feature is bash, but I’m more comfortable with Perl. Besides, there are a few nice completion features in Complete::Util I’d like to use.

The result is App::ShellCompleter::perlbrew. You install it by first installing App::shcompgen from CPAN and then:

% shcompgen init

then install App::ShellCompleter::perlbrew from CPAN.

Some of the things that the completion can do:

Complete subcommands, option names, option values, arguments

For example:

% perlbrew un<tab>

will complete to:

% perlbrew uninstall _

The completion features “word-mode” matching, so you can also do something like this:

% perlbrew i-cp<tab>

and it will complete to:

% perlbrew install-cpanm _
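
To illustrate the idea behind word-mode matching, here is a simplified sketch as a shell function. It is not Complete::Util’s actual algorithm, the function name word_mode_match is hypothetical, and it assumes the typed pattern contains only letters, digits and dashes (no regex metacharacters).

```shell
# Succeed if each dash-separated word of PATTERN is a prefix of the word
# at the same position in CANDIDATE, e.g. "i-cp" matches "install-cpanm".
word_mode_match() {
    pattern=$1 candidate=$2
    # "i-cp" becomes the regex "^i[^-]*-cp"
    regex="^$(printf '%s' "$pattern" | sed 's/-/[^-]*-/g')"
    printf '%s\n' "$candidate" | grep -Eq "$regex"
}
```

A completer would run this test against every subcommand name and keep the ones that match.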

Display the list of available perls to install

% perlbrew install <tab>

The first time you do this, it will take several seconds because the completion script will fetch the list of available perls from “perlbrew available”. After that it should be instantaneous because the completion script caches the result in a temporary file.
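
The caching idea can be sketched in shell as follows. The function name cached_output, the cache file location and the maximum age are all hypothetical; the actual completion script may store and expire its cache differently.

```shell
# Print the output of a command, reusing a cached copy when it is fresh.
# Usage: cached_output CACHE_FILE MAX_AGE_MINUTES COMMAND [ARGS...]
cached_output() {
    cache=$1 max_age=$2
    shift 2
    # find prints the path only if the file exists and was modified
    # less than MAX_AGE_MINUTES minutes ago.
    if [ -z "$(find "$cache" -mmin "-$max_age" 2>/dev/null)" ]; then
        # Missing or stale: rerun the command, writing to a temporary
        # file first so that a failed run does not clobber the cache.
        "$@" > "$cache.tmp.$$" && mv "$cache.tmp.$$" "$cache" || {
            rm -f "$cache.tmp.$$"
            return 1
        }
    fi
    cat "$cache"
}
```

For example, “cached_output /tmp/perlbrew-available 1440 perlbrew available” would call perlbrew at most once a day and answer from the file otherwise.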

Display the list of installed perls

It can also do “char-mode” or “fuzzy” matching for increased convenience. For example, type this:

% perlbrew switch 10<tab>

and it will complete to (assuming you have perl 5.10.1 installed):

% perlbrew switch 5.10.1
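
A simplified sketch of char-mode matching, again as a shell function: the typed characters only have to appear in the candidate in the same order. The name char_mode_match is hypothetical, the pattern is assumed to contain no glob metacharacters, and a real completer would likely also rank or prefer tighter matches, which this sketch does not attempt.

```shell
# Succeed if the characters of PATTERN occur in CANDIDATE in the same
# order, not necessarily adjacent, e.g. "10" matches "5.10.1".
char_mode_match() {
    pattern=$1 candidate=$2
    # "10" becomes the glob "*1*0*"
    glob="*$(printf '%s' "$pattern" | sed 's/./&*/g')"
    case $candidate in
        $glob) return 0 ;;
        *)     return 1 ;;
    esac
}
```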

Source code

The source code for _perlbrew is about 300 lines and, I believe, was fairly easy to write.

Embedding code snippets from other modules and DevelopRequires

Aside from use()-ing or require()-ing code from other modules, you can also embed code snippets directly into your module (a single function, a variable declaration, or the whole module). I’ve done this several times. For example, one time I needed to remove duplicate elements from an array. Instead of using List::MoreUtils, I copy-pasted the uniq function from List::MoreUtils::PP into my module’s source code and added a note about it:

# BEGIN: stolen from List::MoreUtils::PP
sub uniq (@) {
    my %seen = ();
    my $k;
    my $seen_undef;
    grep { defined $_ ? not $seen{ $k = $_ }++ : not $seen_undef++ } @_;
}
# END: stolen from List::MoreUtils::PP

I’ve also made a couple of Dist::Zilla plugins to help me automate this kind of process.

By embedding, you spare the end user the cost of having to install List::MoreUtils (which is still a non-core module at the time of writing). You also save a bit of startup/compile time by excluding the rest of the functions that you do not need. Of course, you should only do this in the special cases where you really want to minimize dependencies or startup overhead, like in bootstrapping scripts or, often in my case, in tab completion scripts, which must answer quickly after the user presses the Tab key. And normally you should only do this for code that is already stable and proven, because the cost of having to update embedded code is usually greater than if you simply depend on another module (a case in point: Module::Install).

Since this kind of code embedding is still a form of dependency (whenever the code in the source module is updated, you might want to update the embedded code too), it is a good idea to express this dependency when you package your module as a Perl distribution. The appropriate phase and relationship to use for this kind of dependency is DevelopRequires. Modules listed under DevelopRequires will not be installed when users install your module using a CPAN client, but the entry serves as a reminder/note that you still depend on the source module.

Checking if a module is installed (without actually loading it)

One of the easiest ways to check if a module is installed is simply by trying to load it:

if (eval { require Foo::Bar; 1 }) {
    # Foo::Bar is loadable
}

However, when Foo::Bar happens to be installed, this actually loads the module, which is not always desirable, for example when: 1) checking a lot of modules; 2) checking a module which is OS-specific and might not work under your OS when loaded; 3) checking a module which might conflict with another module that is already loaded; 4) wanting to avoid the security implications of executing the module’s code.

Another way to check is by trying to locate the module file by iterating over @INC yourself or using something like Module::Path or Module::Path::More. Those modules search for the module in directories specified in @INC like Perl’s require would:

use Module::Path qw(module_path);
if (module_path "Foo::Bar") {
    # Foo::Bar is available
}

However, this only works when Foo::Bar is indeed located on the filesystem and does not work when Foo::Bar is loaded using a require hook (coderef or object in @INC), like in a fatpacked or datapacked script. Also, it does not work nicely with other uses of require hooks, like emulating a missing module (lib::filter or lib::disallow).
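
Iterating over @INC by hand can be sketched as a shell function wrapped around a Perl one-liner, which is handy when the check is made from a shell script. The function name perl_module_file is hypothetical. Entries of @INC that are references (require hooks) are simply skipped, which is exactly the limitation described above.

```shell
# Print the path of the first file in @INC that provides the module,
# without loading it; fail if no such file exists.
perl_module_file() {
    perl -e '
        (my $file = "$ARGV[0].pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
        for my $dir (@INC) {
            next if ref $dir;                      # skip require hooks
            if (-f "$dir/$file") { print "$dir/$file\n"; exit 0 }
        }
        exit 1;
    ' "$1"
}
```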

The Perl core module Module::Load::Conditional provides check_install, which handles both cases: the module file being on the filesystem and the module being retrieved from a require hook:

use Module::Load::Conditional qw(check_install);
if (check_install(module => "Foo::Bar")) {
    # Foo::Bar is available
}

In addition to the above, check_install can also be instructed to check for minimum required version:

unless (check_install(module => "Foo::Bar", version => "1.23")) {
    # Foo::Bar is not available, or its version is < 1.23
}

Note that the version check is not performed by loading the module and reading its $VERSION, but by using Module::Metadata, which tries to extract the version number from the module’s source code (this might fail on some weird modules that obfuscate their $VERSION, but for normal cases it should suffice).

I also recently wrote Module::Installed::Tiny, which does the same as Module::Load::Conditional’s check_install but with a bit less code and fewer dependencies:

use Module::Installed::Tiny qw(module_installed);
if (module_installed "Foo::Bar") {
    # Foo::Bar is available
}

Note that neither check_install nor module_installed guarantees that the module will load successfully, as there might be syntax errors in the module’s code or runtime errors when running the code. All these routines do is check that the module’s source code is available.

UPDATE [2016-08-03]: This post was originally about Module::Loadable, before I was made aware of Module::Load::Conditional’s check_install. In the original post I wrote that I hoped I hadn’t reinvented the wheel by writing Module::Loadable. I was happily proven wrong 🙂

Podcast filenames

Like many of you, I listen to some podcasts. There are various ways people get their episodes, but I do it manually on a PC: I browse the podcast’s website and download the MP3 files using the browser, wget, or curl. I listen on a variety of devices, including a television and car audio which can only get the files via USB flash drives, so I figure it’s better to organize the files on the PC and transfer them to other devices as needed.

Now there’s this minor (or major, depending on how OCD you are) issue of the various inconsistent ways podcasters like to name their MP3 files. Me, I’m standardizing on this: each filename should include, in the following order, 1) the podcast name (preferably short: a few letters, initials); 2) the episode number, zero-padded to at least three digits (000), or the date in YYYYMMDD format; 3) the episode title (one to a few words).

This way, whenever I see a file lying around in some folder in some device I can immediately know which podcast this is and what the episode is all about (because on smartphones it’s usually a pain to move files around). The order and the format of the number/date let the files get sorted nicely (because not all apps can do natural sorting). And the short podcast name/initials will prevent the annoyance of not being able to see the date/title on narrower screens (sometimes an app will scroll the filename horizontally a la stock ticker, but sometimes not).

Oh, and I also stick to lowercase alphanumeric characters and dashes/underscores, avoiding whitespace or other strange characters, for ease of typing, selecting, and tab-completing.
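
As a rough sketch, the scheme can be captured in a small shell function. The name podcast_filename is hypothetical, and the choice of a dash between the three parts and underscores within the title is just the convention I happen to use.

```shell
# Build a filename of the form NAME-ID-TITLE.mp3.
# Usage: podcast_filename SHORT_NAME EPISODE_OR_DATE TITLE
podcast_filename() {
    name=$1 id=$2 title=$3
    case $id in
        *[!0-9]*|'') ;;   # not a plain number: leave untouched
        *)
            # strip leading zeros so printf does not read the number as octal
            id=$(printf '%s' "$id" | sed 's/^0*//')
            id=$(printf '%03d' "${id:-0}")
            ;;
    esac
    # lowercase the title and turn every run of other characters into "_"
    title=$(printf '%s' "$title" | tr '[:upper:]' '[:lower:]' |
        sed -e 's/[^a-z0-9]\{1,\}/_/g' -e 's/^_//' -e 's/_$//')
    printf '%s-%s-%s.mp3\n' "$name" "$id" "$title"
}
```

The actual renaming still needs a human to supply the short name, the number or date, and a sensible title.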

Here are some samples of filenames which I will definitely rename:

  • The Secret Emotional Life of Clothes.mp3 (from Invisibilia): spaces in filename, no podcast name, no episode number/date. I’d rename it to invisib-20160722-the_secret_emotional_life_of_clothes.mp3.
  • obm20episode2016320-207_19_162C209.0420PM.mp3 (from One Bad Mother): no title, a needless “episode” string as well as the time of day, date not in YYYYMMDD order, and the space got mangled into 20 (probably from %20). I’d rename it to obm-163-when_kids_share_a_room.mp3.
  • ShmanQuestions.mp3 (from Shmanners): no episode number/date. I’d rename it to shmanners-20160722-etiquette_catch_all.mp3. Sometimes the title in the filename doesn’t match the title in the post, so I also correct that.
  • OhNoRossAndCarrie_47_RossAndCarrieRememberTonyAlamoPart1.mp3 (from Oh No, Ross and Carrie!): too long. I’d rename it to: onrac-047-tony_alamo_p1.mp3.

And here are some that are already good enough:

  • sm237_limabeans.mp3 (from Spilled Milk). Although before it reaches episode #10 and #100, it uses one and two digits for the episode number so I pad them with leading zeroes.
  • Sawbones146Tea.mp3 (from Sawbones). All the pieces of information are already there in the desired order, I just need to format and lowercase the filename.

How do you name your podcast files?

Cascade bumping of prerequisite version

Introducing backward-incompatible change to a piece of code, especially if that code has a lot of dependants (i.e. located more upstream in the river, if we’re using the river of CPAN analogy), will cause pain. But sometimes you need or want to do it anyway.

Suppose you have this tree of dependencies:

Aa (0.01)
    Bb (0.01, requires Aa=0)
    Cc (0.01, requires Aa=0)
        Dd (0.01, requires Cc=0)
        Ee (0.01, requires Cc=0)
            Ff (0.01, requires Ee=0)
            Gg (0.01, requires Ee=0)
    Hh (0.01, requires Aa=0.01)

Now a backward-incompatible change is introduced in Aa, and you release Aa 0.02. Let’s say this change happens to affect Bb and Cc but not Hh.

If a system updates Aa to 0.02, suddenly Bb and Cc will break. And Dd, Ee, Ff, Gg will break too because Bb and Cc break. Aa can get updated for a variety of reasons: manually for testing (as on a CPAN Testers machine), or because the user installs something else, say Ii, which needs Aa=0.02.

So you now release Bb 0.02 and Cc 0.02 to cope with the changes introduced in Aa. These updates are themselves backward-compatible, so users of Bb and Cc can still specify Bb=0 or Cc=0.01. The dependency tree now looks like this:

Aa (0.02)
    Bb (0.02, requires Aa=0.02)
    Cc (0.02, requires Aa=0.02)
        Dd (0.01, requires Cc=0)
        Ee (0.01, requires Cc=0)
            Ff (0.01, requires Ee=0)
            Gg (0.01, requires Ee=0)
    Hh (0.01, requires Aa=0.01)

If a system happens to update just Bb or Cc, Aa will be correctly updated automatically to 0.02.

If a system just updates Aa, the same situation still happens: Bb and Cc will break, and Dd, Ee, Ff, Gg will break too because Bb and Cc break.

Now suppose you add a new feature to Ff and release a new version of Ff (0.02, requires Ee=0). Even though this has nothing to do with the backward-incompatible change in Aa 0.02, breakage might still happen. Let’s say a CPAN Testers machine tries to install Ff 0.02. The machine won’t automatically upgrade Bb and Cc because the dependencies specified by Ff 0.02 don’t require it. The tests will then fail if the machine has the new Aa (0.02) installed but old versions of Bb and Cc.

This is exactly what has happened to me a few times, most recently in the case of Data::Sah 0.79. A couple of weeks after I released Data::Sah 0.79, I released some other distributions that do not directly depend on it, and some CPAN Testers machines started reporting failures for these distributions. This is because the machines happened to have the updated Data::Sah but older versions of some direct dependants, which break under the new Data::Sah.

So back to our Aa example, to properly induce a cascade update, after we release Bb 0.02 and Cc 0.02, we also need to release Dd and Ee just to bump the prerequisite version of Cc to 0.02 even though Dd and Ee don’t exactly require Cc 0.02 (they can live with Cc 0.01). And repeat the process recursively: update Ff and Gg just to bump prerequisite version of Ee, and so on.

Thus, if a system updates Gg to 0.02, Ee will automatically be upgraded to 0.02, Cc to 0.02, and Aa to 0.02.

To reiterate: after we introduce a backward-incompatible update to a module, we must update all the direct dependants of that module that are affected by the change, and also recursively update all their dependants just to bump the minimum prerequisite version and force pulling the module’s and direct dependants’ update.
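
Working out which distributions need a prerequisite bump is a simple graph walk. Below is a sketch in shell, using the Aa example tree as input. The function name dists_to_bump and the “dependant prerequisite” line format are made up for this illustration; a real tool would read the dependency information from distribution metadata.

```shell
# Print every direct or indirect dependant of the given distributions.
# EDGES holds one "dependant prerequisite" pair per line.
# Usage: dists_to_bump "$EDGES" DIST...
dists_to_bump() {
    edges=$1
    shift
    queue="$*"
    seen=" "
    while [ -n "$queue" ]; do
        next=""
        for dist in $queue; do
            # dependants that list $dist as a prerequisite
            for dep in $(printf '%s\n' "$edges" | awk -v d="$dist" '$2 == d { print $1 }'); do
                case $seen in *" $dep "*) continue ;; esac
                seen="$seen$dep "
                next="$next $dep"
                printf '%s\n' "$dep"
            done
        done
        queue=$next
    done
}

EDGES='Bb Aa
Cc Aa
Dd Cc
Ee Cc
Ff Ee
Gg Ee
Hh Aa'

# Bb and Cc are the direct dependants affected by Aa 0.02.
dists_to_bump "$EDGES" Bb Cc   # prints Dd, Ee, Ff, Gg (one per line)
```

Hh does not show up because it is not affected by the change and nothing depends on it.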

In the case of Data::Sah, this involves hundreds of distributions because Perinci::CmdLine::Lite is a direct dependant that is affected. And Perinci::CmdLine::Lite (via Perinci::CmdLine::Any) is used by many of my App:: distributions. But fortunately, on a production system, Data::Sah typically won’t be updated without Perinci::CmdLine::Lite also being updated.