Selecting elements of an Org document with CSS-selector-like syntax

If you’ve dabbled with jQuery, CSS selectors, or Mojo::DOM (or DOM::Tiny), you’ll know that CSS selectors offer a nice way to select elements.

I’ve created Data::CSel to extend this concept beyond HTML element trees to any kind of tree object in Perl. The syntax is similar enough to CSS that you should be up and running in no time.

As the first application of this, I’ve added the select-org-elements script to the App-OrgUtils distribution. This script selects elements of an Org document tree using the CSel language, which makes it easier to select and extract parts of your Org document without writing Perl code to manipulate the tree.

Here is an example Org document (you can grab it from https://github.com/perlancar/samples/blob/master/org/table.org):

emacs determines whether a column mostly contains numbers or non-numbers. if
numbers then a column will be left-justified. if non-numbers then
right-justified.

| col1   |      col2 | col3 | col4     |    col5 |
|--------+-----------+------+----------+---------|
| foo    |      -1.3 |      | abc      |     123 |
| bar    |   -1900.3 |      | abcdefgh | 1500000 |
| baz    |      23.1 |      | 10       |     abc |
| quux   |         0 |      | foo      |     def |
| garply | 3,000,000 |      | 999      |     234 |

table without header:

| one   | two  |
| three | four |

To see the structure, first run the dump-org-structure script:

% dump-org-structure org/table.org
Document:
  Text: "emacs determines whether a column mostly..."
  Table:
    TableRow:
      TableCell:
        Text: "col1"
      TableCell:
        Text: "col2"
      TableCell:
        Text: "col3"
      TableCell:
        Text: "col4"
      TableCell:
        Text: "col5"
    TableVLine: "|---\n"
    TableRow:
      TableCell:
        Text: "foo"
      TableCell:
        Text: "-1.3"
      TableCell:
      TableCell:
        Text: "abc"
      TableCell:
        Text: "123"
    TableRow:
      TableCell:
        Text: "bar"
      TableCell:
        Text: "-1900.3"
      TableCell:
      TableCell:
        Text: "abcdefgh"
      TableCell:
        Text: "1500000"
    TableRow:
      TableCell:
        Text: "baz"
      TableCell:
        Text: "23.1"
      TableCell:
      TableCell:
        Text: "10"
      TableCell:
        Text: "abc"
    TableRow:
      TableCell:
        Text: "quux"
      TableCell:
        Text: "0"
      TableCell:
      TableCell:
        Text: "foo"
      TableCell:
        Text: "def"
    TableRow:
      TableCell:
        Text: "garply"
      TableCell:
        Text: "3,000,000"
      TableCell:
      TableCell:
        Text: "999"
      TableCell:
        Text: "234"
  Text: "\ntable without header:\n\n"
  Table:
    TableRow:
      TableCell:
        Text: "one"
      TableCell:
        Text: "two"
    TableRow:
      TableCell:
        Text: "three"
      TableCell:
        Text: "four"

Now let’s select some elements:

% select-org-elements TableRow org/table.org
|col1|col2|col3|col4|col5
|foo|-1.3||abc|123
|bar|-1900.3||abcdefgh|1500000
|baz|23.1||10|abc
|quux|0||foo|def
|garply|3,000,000||999|234
|one|two
|three|four

% select-org-elements TableRow:first org/table.org
|col1|col2|col3|col4|col5

% select-org-elements 'TableCell:nth-of-type(5):last' org/table.org
234

Selecting via Perl code is equally easy:

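use strict;
use warnings;
use Org::Parser;
use Data::CSel qw(csel);

my $doc = Org::Parser->new->parse_file("yourfile.org");

# class_prefixes lets the selector say "Headline" instead of the full class
# name Org::Element::Headline; [level=2] matches second-level headlines.
my @headlines = csel({class_prefixes=>['Org::Element']}, "Headline[level=2]", $doc);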
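The same idea carries over to any tree of Perl objects, not just Org::Parser's. To show what a bare type selector and :first boil down to, here is a minimal hand-rolled sketch in core Perl. It is not Data::CSel's actual implementation: the My::Tree::* classes and the select_type helper are made up for illustration, and the sketch only assumes that each node can list its children (plus a parent link, which a real engine would use for combinators).

```perl
use strict;
use warnings;

# Hypothetical generic node class: the sketch below only needs a way to list
# children; the parent link is what a real engine would use for combinators
# and sibling-based pseudo-classes.
package My::Tree::Node {
    sub new {
        my ($class, %args) = @_;
        my $self = bless { text => $args{text}, children => [], parent => undef }, $class;
        for my $child (@{ $args{children} || [] }) {
            $child->{parent} = $self;            # link each child back to its parent
            push @{ $self->{children} }, $child;
        }
        return $self;
    }
    sub children { @{ $_[0]{children} } }       # returns a plain list
    sub parent   { $_[0]{parent} }
    sub text     { $_[0]{text} }
}
package My::Tree::Table { our @ISA = ('My::Tree::Node') }
package My::Tree::Row   { our @ISA = ('My::Tree::Node') }
package My::Tree::Cell  { our @ISA = ('My::Tree::Node') }

# Hypothetical stand-in for a bare type selector such as "Cell": walk the tree
# depth-first in document order and keep nodes whose class name is exactly
# "<prefix>::<type>" (the prefix plays the role of class_prefixes).
sub select_type {
    my ($prefix, $type, @roots) = @_;
    my @found;
    my @stack = reverse @roots;
    while (my $node = pop @stack) {
        push @found, $node if ref($node) eq "${prefix}::$type";
        push @stack, reverse $node->children;   # leftmost child is popped first
    }
    return @found;
}

my $tree = My::Tree::Table->new(children => [
    My::Tree::Row->new(children => [
        My::Tree::Cell->new(text => 'one'),
        My::Tree::Cell->new(text => 'two'),
    ]),
    My::Tree::Row->new(children => [
        My::Tree::Cell->new(text => 'three'),
        My::Tree::Cell->new(text => 'four'),
    ]),
]);

# Roughly "Cell": every cell, in document order
my @cells = select_type('My::Tree', 'Cell', $tree);
print join('|', map { $_->text } @cells), "\n";                # prints one|two|three|four

# Roughly "Row:first": keep only the first node of the whole matched set
my ($first_row) = select_type('My::Tree', 'Row', $tree);
print join('|', map { $_->text } $first_row->children), "\n";  # prints one|two
```

Taking the first element of the whole result mirrors how TableRow:first above returned only the header row of the first table, rather than the first row of every table.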

Enjoy.
