A friend was bemoaning a few Linux quota tools that gave hard-to-parse output. Each would print a header line, followed by a number of lines of data. An example:
Filesystem blocks quota limit grace files ...
/dev/hdd3 26320 46080 51200 1521 ...
/dev/hdd4 26320 51200 1060 ...
My friend's problem was trying to deal with the "empty" columns in the data lines. When just using a naive split(), the empty columns would disappear, throwing off the count.
# controlled search-and-replace to insert column markers
# using the header line as a guide to where the columns lie
$headers = <>;
$headers =~ s/(\S)\s/$1\#/g;
while (<>) {
    s/\s/ (substr($headers, pos(), 1) eq '#')? '#' : ' ' /eg;
    @_ = split /\s*\#\s*/;
    # @_ now has true columns ready for whitespace trimming
}
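
To make the marker technique easier to follow, here is a minimal, self-contained sketch run against made-up data. The sample rows and their exact spacing are assumptions (left-aligned values under each header word), not real quota output. It uses `$-[0]`, the start offset of the current match, where the snippet above uses `pos()`, and passes a limit of -1 to split so that trailing empty columns are kept.

```perl
use strict;
use warnings;

# Hypothetical, left-aligned sample report; real tool output may be spaced differently.
my @report = (
    "Filesystem blocks quota limit grace files\n",
    "/dev/hdd3  26320  46080 51200       1521\n",
    "/dev/hdd4  26320        51200       1060\n",
);

# Put a '#' marker on the whitespace character that ends each header word.
my $headers = shift @report;
$headers =~ s/(\S)\s/$1#/g;

my @rows;
for my $line (@report) {
    chomp $line;
    # Replace each whitespace character with '#' when the header carries a
    # marker at the same offset, and with a plain space otherwise.
    $line =~ s/\s/ substr($headers, $-[0], 1) eq '#' ? '#' : ' ' /eg;
    # Split on the markers, trimming the whitespace around them.
    push @rows, [ split /\s*#\s*/, $line, -1 ];
}

print join('|', @$_), "\n" for @rows;
```

With the sample rows above this prints `/dev/hdd3|26320|46080|51200||1521` and `/dev/hdd4|26320||51200||1060`, so the empty columns survive as empty strings.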

Re: parsing sloppy text from columns
by rob_au (Abbot) on May 08, 2003 at 03:29 UTC
    An alternate approach would be to make use of the unpack function ...

    my @line = unpack( 'A10A8A8A8A8A8', $line );

    I believe both approaches could be broken by values that overflow their field width or by wrapped text, but then again, there are undoubtedly better tools for that :-)
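
As a variation on the fixed template above, the field widths can be derived from the header line instead of being hard-coded. This is only a sketch under the assumption that every value starts at the same offset as its header word; the sample rows are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical, left-aligned sample report; real tool output may be spaced differently.
my @report = (
    "Filesystem blocks quota limit grace files",
    "/dev/hdd3  26320  46080 51200       1521",
    "/dev/hdd4  26320        51200       1060",
);

my $header = shift @report;

# Record the offset at which each header word starts.
my @starts;
push @starts, $-[0] while $header =~ /\S+/g;

# Each field runs up to the start of the next one; the last takes the rest.
my $template = join '',
    (map { 'A' . ($starts[$_ + 1] - $starts[$_]) } 0 .. $#starts - 1),
    'A*';

my @rows;
for my $line (@report) {
    # The 'A' format strips trailing spaces, so empty columns come back as ''.
    push @rows, [ unpack $template, $line ];
}

print "$template\n";
print join('|', @$_), "\n" for @rows;
```

For this header the derived template is `A11A7A6A6A6A*`. The caveat about overflowing values still applies: a value wider than its column would spill into the next field.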


Re: parsing sloppy text from columns
by draconis (Scribe) on May 08, 2003 at 17:57 UTC
    Another simple yet effective way to eliminate the 'extra' whitespace would be to hit the line with something like this.

    $line =~ s/ +/ /g;

    This collapses every run of spaces into a single space so that you can then use a naive split().

      That will break when the columns are not consistently filled with values, for example if a value of 0 for some column results in nothing getting printed at all.
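
A small sketch of that failure mode, using made-up rows in which the second line has nothing printed in one column. The data and its spacing are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical rows: the first has five values, the second is missing one in the middle.
my @lines = (
    "/dev/hdd3  26320  46080 51200       1521",
    "/dev/hdd4  26320        51200       1060",
);

my @counts;
for my $line (@lines) {
    # Collapse runs of spaces, then split naively on whitespace.
    (my $squeezed = $line) =~ s/ +/ /g;
    my @fields = split ' ', $squeezed;
    push @counts, scalar @fields;
    print scalar(@fields), " fields: @fields\n";
}
```

The first row yields 5 fields and the second only 4, so in the second row 51200 ends up in the slot where the missing value should have been.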

        You are absolutely correct. My solution will ONLY work for a data set where one has confidence in the data and what is presented (i.e., you know you have 6 columns and always get 6 columns).

        I appreciate you correcting this - my apologies.