kfarr2 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Gurus –

I’m trying to parse a text file to find the ending column number in $_ for each grouping of “-” characters. The contents of $_ can vary widely. Example#1:

   ----------  -------------------------  --------  ----  -------------------------  -----  ----  -----------  -----------
should return column numbers 13, 40, 50, 56, 83, 90, 96, 109 and 122.

Example#2:

   -----------  ------
should return column numbers 14 and 22.

I’ve been reading documentation for the last two hours, so I hope that qualifies as a fair, if not good, try before posting this simple question.

Thanks

Replies are listed 'Best First'.
Re: text parsing question
by NetWallah (Canon) on Nov 07, 2004 at 14:26 UTC
    You may have to fix/fudge the LAST number, but this seems to be what you want:
    $_='   ----------  -------------------------  --------  ----  -------------------------  -----  ----  -----------  -----------';
    while( m/-\S.\s|$/g){
        print pos() -1,qq(\n)
    }
    OUTPUT:
    13 40 50 56 83 90 96 109 121

        Earth first! (We'll rob the other planets later)

Re: text parsing question
by Dietz (Curate) on Nov 07, 2004 at 14:52 UTC
    Another solution (though not so elegant as NetWallah's):
    $_ = '   ----------  -------------------------  --------  ----  -------------------------  -----  ----  -----------  -----------';
    my $i;
    print $i += length $_, $/ for split /(?<=-)(?=\s+|$)/;
    __END__
    __OUTPUT__
    13 40 50 56 83 90 96 109 122

Re: text parsing question
by davido (Cardinal) on Nov 07, 2004 at 17:01 UTC

    Here's a solution that uses a negative lookahead to find the ending boundaries of the '-' groupings.

    use strict;
    use warnings;

    my $string = "   ----------  -------------------------  --------  ----  -------------------------  -----  ----  -----------  -----------";
    print pos($string), "\n" while $string =~ m/-(?!-)/g;

    I also toyed with m/-+/g, which works equally well, but must match the entire grouping of -'s before finishing each iteration. If the grouping is really long, that might be a little slower. The method I've provided just looks for boundaries, which seemed to be a good determination of the end of each grouping.

    Oh, also, in the previously posted solutions, people were subtracting 1 from pos(), which gives the column of the final hyphen of each grouping. But your proposed sample output seems to be asking for the column immediately following the end of each grouping. The solution I provided returns the column following the end of each grouping, to match your sample output. If you intended otherwise, add that "-1" to the code.


    Dave
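
A minimal sketch of the two variants described above (the lookahead match and m/-+/g), plus the pos() versus pos() - 1 distinction. The sub names group_ends and group_ends_greedy are made up for illustration, and the sample line assumes three leading spaces and two spaces between groups, which is the spacing that matches the expected columns in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: offset just past each run of '-' (lookahead variant).
sub group_ends {
    my ($line) = @_;
    my @ends;
    push @ends, pos($line) while $line =~ m/-(?!-)/g;
    return @ends;
}

# Hypothetical helper: same result, but each match consumes the whole run.
sub group_ends_greedy {
    my ($line) = @_;
    my @ends;
    push @ends, pos($line) while $line =~ m/-+/g;
    return @ends;
}

# Second sample line, built with the x operator so the spacing is explicit:
# 3 spaces, 11 hyphens, 2 spaces, 6 hyphens (assumed spacing).
my $sample = (' ' x 3) . ('-' x 11) . (' ' x 2) . ('-' x 6);

my @after = group_ends($sample);     # 0-based offset following each group
my @last  = map { $_ - 1 } @after;   # 0-based offset of the final hyphen

print "after group: @after\n";   # 14 22
print "last hyphen: @last\n";    # 13 21
```

Since pos() is a 0-based offset just past the match, it is numerically the same as the 1-based column of the last hyphen in each group.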

Re: text parsing question
by tmoertel (Chaplain) on Nov 08, 2004 at 00:30 UTC
    Here's one way that is quite simple and conveniently groups the column endings for each line in an array:
    #!/usr/bin/perl
    use warnings;
    use strict;

    while (<DATA>) {
        my @columns = do { my $s; map $s += length, /(\s*-+)/mg };
        print "@columns\n";
    }

    # Output:
    #
    # 13 40 50 56 83 90 96 109 122
    # 14 22

    __DATA__
       ----------  -------------------------  --------  ----  -------------------------  -----  ----  -----------  -----------
       -----------  ------

    Cheers,
    Tom

Re: text parsing question
by thetimeboy (Novice) on Nov 07, 2004 at 18:41 UTC
    Possibly the least imaginative and most expensive solution I can think of is to loop. This is clunky... but it works!

    my $str = "- -- --- ---- -----";
    $str =~ s/^( *)//; # how many cliches does it take to
    my $cnt = length $1; # hit the ground running
    my $last1 = "-"; # for the sake of argument
    my @ends; # column just past each group of dashes

    while ($str =~ s/(.)//)
    {
       push @ends, $cnt if ($last1 eq "-") && ($1 eq " ");
       $last1 = $1;
       $cnt++;
    }

    push @ends, $cnt if $last1 eq "-";
    print join (", ", @ends);

      If you are going to do it like you would in C.....

      use Inline 'C';

      my $string = '   -----------  ------';
      my @res = parse($string);
      print "@res\n";

      __END__
      __C__
      void parse ( char * str ) {
          int i = 0;
          char cur, last = '\0';
          dXSARGS;
          sp = mark;
          while ( cur = *(str++) ) {
              if ( last == '-' && (isspace(cur) || ! *str) )
                  XPUSHs(sv_2mortal(newSViv(i)));
              last = cur;
              i++;
          }
          PUTBACK;
      }

      cheers

      tachyon

Re: text parsing question
by TedPride (Priest) on Nov 08, 2004 at 04:54 UTC
    REGEX WITH WILDCARD DELIMITER
    41 seconds, 1,000,000 iterations
    my @nums; push @nums, pos() - 1 while m/-[^-]/g;
    REGEX WITH SPACE DELIMITER
    16 seconds, 1,000,000 iterations
    my @nums; push @nums, pos() - 1 while m/- /g;
    SUBSTR WITH SPACE DELIMITER
    13 seconds, 1,000,000 iterations
    my (@nums, $pos); push @nums, $pos while $pos = index($_, '- ', $pos) + 1;
    Note that all of the above have to have a space added to the end beforehand, and also removed afterwards if you intend to use the line for anything else:
    $_ = '   ----------  -------------------------  --------  ----  -------------------------  -----  ----  -----------  -----------';
    $_ .= ' ';
    my (@nums, $pos);
    push @nums, $pos while $pos = index($_, '- ', $pos) + 1;
    print join(' ', @nums);
    chop;
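
Timings like the ones above depend on the machine and the Perl build. A hedged sketch of how the comparison could be re-run with the core Benchmark module follows. The sub names and the iteration count are made up for illustration, and the sample line assumes three leading spaces and two spaces between groups, which matches the expected columns in the question.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample line rebuilt from the group widths in the question (assumed spacing),
# with the trailing space that all three variants rely on.
my $line = '   ' . join('  ', map { '-' x $_ } 10, 25, 8, 4, 25, 5, 4, 11, 11) . ' ';

# Hypothetical wrappers around the three variants from the post.
sub ends_wildcard {
    my ($s) = @_;
    my @nums;
    push @nums, pos($s) - 1 while $s =~ m/-[^-]/g;
    return @nums;
}

sub ends_space {
    my ($s) = @_;
    my @nums;
    push @nums, pos($s) - 1 while $s =~ m/- /g;
    return @nums;
}

sub ends_index {
    my ($s) = @_;
    my @nums;
    my $pos = 0;   # start searching at the beginning of the string
    push @nums, $pos while $pos = index($s, '- ', $pos) + 1;
    return @nums;
}

print join(' ', ends_index($line)), "\n";   # 13 40 50 56 83 90 96 109 122

# Iteration count is arbitrary; raise it for steadier numbers.
cmpthese(100_000, {
    wildcard => sub { my @n = ends_wildcard($line) },
    space    => sub { my @n = ends_space($line) },
    index    => sub { my @n = ends_index($line) },
});
```

cmpthese prints a rate table comparing the three subs; only the relative ordering on your own machine is meaningful.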

Re: text parsing question
by kfarr2 (Novice) on Nov 07, 2004 at 21:02 UTC
    I am humbled - thank you all.