text parsing question

kfarr2 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: text parsing question by NetWallah (Canon) on Nov 07, 2004 at 14:26 UTC
You may have to fix/fudge the LAST number, but this seems to be what you want:

```perl
$_ = '   ----------  -------------------------  --------  ----  -------------------------  -----  ----  -----------  -----------';
while ( m/-\S.\s|$/g ) {
    print pos() - 1, qq(\n);
}
```

OUTPUT:

```
13
40
50
56
83
90
96
109
121
```

Earth first! (We'll rob the other planets later)
Re: text parsing question by Dietz (Curate) on Nov 07, 2004 at 14:52 UTC
Another solution (though not so elegant as NetWallah's):

```perl
$_ = '   ----------  -------------------------  --------  ----  -------------------------  -----  ----  -----------  -----------';
my $i;
print $i += length $_, $/ for split /(?<=-)(?=\s+|$)/;
__END__
__OUTPUT__
13
40
50
56
83
90
96
109
122
```
Re: text parsing question by davido (Cardinal) on Nov 07, 2004 at 17:01 UTC
Here's a solution that uses a negative lookahead to find the ending boundaries of the '-' groupings.

```perl
use strict;
use warnings;

my $string = "   ----------  -------------------------  --------  ----  -------------------------  -----  ----  -----------  -----------";

print pos($string), "\n" while $string =~ m/-(?!-)/g;
```

I also toyed with `m/-+/g`, which works equally well, but must match the entire grouping of -'s before finishing each iteration. If the grouping is really long, that might be a little slower. The method I've provided just looks for boundaries, which seemed to be a good way to determine the end of each grouping.

Oh, also, in the previously posted solutions, people were subtracting 1 from `pos()`, which gives the column of the final hyphen of each grouping. But your proposed sample output seems to be asking for the column immediately following the end of each grouping. The solution I provided returns the column following the end of each grouping, to match your sample output. If you intended otherwise, add that "-1" to the code.

Dave
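To make the `pos()` versus `pos() - 1` distinction concrete, here is a minimal sketch. The sample ruler is an assumption: its layout (three leading spaces, dash groups separated by two spaces) is inferred from the outputs quoted in this thread, and `column_ends` is a hypothetical helper name, not something from the original posts.

```perl
use strict;
use warnings;

# Assumed sample ruler: group widths and spacing are inferred from the
# outputs posted in the thread, not taken from the original question.
my @widths = (10, 25, 8, 4, 25, 5, 4, 11, 11);
my $ruler  = '   ' . join('  ', map { '-' x $_ } @widths);

# Hypothetical helper: offset just past the last hyphen of each group.
sub column_ends {
    my ($line) = @_;
    my @ends;
    # A hyphen that is not followed by another hyphen closes a group;
    # pos() is then the offset immediately after that hyphen.
    push @ends, pos($line) while $line =~ m/-(?!-)/g;
    return @ends;
}

my @after = column_ends($ruler);
my @last  = map { $_ - 1 } @after;    # offset of the final hyphen itself

print "after group: @after\n";    # after group: 13 40 50 56 83 90 96 109 122
print "last hyphen: @last\n";     # last hyphen: 12 39 49 55 82 89 95 108 121
```

Because the lookahead also succeeds at the end of the string, the last group needs no special handling and no trailing space.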
Re: text parsing question by tmoertel (Chaplain) on Nov 08, 2004 at 00:30 UTC
Here's one way that is quite simple and conveniently groups the column endings for each line in an array:

```perl
#!/usr/bin/perl

use warnings;
use strict;

while (<DATA>) {
    # Running sum of the lengths of each "spaces + dashes" chunk
    my @columns = do { my $s; map $s += length, /(\s*-+)/mg };
    print "@columns\n";
}

# Output:
#
# 13 40 50 56 83 90 96 109 122
# 14 22

__DATA__
   ----------  -------------------------  --------  ----  -------------------------  -----  ----  -----------  -----------
   -----------  ------
```

Cheers,
Tom
Re: text parsing question by thetimeboy (Novice) on Nov 07, 2004 at 18:41 UTC
Possibly the least imaginative and most expensive solution I can think of is to loop. This is clunky... but it works!

```perl
my @ends;
my $str = "- -- --- ---- -----";
$str =~ s/^( *)//;       # how many cliches does it take to
my $cnt = length $1;     # hit the ground running
my $last1 = "-";         # for the sake of argument
while ($str =~ s/(.)//) {
    # a space right after a hyphen marks the end of a group
    push @ends, $cnt if ($last1 eq "-") && ($1 eq " ");
    $last1 = $1;
    $cnt++;
}
push @ends, $cnt if $last1 eq "-";   # group that runs to the end of the string
print join (", ", @ends);
```
Re^2: text parsing question by tachyon (Chancellor) on Nov 07, 2004 at 23:04 UTC
If you are going to do it like you would in C.....

```perl
use Inline 'C';

my $string = ' ----------- ------';
my @res = parse($string);
print "@res\n";

__END__
__C__

void parse ( char * str ) {
    int i = 0;
    char cur, last = '\0';
    dXSARGS;
    sp = mark;
    while ( cur = *(str++) ) {
        if ( last == '-' && (isspace(cur) || ! *str) )
            XPUSHs(sv_2mortal(newSViv(i)));
        last = cur;
        i++;
    }
    PUTBACK;
}
```

cheers

tachyon
Re: text parsing question by TedPride (Priest) on Nov 08, 2004 at 04:54 UTC
REGEX WITH WILDCARD DELIMITER
41 seconds, 1,000,000 iterations

```perl
my @nums;
push @nums, pos() - 1 while m/-[^-]/g;
```

REGEX WITH SPACE DELIMITER
16 seconds, 1,000,000 iterations

```perl
my @nums;
push @nums, pos() - 1 while m/- /g;
```

SUBSTR WITH SPACE DELIMITER
13 seconds, 1,000,000 iterations

```perl
my (@nums, $pos);
push @nums, $pos while $pos = index($_, '- ', $pos) + 1;
```

Note that all of the above have to have a space added to the end beforehand, and also removed afterwards if you intend to use the line for anything else:

```perl
$_ = '   ----------  -------------------------  --------  ----  -------------------------  -----  ----  -----------  -----------';
$_ .= ' ';
my (@nums, $pos);
push @nums, $pos while $pos = index($_, '- ', $pos) + 1;
print join(' ', @nums);
chop;
```
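For anyone who wants to repeat the timings on their own machine, here is a rough sketch using the core `Benchmark` module. The sample ruler is an assumption (its layout is inferred from the outputs quoted in this thread), the sub names are made up for illustration, and `$pos` is initialised to 0 to avoid an uninitialized-value warning. Absolute numbers will differ from the 2004 figures above, so treat only the relative ranking as meaningful.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed sample ruler (widths inferred from the thread's outputs), with the
# trailing space that all three variants depend on.
my $line = '   '
         . join('  ', map { '-' x $_ } 10, 25, 8, 4, 25, 5, 4, 11, 11)
         . ' ';

# Regex: hyphen followed by any non-hyphen character.
sub regex_wildcard {
    my @nums;
    push @nums, pos($line) - 1 while $line =~ m/-[^-]/g;
    return @nums;
}

# Regex: hyphen followed by a literal space.
sub regex_space {
    my @nums;
    push @nums, pos($line) - 1 while $line =~ m/- /g;
    return @nums;
}

# index(): search for the literal "- " from the previous hit onwards.
# index returns -1 when nothing is left, so "+ 1" yields 0 and ends the loop.
sub index_space {
    my @nums;
    my $pos = 0;
    push @nums, $pos while $pos = index($line, '- ', $pos) + 1;
    return @nums;
}

# Sanity check: all three variants should report the same offsets.
print join(' ', $_->()), "\n" for \&regex_wildcard, \&regex_space, \&index_space;

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    regex_wildcard => sub { my @n = regex_wildcard() },
    regex_space    => sub { my @n = regex_space() },
    index_space    => sub { my @n = index_space() },
});
```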
Re: text parsing question by kfarr2 (Novice) on Nov 07, 2004 at 21:02 UTC
I am humbled - thank you all.