in reply to Re: how to extract string by possible groupings?
in thread how to extract string by possible groupings?

It's possible that a capture group was missed in your explanation:

/((.*\.c\s)|(.*\.h\s)|(.*\.cpp\s))|(\s+(.*)\%\s+(of+)\s+\d+\s)|(\bNone\b)/g
#01         2         3            4   5        6              7

If you lay that out using the /x modifier it becomes more obvious:

/
  (                 # 1
    (.*\.c\s)       # 2
    |
    (.*\.h\s)       # 3
    |
    (.*\.cpp\s)     # 4
  )
  |
  (\s+              # 5
    (.*)\%\s+       # 6
    (of+)\s+\d+\s   # 7
  )
  |
  (\bNone\b)        # 8
/gx

My preference would be to first reduce the capturing to just those parts that are needed. For example, group 1 merely wraps the alternation of groups 2, 3, and 4, so it's unlikely that one would want both "1" and any of "2", "3", or "4". Likewise, group 5 encloses groups 6 and 7, so it's unlikely that someone would care about "5" while also caring about "6" and "7".

Second, I would resort to named captures: (?<somename>...). And third, I would look at breaking it up into smaller problems with /g and \G.
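A minimal sketch of the /g and \G idea, assuming input lines shaped like 'test2.c None 16.53% of 484' (a file name followed by "N% of M" or "None" fields). The tokenize helper and the tightened sub-patterns are hypothetical, not taken from the original question. Each m/\G.../gc step consumes one small piece and leaves pos() where the next step resumes:

```perl
use strict;
use warnings;

# Hypothetical helper: split one report line into a file name and its fields.
# Returns an empty list if the line does not fit the assumed shape.
sub tokenize {
    my ($line) = @_;
    my @fields;

    # Step 1: the file name must come first.
    $line =~ m/\G \s* ( \S+ \. (?: cpp | c | h ) ) \b /gcx
        or return;
    my $file = $1;

    # Step 2: consume fields one at a time. \G anchors each attempt at the
    # position where the previous match ended; /c keeps pos() on failure.
    while (1) {
        if ( $line =~ m/\G \s+ ( [\d.]+ % \s+ of \s+ \d+ ) /gcx ) {
            push @fields, $1;
        }
        elsif ( $line =~ m/\G \s+ ( None ) \b /gcx ) {
            push @fields, $1;
        }
        else {
            last;
        }
    }

    # Step 3: anything left besides trailing whitespace is an error.
    $line =~ m/\G \s* \z /gcx or return;

    return ( $file, @fields );
}

my ( $file, @fields ) = tokenize('test2.c None 16.53% of 484');
print $file, ': ', join( ' | ', @fields ), "\n";
```

Each sub-pattern stays small enough to read at a glance, and a failure points at the step that could not proceed.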

I think, in particular, that named captures and (?:...) grouping where capturing isn't needed would make this easier to use.
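As a rough sketch of that combination (requires Perl 5.10 or later): the group names file, pct, total, and none, the fields_of helper, and the tightened sub-patterns are my own assumptions about the intended input, not part of the original regex.

```perl
use strict;
use warnings;
use 5.010;    # named captures and %+ need Perl 5.10 or later

# Hypothetical field names; (?:...) groups without capturing.
my $field = qr{
      (?<file> \S+ \. (?: cpp | c | h ) ) (?= \s | \z )
    | (?<pct> [\d.]+ ) % \s+ of \s+ (?<total> \d+ )
    | \b (?<none> None ) \b
}x;

# Hypothetical helper: describe each field found in a line.
sub fields_of {
    my ($line) = @_;
    my @out;
    while ( $line =~ /$field/g ) {
        # %+ holds only the named groups that took part in this match.
        if    ( defined $+{file} ) { push @out, 'file=' . $+{file} }
        elsif ( defined $+{pct} )  { push @out, 'pct=' . $+{pct} . '/' . $+{total} }
        elsif ( defined $+{none} ) { push @out, 'none' }
    }
    return @out;
}

say for fields_of('test2.c None 16.53% of 484');
```

The calling code asks for $+{file} or $+{pct} by name, so adding or removing a group later does not shift any numbers.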


Dave

Re^3: how to extract string by possible groupings?
by AnomalousMonk (Archbishop) on Jun 02, 2014 at 23:43 UTC
    ... named captures ...

    I think I would opt for a different course. Elaborating (well, second-guessing, really) on the example below, once you have validated a line, and given that the fields are completely mutually exclusive, the fields just pop out and go down as smoothly as oysters, with no capturing at all (update: no capturing to capture groups, that is).

    c:\@Work\Perl\monks>perl -wMstrict -le
    "use Regexp::Common;
     ;;
     my @lines = (
       'test1.cpp 0.00% of 21 0.00% of 16',
       'test2.c None 16.53% of 484',
       'test3.h 0.00% of 138 None',
       '/x/y/foo.c 0.00% of 1 None',
       );
     ;;
     my $title = qr{ \w+ (?: [.] \w+)* }xms;
     my $percent = qr{ $RE{num}{real} % \s+ of \s+ \d+ }xms;
     my $none = qr{ None }xms;
     ;;
     for my $line (@lines) {
       print qq{line '$line'};
       die qq{ BAD LINE: '$line'} unless $line =~
           m{ \A $title (?: \s+ (?: $percent | $none)){2} \s* \z }xms;
       my ($t, $p1, $p2) = $line =~ m{ \A $title | $percent | $none }xmsg;
       print qq{ title: '$t' pcent1: '$p1' pcent2: '$p2'};
       }
    "
    line 'test1.cpp 0.00% of 21 0.00% of 16'
     title: 'test1.cpp' pcent1: '0.00% of 21' pcent2: '0.00% of 16'
    line 'test2.c None 16.53% of 484'
     title: 'test2.c' pcent1: 'None' pcent2: '16.53% of 484'
    line 'test3.h 0.00% of 138 None'
     title: 'test3.h' pcent1: '0.00% of 138' pcent2: 'None'
    line '/x/y/foo.c 0.00% of 1 None'
     BAD LINE: '/x/y/foo.c 0.00% of 1 None' at -e line 1.

    Updates:

    1. Actually removed capturing groups from validation regex.
    2. It turns out the fields are not "completely mutually exclusive" as I originally claimed, so I had to change the extraction regex from
          m{ $title | $percent | $none }xmsg
      to
          m{ \A $title | $percent | $none }xmsg
      This somewhat vitiates the intended thrust of this post, but I think the main point stands. Oh, well...
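One reading of update 2, sketched below: without the \A anchor, $title is tried first at every position and also matches pieces of the other fields. The $title and $none patterns are copied from the one-liner; $percent here uses a simplified number pattern instead of $RE{num}{real}, an assumption made only to keep the sketch free of non-core modules.

```perl
use strict;
use warnings;

my $title = qr{ \w+ (?: [.] \w+)* }xms;
# Simplified stand-in for $RE{num}{real}; an assumption for this sketch.
my $percent = qr{ \d+ (?: [.] \d+ )? % \s+ of \s+ \d+ }xms;
my $none = qr{ None }xms;

my $line = 'test3.h 0.00% of 138 None';

# With no capture groups, a list-context /g match returns each whole match.
my @unanchored = $line =~ m{    $title | $percent | $none }xmsg;
my @anchored   = $line =~ m{ \A $title | $percent | $none }xmsg;

print 'unanchored: ', join( ' | ', @unanchored ), "\n";
print 'anchored:   ', join( ' | ', @anchored ),   "\n";
```

The unanchored form yields five pieces (test3.h, 0.00, of, 138, None) because $title wins the alternation wherever a word character starts; the anchored form yields the three intended fields.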

Re^3: how to extract string by possible groupings?
by LanX (Saint) on Jun 02, 2014 at 23:26 UTC
    > It's possibly that a capture group was missed in your explanation:

    No, I started counting at 0 and you at 1.

    see Re^3: how to extract string by possible groupings? for why I did what I did! :)

    Cheers Rolf

    (addicted to the Perl Programming Language)

      Good call! :) As you probably guessed, I was considering \1, \2..., and their counterparts, $1, $2, etc.


      Dave
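For readers following the numbering exchange, a small sketch of how both counts can coexist: the capture variables $1, $2, ... (and backreferences \1, \2, ...) are 1-based and numbered by opening parenthesis, while the list returned by a match in list context is an ordinary 0-indexed Perl array. This only illustrates the two conventions on a made-up input string; it is not a claim about the reasoning in the linked post.

```perl
use strict;
use warnings;

my $text = 'foo.c ';    # made-up sample input

# Two nested groups: the outer one opens first, so it is $1; the inner is $2.
my @groups = $text =~ / ( ( .* \. c ) \s ) /x;

print "\$1         = '$1'\n";            # outer group
print "\$2         = '$2'\n";            # inner group
print "\$groups[0] = '$groups[0]'\n";    # same text as $1, at index 0
print "\$groups[1] = '$groups[1]'\n";    # same text as $2, at index 1
```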