tj_thompson has asked for the wisdom of the Perl Monks concerning the following question:
I've recently been doing a large amount of parsing and have wandered into a bit of a quandary related to regexes, \G, and /g.
My recent code has used state-based lexing/parsing approaches for files that range from relatively simple to moderately complex formats. I've been slurping the file in and using a number of regex tokens along with \G and /gc to parse through the resulting string. I've run into an issue with /g that I'd like to get advice on.
Here's a simple example:

```perl
use strict;
use warnings;

my $data = <<'END';
x = 10;
y = 12;
z = 100;
END

sub parse {
    my $data_ref = shift;
    my $rgx = qr/(\w+)\s*=\s*(\d+)\s*;\s*/;
    my @m = $$data_ref =~ /\G$rgx/gc;
    return \@m;
}

while (my @matches = @{parse(\$data)}) {
    while (my $var = shift @matches) {
        my $val = shift @matches;
        print "I got variable ($var) set to value ($val)\n";
    }
    print "Trying next parse...\n";
}
```

The output is:

```
I got variable (x) set to value (10)
I got variable (y) set to value (12)
I got variable (z) set to value (100)
Trying next parse...
```

I would *like* the output to be:

```
I got variable (x) set to value (10)
Trying next parse...
I got variable (y) set to value (12)
Trying next parse...
I got variable (z) set to value (100)
Trying next parse...
```

Ideally, I'd like to be able to handle these declarations one at a time. Get x, get 10, handle storing the data, then return to my string for parsing. However, in data formatted in a regular repeating fashion, the /g modifier results in multiple matches.
/g seems to have two distinct functions: 1) ensure match position is retained after a match and 2) allow multiple matches to occur. I'd like to be able to retain the position of the match without the secondary effect of allowing multiple matches. We have the /gc modifier that allows retaining the match position after a failed match. My documentation reading suggests there is no similar modifier to only retain match position on a successful match outside of /g and its additional functionality. The pos function also only seems to work if /g is used.
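For reference, the multiple-match behaviour of /g depends on the context in which the match is evaluated. The following is a minimal sketch, using a made-up string `$str`, that contrasts list context (all matches at once) with scalar context (one match per evaluation, with pos() advancing each time):

```perl
use strict;
use warnings;

my $str = "a1 b2 c3";    # illustrative data, not from the original post

# List context: /g returns every match in a single evaluation.
my @all = $str =~ /(\w\d)\s*/g;
print "list context: @all\n";

# Scalar context: /g matches once per evaluation and advances pos().
pos($str) = 0;           # restart from the beginning of the string
my @seen;
while ($str =~ /\G(\w\d)\s*/gc) {
    push @seen, "$1 at pos " . pos($str);
}
print "scalar context: $_\n" for @seen;
```

With /c, the failed final attempt leaves pos() where the last successful match ended rather than resetting it.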
So my question: how do I retain both single-token matching and /g's position tracking in the string? Note that I'm particularly trying to avoid cutting up the string itself, as string manipulation greatly slows the parsing.
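One possible approach, sketched here rather than offered as a definitive answer, is to evaluate the /gc match in scalar context so that each call consumes at most one declaration. The helper name `next_decl` is hypothetical; the regex and data are taken from the example above.

```perl
use strict;
use warnings;

my $data = <<'END';
x = 10;
y = 12;
z = 100;
END

# Hypothetical helper: returns (name, value) for the next declaration,
# or an empty list if nothing matches at the current position.
sub next_decl {
    my ($data_ref) = @_;
    # The "if" condition is scalar context, so /g matches only once
    # and leaves pos($$data_ref) just past the matched text.
    if ($$data_ref =~ /\G(\w+)\s*=\s*(\d+)\s*;\s*/gc) {
        return ($1, $2);
    }
    return;
}

while (my ($var, $val) = next_decl(\$data)) {
    print "I got variable ($var) set to value ($val)\n";
    print "Trying next parse...\n";
}
```

Because pos() is stored on the scalar itself, the position survives between calls, and the string is never copied or cut up.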
Replies are listed 'Best First'.

Re: Perl regex \G /g and /gc
by ikegami (Patriarch) on Sep 17, 2014 at 02:23 UTC
by tj_thompson (Monk) on Sep 17, 2014 at 16:14 UTC
by uhClem (Scribe) on Aug 06, 2015 at 16:31 UTC

Re: Perl regex \G /g and /gc
by Anonymous Monk on Sep 17, 2014 at 00:12 UTC