Yet another regex question

gregor-e has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Yet another regex question by ikegami (Patriarch) on Aug 25, 2005 at 17:41 UTC
Don't use a character class; use alternation with anchoring, in case a string is longer than one character: `=~ m/^(?:7|8|9)$/`. Don't forget to use quotemeta on the strings. `sub dequote { local $_ = (@_ ? $_[0] : $_); s/^'//; s/'$//; s/\\(.)/$1/sg; return $_; } my $list = join '|', map quotemeta, map dequote, @parts; print("=~ m/^(?:$list)\$/");` How do you extract the parts? Well, that's quite hard to do right using a regexp. You should write a parser.
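For anyone who wants to try the alternation idea end to end, here is a minimal sketch. It assumes the quoted items have already been split out into `@parts` (the sample values are made up), and it compiles the pattern with `qr//` instead of printing it, purely so the result can be tested on the spot.

```perl
use strict;
use warnings;

# Hypothetical input: the quoted items already split out of in ('7','8','9').
my @parts = ("'7'", "'8'", "'9'");

# Strip the surrounding single quotes and undo backslash escapes.
sub dequote {
    my ($s) = @_;
    $s =~ s/^'//;
    $s =~ s/'$//;
    $s =~ s/\\(.)/$1/gs;
    return $s;
}

# quotemeta keeps any regex metacharacters in the data literal.
my $list = join '|', map { quotemeta(dequote($_)) } @parts;
my $re   = qr/^(?:$list)$/;

for my $candidate ('7', '8', '78', 'x') {
    my $verdict = $candidate =~ $re ? 'matches' : 'does not match';
    print "$candidate $verdict\n";
}
```

Because the pattern is anchored at both ends, a two-character string such as `78` is rejected, which a bare character class would not do.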
Re: Yet another regex question by philcrow (Priest) on Aug 25, 2005 at 17:12 UTC
If it's just the quotes stumping you, try Text::Balanced. Phil
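A rough sketch of what that might look like, assuming the text to scan starts at the opening parenthesis (the sample string is invented). `extract_bracketed` is a real Text::Balanced function; putting a quote character in its delimiter string is how the module is told to skip brackets inside quoted runs.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Made-up input: one of the items is a quoted close paren.
my $text = q{('7',')','9') then};

# The single quote in the delimiter string asks Text::Balanced to
# ignore brackets that appear inside single-quoted runs.
my ($list, $rest) = extract_bracketed($text, q{()'});

print "list: $list\n" if defined $list;
print "rest: $rest\n";
```

This only isolates the parenthesised list; the individual items would still need to be pulled out of `$list` afterwards.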
Re: Yet another regex question by injunjoel (Priest) on Aug 25, 2005 at 17:10 UTC
Greetings all, Update: a better way to do it (than I had previously posted). Though only tested on a simple case, the following should get you the global replace you were looking for. `$str =~ s/if substr\(([^\)]*)\) in (\([^\)]*\))/my($f,$s)=($1,$2); $f=~s!^!\$!; $s=~s!\D!!g; "if substr($f) =~ m#[$s]#"/eg;` -InjunJoel "I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forego their use." -Galileo
Re: Yet another regex question by Fletch (Bishop) on Aug 25, 2005 at 17:03 UTC
Sounds like you need a proper parser. This is similar to why you need more than just regexen to parse (arbitrary) HTML reliably. And just a thought, you might find it easier to implement `in` with `grep` rather than a regex. -- We're looking for people in ATL
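A small sketch of the `grep` idea, for illustration only. The helper name `in_set` and the sample value are made up; the variable name is borrowed from elsewhere in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $needle equals any element of @set.
sub in_set {
    my ($needle, @set) = @_;
    return scalar(grep { $_ eq $needle } @set);
}

my $acct_trtmt_hsty = '78';                       # made-up sample value
my $char = substr($acct_trtmt_hsty, 1, 1);        # Perl offsets are 0-based

print "in set\n" if in_set($char, '7', '8', '9');
```

String equality sidesteps quoting and metacharacter worries entirely, at the cost of a helper call in the generated code.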
Re: Yet another regex question by davidrw (Prior) on Aug 25, 2005 at 17:08 UTC
Can you give some examples of the before and after of your data? You say "global substitute" but aren't clear on what you're transforming the data into. If you are just matching, it looks like all you need is a capture in your regex: `if (substr($acct_trtmt_hsty,1,1) =~ m/([789])/) { print "the number is: " . $1; }` Are you sure the LHS should be the substr and not $acct_trtmt_hsty? You may also want a strict regex: `my @numbers = $acct_trtmt_hsty =~ m/'([789])'/g; print join ":", @numbers;` Substitution example (subtracts 4 from every number): `$acct_trtmt_hsty =~ s/(')([789])(')/$1 . ($2-4) . $3/eg; # note: many different ways to write this regex`
Re: Yet another regex question by gregor-e (Beadle) on Aug 25, 2005 at 19:44 UTC
So I hear two votes for using a proper parser and several piecewise suggestions that, unfortunately, don't lend themselves to transforming the entire file in one substitute. I was hoping there might be some trick like `$programSource =~ s/in \(('([^'])'[,])*\)/=~ m#[$2$3$4$5$6$7$8$9]#;/g;` where perhaps the outer group of parentheses would be $1 and any number of nested inner parentheses groups would be matched for all patterns matching `([^'])` inside the single-quotes. In a more broken-out fashion:

    $programSource =~ s/in \(
        ('          <-- outer group start goes in $1
        ([^'])      <-- inner group chars between ''
        '[,])*      <-- followed by optional comma, close outer group and repeat outer group for any number of repetitions
        \)/=~ m#[$2$3$4$5$6$7$8$9]#;/g;   <-- put inner groups in match (up to 8 of them, anyway)

Since I'm trying to use global substitutes that can span several lines and update the entire source file in one substitution, and since there are only three cases in the source I am currently trying to transform, I'll probably just use three separate global substitutes, one for each of the `in ('X')`, `in ('X','Y')` and `in ('X','Y','Z')` cases. It just seems like it ought to be something one could do with a single fancy global substitute. Thanks for your thoughts.
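One way a single global substitute might handle any number of items is to capture the whole quoted list once and take it apart in an `/e` replacement, since a repeated capture group only keeps its last match. This is a sketch under assumptions, not a tested solution for the real file: the sample source text is invented, and it assumes every item is a single character between single quotes.

```perl
use strict;
use warnings;

# Made-up sample source; the real input is not shown in the thread.
my $programSource = <<'END';
if substr(a,1,1) in ('7') then
if substr(b,1,1) in ('7','8') then
if substr(c,1,1) in ('7',
                     '8','9') then
END

$programSource =~ s{
    \b in \s* \(                       # the keyword and the opening paren
    (                                  # capture the whole quoted list
        \s* '[^']'
        (?: \s* , \s* '[^']' )*
        \s*
    )
    \)                                 # closing paren
}{
    my $list  = $1;
    my @chars = $list =~ /'([^'])'/g;  # every single-character item
    '=~ m#[' . join('', map { quotemeta($_) } @chars) . ']#';
}gex;

print $programSource;
```

The `\s*` pieces let a list span several lines, so the three separate substitutes collapse into one.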
Re^2: Yet another regex question by QM (Parson) on Aug 26, 2005 at 22:07 UTC
The problem is one of picking out pieces. A "normal" s/// won't work, but a cheeky `(?{code block})` does:

    #!/your/perl/here
    use strict;
    use warnings;

    my $one   = q/in ('X')/;
    my $two   = q/in ('X','Y')/;
    my $three = q/in ('X','Y','Z')/;

    foreach my $s ( $one, $two, $three ) {
        our $r = '';
        (my $u = $s) =~ s{       # open regex - curly delimiters
            in\s+
            \(                   # literal open paren
            (?:                  # non-capturing group
              '                  # literal single quote
              (                  # capture group
                [^']             # single non-single-quote-char
              )                  # close capture group
              (?{$r=$r.$^N})
              '                  # literal single quote
            )                    # closing non-capture group
            (?:                  # opening non-capture group
              ,'                 # literals
              (                  # capture group
                [^']             # single non-single-quote-char
              )                  # close capture group
              (?{$r=$r.$^N})
              '                  # literal single-quote
            )*                   # close non-capture group
            \)                   # literal close paren
        }                        # close regex - curly delimiter
        {=~\ m/\[$r\]/}x;
        print "$s\n$u\n\n";
    }

But that's not usually a good idea. -QM -- Quantum Mechanics: The dreams stuff is made of