PerlMonks  

Yet another regex question

by gregor-e (Beadle)
on Aug 25, 2005 at 16:49 UTC ( [id://486629]=perlquestion )

gregor-e has asked for the wisdom of the Perl Monks concerning the following question:

I need to translate SAS into Perl. So I have a script that slurps the entire SAS file into a scalar and goes about global substitutions in the scalar to hammer the C-like SAS into bread-and-butter Perl. But when it came to defining a global substitute to transform
if substr(acct_trtmt_hsty,1,1) in ('7','8','9')
into
 if (substr($acct_trtmt_hsty,1,1) =~ m/[789]/)
I came up stumped. Okay, I really only have a problem transforming
 in ('7','8','9')
into
 =~ m/[789]/
because I don't see how to extract what's inside each 'X' within the parentheses (there are varying numbers of them).

Is a global substitute like this possible?

Replies are listed 'Best First'.
Re: Yet another regex question
by ikegami (Patriarch) on Aug 25, 2005 at 17:41 UTC

    Don't use a character class. Use alternation and anchoring, in case a string is longer than one character: =~ m/^(?:7|8|9)$/. Don't forget to use quotemeta on the strings.

    sub dequote {
        local $_ = (@_ ? $_[0] : $_);
        s/^'//;
        s/'$//;
        s/\\(.)/$1/sg;    # unescape every backslash-escaped character
        return $_;
    }

    my $list = join '|', map quotemeta, map dequote, @parts;
    print("=~ m/^(?:$list)\$/");

    How do you extract the parts? Well, that's quite hard to do right using regexp. You should write a parser.
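    For the simple case in the question, the alternation advice above can be sketched as follows. The helper name in_list_to_match is hypothetical, and the sketch assumes every item is single-quoted with no embedded quotes or commas; anything richer is where a real parser comes in.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): turn the inside of a SAS
# in (...) list into the text of an anchored Perl match.
# Assumes single-quoted items with no embedded quotes.
sub in_list_to_match {
    my ($list) = @_;                              # e.g. "'7','8','9'"
    my @parts  = $list =~ m/'([^']*)'/g;          # text between each pair of quotes
    my $alt    = join '|', map { quotemeta($_) } @parts;
    return "=~ m/^(?:$alt)\$/";
}

print in_list_to_match(q{'7','8','9'}), "\n";
```

    Unlike a character class, the anchored alternation keeps working when an item is longer than one character, and quotemeta keeps regex metacharacters in an item from being interpreted.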

Re: Yet another regex question
by philcrow (Priest) on Aug 25, 2005 at 17:12 UTC
Re: Yet another regex question
by injunjoel (Priest) on Aug 25, 2005 at 17:10 UTC
    Greetings all,
    Update: a better way to do it (than I had previously posted).
    Though only tested on a simple case, the following should give you the global replace you were looking for.
    $str =~ s/if substr\(([^\)]*)\) in (\([^\)]*\))/my($f,$s)=($1,$2); $f=~s!^!\$!; $s=~s!\D!!g; "if substr($f) =~ m#[$s]#"/eg;
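    A small harness for trying the substitution on the line from the question. The wrapper name translate_substr_in is hypothetical; the substitution inside it is the one posted above.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution from this reply.
sub translate_substr_in {
    my ($str) = @_;
    $str =~ s/if substr\(([^\)]*)\) in (\([^\)]*\))/my($f,$s)=($1,$2); $f=~s!^!\$!; $s=~s!\D!!g; "if substr($f) =~ m#[$s]#"/eg;
    return $str;
}

print translate_substr_in("if substr(acct_trtmt_hsty,1,1) in ('7','8','9')"), "\n";
```

    Because the list is cleaned with s!\D!!g, this only handles lists of digits; any non-digit item is silently dropped from the character class.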


    -InjunJoel
    "I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forego their use." -Galileo
Re: Yet another regex question
by Fletch (Bishop) on Aug 25, 2005 at 17:03 UTC

    Sounds like you need a proper parser. This is similar to why you need more than just regexen to parse (arbitrary) HTML reliably.

    And just a thought: you might find it easier to implement SAS's in with grep rather than a regex.
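    A minimal sketch of that grep idea. The helper name in_list and the sample value are made up for illustration; nothing here comes from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): counts how many list items
# equal $value, so the result is true when $value is "in" the list.
sub in_list {
    my ($value, @list) = @_;
    return scalar grep { $_ eq $value } @list;
}

my $acct_trtmt_hsty = '8XYZ';    # made-up sample value

# Note: SAS substr positions are 1-based and Perl's are 0-based, so
# SAS substr(x,1,1) corresponds to Perl substr($x,0,1).
if ( in_list( substr($acct_trtmt_hsty, 0, 1), '7', '8', '9' ) ) {
    print "first character is 7, 8 or 9\n";
}
```

    With this approach the translated source only needs in (...) rewritten into a function call, and items of any length compare correctly without regex quoting.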

    --
    We're looking for people in ATL

Re: Yet another regex question
by davidrw (Prior) on Aug 25, 2005 at 17:08 UTC
    Can you give some before-and-after examples of your data? You say "global substitute" but aren't clear on what you're transforming the data into ...

    If you are just matching, it looks like all you need is a capture in your regex:
    if (substr($acct_trtmt_hsty,1,1) =~ m/([789])/) { print "the number is: " . $1; }
    Are you sure the LHS should be the substr and not $acct_trtmt_hsty ? You may also want a strict regex:
    my @numbers = $acct_trtmt_hsty =~ m/'([789])'/g; print join ":", @numbers;
    Substitution example (subtracts 4 from every number):
    $acct_trtmt_hsty =~ s/(')([789])(')/$1 . ($2-4) . $3/eg; # note: many different ways to write this regex
Re: Yet another regex question
by gregor-e (Beadle) on Aug 25, 2005 at 19:44 UTC
    So I hear two votes for using a proper parser and several piecewise suggestions that, unfortunately, don't lend themselves to transforming the entire file in one substitute.

    I was hoping there might be some trick like
     $programSource =~ s/in \(('([^']*)'[,])*\)/=~ m#[$2$3$4$5$6$7$8$9]#;/g;
    where perhaps the outer group of parentheses would be $1 and any number of nested inner parentheses groups would be matched for all patterns matching ([^']*) inside the single-quotes. In a more broken-out fashion:

    $programSource =~ s/in \(
        ('          <-- outer group start goes in $1
        ([^']*)     <-- inner group chars between ''
        '[,])*      <-- followed by optional comma, close outer group
                        and repeat outer group for any number of repetitions
        \)/=~ m#[$2$3$4$5$6$7$8$9]#;/g;
                    <-- put inner groups in match (up to 8 of them, anyway)

    Since I'm trying to use global substitutes that can span several lines and update the entire source file in one substitution, and since there are only three cases in the source I am currently trying to transform, I'll probably just use three separate global substitutes, one for each of the in ('X'), in ('X','Y') and in ('X','Y','Z') cases. It just seems like it ought to be something one could do with a single fancy global substitute.

    Thanks for your thoughts.
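    A repeated capture group only remembers its last iteration, which is why $2 through $9 can't collect the items. One way to get a single global substitute anyway is to capture the whole list once and take it apart inside an /e replacement. This is only a sketch: the function name translate_in_lists is hypothetical, and it assumes single-quoted, single-character items with no embedded quotes.

```perl
use strict;
use warnings;

# Sketch: rewrite every  in ('a','b',...)  in the source in one pass.
# Assumes single-quoted, single-character items with no embedded quotes.
sub translate_in_lists {
    my ($programSource) = @_;
    $programSource =~ s{
        \b in \s* \( \s*                            # the keyword and open paren
        ( ' [^']* ' (?: \s* , \s* ' [^']* ' )* )    # capture the whole quoted list
        \s* \)                                      # close paren
    }{
        my $list  = $1;                             # copy before the inner match
        my @items = $list =~ m/'([^']*)'/g;         # pull out each quoted item
        '=~ m/[' . join('', map { quotemeta($_) } @items) . ']/';
    }gex;
    return $programSource;
}

print translate_in_lists("if substr(acct_trtmt_hsty,1,1) in ('7','8','9')\n");
```

    The same call handles one, two, three or more items, and because the pattern allows whitespace (including newlines) around the commas, a list that spans several lines is still rewritten in the one substitution.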

      The problem is one of picking out pieces. A "normal" s/// won't work, but a cheeky (?{code block}) does:
      #!/your/perl/here
      use strict;
      use warnings;

      my $one   = q/in ('X')/;
      my $two   = q/in ('X','Y')/;
      my $three = q/in ('X','Y','Z')/;

      foreach my $s ( $one, $two, $three )
      {
          our $r='';
          (my $u = $s) =~ s{ # open regex - curly delimiters
              in\s+
              \(       # literal open paren
              (?:      # non-capturing group
                '      # literal single quote
                (      # capture group
                  [^'] # single non-single-quote-char
                )      # close capture group
                (?{$r=$r.$^N})
                '      # literal single quote
              )        # closing non-capture group
              (?:      # opening non-capture group
                ,'     # literals
                (      # capture group
                  [^'] # single non-single-quote-char
                )      # close capture group
                (?{$r=$r.$^N})
                '      # literal single-quote
              )*       # close non-capture group
              \)       # literal close paren
          } # close regex - curly delimiter
          {=~\ m/\[$r\]/}x;   # closing slash added so the output is a complete m/[...]/
          print "$s\n$u\n\n";
      }
      But that's not usually a good idea.

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of
