the_0ne has asked for the wisdom of the Perl Monks concerning the following question:

Good evening, my fellow perlmonks. I have a question about repeating a regex. Let me explain, for those of you already thinking "what's this guy talking about?" I have a string with several double-quoted phrases, and I'd like one regex to pull out all of them, but I'm not sure how to get past the first one...

$myString = qq(stay "keep together" apart "and this" not);

I want to be able to pull out both "keep together" and "and this" with one regex. Is that possible? I have this simple regex, which pulls out the first one...

$myString =~ /\"(.+?)\"/g;

That only gives me the "keep together" (without the quotes). Is there a way to do this with one regex or should it be tackled some other way?

Thanks.

Re: Repeatable regex.
by Masem (Monsignor) on Apr 03, 2001 at 05:53 UTC
    You're close. In list context, a match with /g returns a list of every captured value, so do:
    $myString = join(' ', ( $myString =~ /\"(.+?)\"/g) );
    That should do it. (Update: after fixing that first operator to be '=', not '=~'.)
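A minimal, self-contained sketch of Masem's suggestion, using the sample string from the question (the names @phrases and $joined are made up for illustration). The key point is that the match itself, evaluated in list context, hands back every $1:

```perl
use strict;
use warnings;

# Sample input from the question.
my $myString = qq(stay "keep together" apart "and this" not);

# In list context, a //g match with one capture group returns
# every $1 it finds, in order.
my @phrases = ($myString =~ /"(.+?)"/g);

# join() then glues the captures into a single string.
my $joined = join(' ', @phrases);

print "$joined\n";    # prints: keep together and this
```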
    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
      Masem:

      Thanks for the help. I couldn't get it to work exactly the way you had it, but I think the first "=~" should have been "=". Is that right, or did I do something wrong that kept it from working the way you first had it?

      Thanks again; with that change it worked perfectly.

      If you did add the "~" by accident, I can understand that. Often I'll write a beautiful (if I do say so myself) regex and then wonder why it isn't working. I bang my head and bang my head, and then I realize I forgot the "~".

      Update:
      Thanks Masem, I noticed your update.
(bbfu) (dot star) Re: Repeatable regex.
by bbfu (Curate) on Apr 03, 2001 at 06:04 UTC

    Please stop using .+?, everyone. Please. Use "[^"]+" instead. Read Death to Dot Star! to find out why it's bad.
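A small illustration of one concrete difference (the sample string is invented for the demo): without the /s modifier, . never matches a newline, but [^"] does, so a quoted phrase that spans a line break is missed by the dot-star pattern. This shows only one way the two patterns differ, not the whole argument.

```perl
use strict;
use warnings;

# A quoted phrase that spans a line break.
my $text = qq(say "line one\nline two" ok);

# Without /s, '.' never matches "\n", so the lazy dot-star pattern
# cannot reach the closing quote and finds nothing here.
my @dot = ($text =~ /"(.+?)"/g);

# A negated character class matches anything except a quote,
# newlines included, so it captures the whole phrase.
my @class = ($text =~ /"([^"]+)"/g);

printf "dot-star: %d match(es)\n", scalar @dot;      # prints 0
printf "class:    %d match(es)\n", scalar @class;    # prints 1
```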

    bbfu
    Seasons don't fear The Reaper.
    Nor do the wind, the sun, and the rain.
    We can be like they are.

      Ok, I changed it to this...

      $myString = join('~', ( $myString =~ /"([^"]+)"/g ) );

      Sorry, I didn't know about "Death to Dot Star!", but the modification above does work.

      Thanks.
        Please note that I am at an Internet café right now and thus cannot test anything that I am writing, so be gentle with me :)

        The regex you listed is better, but you should be aware that if you're working with data that someone else supplies, you may have to deal with escaped quotes. The regex /"([^"]+)"/g will probably not behave as you expect with the following:

        my $string = q!"This is \"data\""!;
        So, we try the following:
        $string =~ /"((?:\\"|[^"])*)"/;
        Break that out:
        $string =~ /"          # first quote
                    (          # capture into the first group
                      (?:      # non-capturing parens
                        \\"    # an escaped quote
                        |      # or
                        [^"]   # a non-quote
                      )*       # end grouping (zero or more of above)
                    )          # end capture
                    "/x;       # last quote
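A quick check of the naive pattern and the escaped-quote pattern against the sample string, assuming it is built with q!! so the backslashes survive; the variable names are only for illustration:

```perl
use strict;
use warnings;

# q!! keeps the backslashes: $string holds "This is \"data\""
my $string = q!"This is \"data\""!;

# Naive version: stops at the first quote, escaped or not.
my ($naive) = $string =~ /"([^"]+)"/;

# Escaped-quote version: \\" is tried before [^"], so each \"
# is consumed as part of the phrase.
my ($escaped) = $string =~ /"((?:\\"|[^"])*)"/;

print "naive:   $naive\n";      # naive:   This is \
print "escaped: $escaped\n";    # escaped: This is \"data\"
```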
        Looks good. We allow for escaped quotes, but what if the string is something like "test\"? That's poorly formed, so we'll probably also have to allow escaped escapes (sigh), which means a string like "test\\". The following should be pretty close to what you want:
        $string =~ /"((?:\\["\\]|[^"])*)"/;
        It's really ugly, but it should be closer to what you might need. However, regular expressions like these can get quite hairy. I understand that Text::Balanced is perfect for problems like this, but I have never used it.
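A sketch comparing the escaped-escape regex with Text::Balanced's extract_delimited on a made-up string whose first phrase ends in an escaped backslash. It assumes extract_delimited's documented arguments (text, delimiter characters, a prefix pattern to skip, escape characters) with the default backslash escape, and uses '[^"]*' as the prefix so the unquoted text between phrases is skipped; treat it as a starting point rather than a definitive recipe:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_delimited);

# Two quoted phrases; the first ends in an escaped backslash.
# In single quotes '\\\\' is two literal backslashes, so $s holds:
#   "a\\" b "c"
my $s = '"a\\\\" b "c"';

# 1. The regex above: \\["\\] consumes an escaped quote or an escaped
#    backslash as a unit, so the quote after a\\ closes the phrase.
my @by_regex = ($s =~ /"((?:\\["\\]|[^"])*)"/g);

# 2. Text::Balanced: skip unquoted text with the prefix pattern
#    '[^"]*', extract one "..." string (backslash is the default
#    escape character), then repeat on whatever remains.
my @by_module;
my $rest = $s;
while (1) {
    my ($quoted, $remainder) = extract_delimited($rest, '"', '[^"]*');
    last unless defined($quoted) && length($quoted);
    push @by_module, substr($quoted, 1, -1);    # strip the outer quotes
    $rest = $remainder;                         # assignment also resets pos()
}

print "regex:  @by_regex\n";
print "module: @by_module\n";
```

For comparison, the earlier escaped-quote-only pattern /"((?:\\"|[^"])*)"/g would read the second backslash plus the quote as an escaped quote on this string, so its first phrase would run on into " b ".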

        Cheers,
        Ovid


Re: Repeatable regex.
by MeowChow (Vicar) on Apr 03, 2001 at 05:57 UTC
    Yes, just do as you state, and repeat the regex:
    while ($myString =~ /"(.+?)"/g) { print $1, "\n"; }
    In scalar context, the /g modifier makes the match remember where it left off (the offset is available via pos()), so each pass through the loop finds the next quoted phrase. Also note that escaping the quote is unnecessary. See perlre and perlop for more details.
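A sketch of the scalar-context behavior, printing pos() after each match so you can see where the next search resumes (the @seen array is only there to record the offsets for illustration):

```perl
use strict;
use warnings;

my $myString = qq(stay "keep together" apart "and this" not);

# In scalar context each //g match resumes at pos($myString),
# the offset just past the previous match.
my @seen;
while ($myString =~ /"(.+?)"/g) {
    push @seen, [ $1, pos($myString) ];
    print "$1 (next search starts at offset ", pos($myString), ")\n";
}

# After the final failed match, pos() is reset to undef.
print defined(pos($myString)) ? "pos still set\n" : "pos reset\n";
```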
Re: Repeatable regex.
by Xxaxx (Monk) on Apr 03, 2001 at 10:51 UTC
    I recall seeing some code that would "catch" all of the matches in an array, but I don't remember it exactly. After playing around with different syntax, I found that the example below seems to work okay.

    my(@foundStrings) = ($myString =~ /"([^"]+)"/g);
    As a full snippet:
    #!/usr/local/bin/perl
    use strict;
    my $myString = 'stay "keep together" apart "and this" not';
    print "\n$myString\n";
    my (@foundStrings) = ($myString =~ /"([^"]+)"/g);
    foreach my $string (@foundStrings) {
        print "$string\n";
    }
    exit;
    By the way, if you'd prefer to have the " marks included in the resulting array, I find this works:
    my(@foundStrings) = ($myString =~ /"[^"]+"/g);
    Funny how much power a couple of parens can have. Behold the power of Perl and be humbled. ;-)
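The two calls side by side, as a sketch using the sample string from the question: with a capture group, list-context //g returns only the captured text; without one, it returns each whole match, quotes included.

```perl
use strict;
use warnings;

my $myString = 'stay "keep together" apart "and this" not';

# With a capture group, list-context //g returns only the captures.
my @inner = ($myString =~ /"([^"]+)"/g);

# With no capture group, it returns each whole match instead.
my @whole = ($myString =~ /"[^"]+"/g);

print "$_\n" for @inner;    # one phrase per line, without quotes
print "$_\n" for @whole;    # one phrase per line, with quotes
```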

    Claude