Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Greetings all,
I am attempting to write a parser to take a line of tokens and insert them into an array.
"What?!", I hear your cry... "Hasn't the man heard of split()?"
Naturally I have, and it served me well - until I was asked (or sequestered, depending) to make the parser quote-aware.

Take the following string to be parsed:
Token1 Token2 "Little phrase" Token4

Which I need to be able to beat into:
@array = ("Token1", "Token2", "Little phrase", "Token4")

I have written something that somewhat works, but it's a nasty kludge, and I'm sure there's a talented regex hacker out there who can do this much more tidily - and undoubtedly faster than my semi-solution.

Thanks muchly, JP Hindin

Re: Parsing, tokens and strings
by merlyn (Sage) on Oct 17, 2001 at 20:11 UTC
      As I understand the original poster's goal, this is NOT the solution to his problem. Although he says he needs to make split "quote aware", I think he really wants to make it quote UN-aware, so that quotes are treated as just another delimiter character. That is what his sample assignment does: an original line of the form A "B" ends up as @array = ( "A", "B" ). Unless I'm misunderstanding his intentions, the proper solution would be something akin to:
          my @words = split /[\s"']+/
      (assuming it's not important to ensure balanced use of quotes)
      Tim Maher tim@consultix-inc.com
        Oops! Now I see that the original poster's sample assignment was not as simple as I had shown, because he had "B C" where I had "B" (several quoted words treated as a single token, versus my single word). That being the case, my suggestion of changing split's delimiters will obviously not work, so "never mind"! 8-}
        Tim Maher tim@consultix-inc.com
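To see why changing split's delimiters falls short, here is a small sketch (the variable names are illustrative, not from the thread) that runs the suggested pattern on the original sample line. The quoted phrase comes out as two tokens rather than one.

```perl
use strict;
use warnings;

my $line = q{Token1 Token2 "Little phrase" Token4};

# Treat whitespace and both quote characters as delimiters.
my @words = split /[\s"']+/, $line;

# The phrase is broken apart: five tokens instead of the wanted four.
print scalar(@words), " tokens: @words\n";
```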
      Incidentally, for solutions like this, is there an easy way of adding support for escaped quotes like \"?

      -Ted
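One possible way to handle escaped quotes is sketched below. The tokenize helper is hypothetical (it is not code from this thread): it walks the line with a \G-anchored regex, takes either a double-quoted phrase or a bare word at each step, and lets a backslash escape any character inside the quotes. It assumes balanced double quotes; an unterminated quote simply falls through to the bare-word branch.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: split a line on whitespace,
# keep "double-quoted phrases" together, and honour \" inside them.
sub tokenize {
    my ($line) = @_;
    my @tokens;
    while ($line =~ /\G\s*(?:"((?:[^"\\]|\\.)*)"|(\S+))/gc) {
        if (defined $1) {
            (my $quoted = $1) =~ s/\\(.)/$1/g;   # turn \" into " (and \\ into \)
            push @tokens, $quoted;
        }
        else {
            push @tokens, $2;                    # bare word, taken as-is
        }
    }
    return @tokens;
}

my @array = tokenize(q{Token1 "Little \"quoted\" phrase" Token4});
print "$_\n" for @array;
```

Single quotes are not treated specially here; supporting them would need a third alternative in the pattern.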
      Ah,
      I'm impressed. The line was somewhat shorter than I had imagined....

      My thanks,
      JP

      -- Alexander Widdlemouse undid his bellybutton and his bum dropped off --

        Heh, I thought the quotes being left in the @array was a "feature", so I:
        foreach (@array) { s/"//g; s/'//g; }
        Thanks all for your help,

        JP Hindin,

        -- Alexander Widdlemouse undid his bellybutton and his bum dropped off --
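Note that the loop above deletes every quote character, including an apostrophe in the middle of a token. If that matters, a gentler variant is sketched here; the sample tokens are made up for illustration. It removes only one matching pair of quotes wrapped around the whole token.

```perl
use strict;
use warnings;

# Sample tokens with the quotes still attached (made up for illustration).
my @array = ('Token1', '"Little phrase"', q{'it's here'}, q{don't});

for (@array) {
    # Remove one matching pair of quotes wrapped around the whole token;
    # quotes inside the token are left alone.
    s/^(["'])(.*)\1$/$2/s;
}

print "$_\n" for @array;
```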

      Yup, that'll be a talented regex hacker...
Re: Parsing, tokens and strings
by Fletch (Bishop) on Oct 17, 2001 at 21:59 UTC
    use Text::ParseWords qw( shellwords );
    $line = q{Token1 Token2 "Little phrase" Token4};
    @array = shellwords( $line );
    print join( "\n", @array ), "\n";

    This has the advantage that Text::ParseWords is a core module. If you don't mind fetching from CPAN, see also Text::Balanced and Regexp::Common.

    Update: Of course I just now noticed that merlyn had mentioned Text::ParseWords above. /me should read the whole thread more carefully.
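Since the thread also touches on whether the quotes should stay in the resulting array, here is a short sketch using quotewords from the same module. Its second argument is a "keep" flag; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Text::ParseWords qw( quotewords );

my $line = q{Token1 Token2 "Little phrase" Token4};

# Second argument is the "keep" flag: true leaves the quotes in place,
# false strips them from the returned tokens.
my @kept     = quotewords('\s+', 1, $line);
my @stripped = quotewords('\s+', 0, $line);

print "kept:     $_\n" for @kept;
print "stripped: $_\n" for @stripped;
```

Either way the quoted phrase stays a single token, so both arrays have four elements.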

Re: Parsing, tokens and strings
by petdance (Parson) on Oct 17, 2001 at 23:05 UTC
    You can also take a look at Text::CSV_XS, which lets you set the quote character, the field separator, and the record terminator.

    xoxo,
    Andy
    --
    <megaphone> Throw down the gun and tiara and come out of the float! </megaphone>