in reply to Parsing a line of text items

A Text::CSV (or Text::CSV_XS for speed) solution seems very appropriate, but if you need to roll your own, maybe something like:

    Win8 Strawberry 5.30.3.1 (64) Tue 03/30/2021 11:53:39
    C:\@Work\Perl\monks
    >perl -Mstrict -Mwarnings
    use 5.010;  # needs (?|...) branch reset

    my $rx_dq_body  = qr{ [^\\"]* (?: \\. [^\\"]* )* }xms;
    my $rx_unquoted = qr{ \S+ }xms;

    for my $args (
        '', ' ',
        '23 45.67 "John Marcus O\"Ddly" Surname',
        '"only \"quoted\" thing"',
        'no quoted stuff',
    ) {
        my $got_parsed_args = my @parsed_args =
            $args =~ m{ \G \s* (?| " ($rx_dq_body) " | ($rx_unquoted)) }xmsg;

        print ">$args< -> ";
        if ($got_parsed_args) {
            printf "%s \n", join ' ', map ">$_<", @parsed_args;
        }
        else {
            print "nada \n";
        }
    }
    ^Z
    >< -> nada
    > < -> nada
    >23 45.67 "John Marcus O\"Ddly" Surname< -> >23< >45.67< >John Marcus O\"Ddly< >Surname<
    >"only \"quoted\" thing"< -> >only \"quoted\" thing<
    >no quoted stuff< -> >no< >quoted< >stuff<

This needs Perl version 5.10+ for the (?|...) "branch reset" operator, but modification for pre-5.10 Perls is simple; let me know if you need it. The $rx_dq_body regex to match a double-quoted body supports embedded escaped double-quotes (and any other escaped character). You can play with this regex to get exactly what you want/need.
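
For reference, here is one way a pre-5.10 variant might look. It is a minimal sketch, not necessarily the modification the author had in mind. Without branch reset, each alternative gets its own capture group, so a loop picks whichever group is defined. The sub name parse_args is hypothetical.

```perl
use strict;
use warnings;

my $rx_dq_body  = qr{ [^\\"]* (?: \\. [^\\"]* )* }xms;
my $rx_unquoted = qr{ \S+ }xms;

# parse_args() is a hypothetical helper name, not part of the original post.
sub parse_args {
    my ($args) = @_;
    my @parsed;
    # Without (?|...), a quoted body lands in $1 and an unquoted item in $2.
    while ($args =~ m{ \G \s* (?: " ($rx_dq_body) " | ($rx_unquoted) ) }xmsg) {
        # Test definedness, not truth: "" and 0 are valid items.
        push @parsed, defined $1 ? $1 : $2;
    }
    return @parsed;
}

print join(' ', map { ">$_<" } parse_args('23 45.67 "John Marcus O\"Ddly" Surname')), "\n";
```

The defined test matters because an empty quoted item ("") or a bare 0 is false in Perl but still a legitimate parsed value.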

Of course, lots of tests should be done to verify this (or any other solution) really does what you want.
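
As a starting point, such tests could be written with the core Test::More module. This is only a sketch: parse_args is a hypothetical wrapper around the match above, and the last two cases are illustrative inputs of my own. Their expected values record what this regex does, which may or may not be what you want.

```perl
use strict;
use warnings;
use 5.010;    # needs (?|...) branch reset
use Test::More;

my $rx_dq_body  = qr{ [^\\"]* (?: \\. [^\\"]* )* }xms;
my $rx_unquoted = qr{ \S+ }xms;

# parse_args() is a hypothetical wrapper, not part of the original post.
sub parse_args {
    my ($args) = @_;
    my @parsed = $args =~ m{ \G \s* (?| " ($rx_dq_body) " | ($rx_unquoted)) }xmsg;
    return @parsed;
}

my @cases = (
    # [ input, expected items ]; the first five inputs are from the post
    [ '',  [] ],
    [ ' ', [] ],
    [ '23 45.67 "John Marcus O\"Ddly" Surname',
      [ '23', '45.67', 'John Marcus O\"Ddly', 'Surname' ] ],
    [ '"only \"quoted\" thing"', [ 'only \"quoted\" thing' ] ],
    [ 'no quoted stuff',         [ qw(no quoted stuff) ] ],
    # illustrative extras: an empty quoted item, an unterminated quote
    [ 'a "" b',              [ 'a', '', 'b' ] ],
    [ '"unterminated quote', [ '"unterminated', 'quote' ] ],
);

for my $case (@cases) {
    my ($input, $expected) = @$case;
    is_deeply [ parse_args($input) ], $expected, "parse >$input<";
}

done_testing;
```

Note that escaped quotes stay escaped in the output (O\"Ddly), so a case that expects unescaped text would fail until an unescaping step is added.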

Update: For some reason, I included a \G \s* group in the regex above. It is entirely unnecessary although it does no harm AFAICT. The match regex
    m{ (?| " ($rx_dq_body) " | ($rx_unquoted)) }xmsg
should be exactly equivalent.
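
The reason is that \s* only skips whitespace that an unanchored //g match would step over anyway, and \S+ can match at any non-whitespace position, so neither form ever skips a non-whitespace character. A quick sanity check might look like the sketch below. Both sub names are hypothetical, and the last two inputs are my own additions.

```perl
use strict;
use warnings;
use 5.010;    # needs (?|...) branch reset

my $rx_dq_body  = qr{ [^\\"]* (?: \\. [^\\"]* )* }xms;
my $rx_unquoted = qr{ \S+ }xms;

# Both sub names are hypothetical, introduced only for this comparison.
sub parse_anchored {
    my ($args) = @_;
    my @items = $args =~ m{ \G \s* (?| " ($rx_dq_body) " | ($rx_unquoted)) }xmsg;
    return @items;
}

sub parse_unanchored {
    my ($args) = @_;
    my @items = $args =~ m{ (?| " ($rx_dq_body) " | ($rx_unquoted)) }xmsg;
    return @items;
}

for my $args (
    '', ' ',
    '23 45.67 "John Marcus O\"Ddly" Surname',
    '  leading and trailing  ',
    '"unterminated quote',
) {
    my @with    = parse_anchored($args);
    my @without = parse_unanchored($args);
    # Compare item counts and contents; "\0" is a separator absent from the inputs.
    my $same = @with == @without
        && join("\0", @with) eq join("\0", @without);
    printf ">%s< -> %s\n", $args, $same ? 'same' : 'DIFFERENT';
}
```

This equivalence depends on the unquoted pattern being \S+. With a narrower pattern, the \G anchor would change the result, because the unanchored form could silently skip text that matches neither alternative.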


Give a man a fish:  <%-{-{-{-<

Re^2: Parsing a line of text items
by LanX (Saint) on Mar 30, 2021 at 18:09 UTC
    I can understand the challenge to hack it by yourself ... :)

    But I think the suggested Text::ParseWords is core and offers everything I expect from parsing a command line.

    It also has tests, is customizable, and the source is well structured and documented.

    So if I "wanna roll my own" and need to make special adjustments (like e.g. paired {quotes} ) I can take the code as a base.

        DB<94> use Text::ParseWords qw/shellwords/

        DB<96> x shellwords(q{this is 'an example' "with different quoting and \" escaping" including\ escaped\ whitespace})
        0  'this'
        1  'is'
        2  'an example'
        3  'with different quoting and " escaping'
        4  'including escaped whitespace'

        DB<97>

    In case larger files need to be parsed I'll consider a dependency on Text::CSV, but this really looks good.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      I would tend to agree that an approach using a reliable, common module like Text::ParseWords (of which I had not previously been aware -- thanks, philipbailey++) or Text::CSV is usually best. But I wanted to give an example of a "pure" regex approach.

      As an aside, I think it's worth emphasizing again that whatever approach is taken, a thorough suite of tests for the final code is advisable even if the approach is based on well-tested modules.



        This might sound strange, but these are human interfaces, which are hard to test beyond a certain level of complexity.

        I'm sure there are plenty of incompatible solutions, but nobody would notice they are different.

        For instance, what about single quotes without closing partner? Or quotes without neighboring whitespace?

        Do users even expect them to be parsed in a meaningful way?

        The behavior may differ between solutions, while all of them are considered correct.
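
        To make this concrete, here is a minimal sketch that feeds two such inputs to both shellwords and the regex from upthread. The name regex_words is hypothetical, and both inputs are made up for illustration.

        ```perl
        use strict;
        use warnings;
        use 5.010;    # needs (?|...) branch reset
        use Text::ParseWords qw(shellwords);

        my $rx_dq_body  = qr{ [^\\"]* (?: \\. [^\\"]* )* }xms;
        my $rx_unquoted = qr{ \S+ }xms;

        # regex_words() is a hypothetical name wrapping the regex from upthread.
        sub regex_words {
            my ($line) = @_;
            my @words = $line =~ m{ (?| " ($rx_dq_body) " | ($rx_unquoted)) }xmsg;
            return @words;
        }

        for my $line (
            q{it's broken},         # single quote without closing partner
            q{pre"fix mid"post},    # quotes without neighboring whitespace
        ) {
            print ">$line<\n";
            printf "  shellwords: %s\n", join ' ', map { ">$_<" } shellwords($line);
            printf "  regex:      %s\n", join ' ', map { ">$_<" } regex_words($line);
        }
        ```

        As far as I can tell, shellwords returns an empty list for the unpaired quote and glues the second input into a single word, while the regex returns two items in both cases. Neither is obviously wrong.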
