in reply to split on spaces, except those within quotes?

I probably deserve hate mail for this one but...

#! perl -sw
use strict;

sub tokenize ($) {
    local $_ = shift;
    s/(('|").*?\2)/ ($£ = $1) =~ s!\s+!\cA!g; $£ /ge; #!"
    grep{s/\cA/ /g, $_}split/\s+/;
}

my @bits = tokenize q/a "b c d" e f 'g h' ijk "l m n " op 'q r s t' u'v w'x yz/;
local $,='|';
print @bits,$/;
__END__
c:\test>212174
a|"b c d"|e|f|'g h'|ijk|"l m n "|op|'q r s t'|u'v w'x|yz|


Re: Re: split on spaces, except those within quotes?
by diotalevi (Canon) on Nov 12, 2002 at 03:36 UTC

    Well, I'm astonished that worked. The last (and first) time I tried to do a re-entrant regex I triggered some sort of malloc error. I just thought that doing regexes while inside a regex was disallowed or something. Odd.

    __SIG__ use B; printf "You are here %08x\n", unpack "L!", unpack "P4", pack "L!", B::svref_2object(sub{})->OUTSIDE;

      I assume you're referring to this:

      s/(('|").*?\2)/ ($£ = $1) =~ s!\s+!\cA!g; $£ /ge; #!"

      That isn't re-entrant. The right side of a substitution counts as a string (which in this case is eval'ed because of the /e); only the left side counts as a regex.
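To make that concrete, here is a minimal sketch (not the code from the post above; the sample string and the names $text, $inner and $out are made up for illustration). With /e the replacement is ordinary Perl code that runs once the outer pattern has matched, so a second s/// inside it is just a separate, sequential substitution:

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for this illustration.
my $text = 'keep "a  b" apart';

# The left side is the regex. Because of /e, the right side is Perl code,
# evaluated after each match, and it is free to run its own s///.
(my $out = $text) =~ s{"([^"]*)"}{ my $inner = $1; $inner =~ s/\s+/_/g; qq("$inner") }ge;

print "$out\n";    # prints: keep "a_b" apart
```

The inner substitution only ever sees the copy in $inner, so it never touches the string the outer s/// is working on.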

        I'm being imprecise. I wasn't sure whether it's the difference between a self-modifying regex, a regex that modifies source data (and then calls itself), or regexes that just include other regexes. On further consideration, I'm sure it makes a difference that I was using the (??{})/(?{}) constructs. Perhaps it's that sort of use that got me into hot water.

        Update: it was just some toy code that did something about defining a qr// regex in terms of a source string, modified the source string and then had the qr// regex call itself again with the new data.

        Update again: My misunderstanding was on whether more than one instance of the regex engine could be active at once. Perhaps it was the regex compilation that killed it. Anyhow I'll just do a bug report now that I know it's not a Just-Don't-Do-That sort of thing.

Re^2: split on spaces, except those within quotes?
by SwaJime (Scribe) on Feb 04, 2021 at 13:26 UTC
    This looks way cool. However, it is not coming through in the web browser as something usable. It has odd characters, A with symbols over top in several places, and what looks like a currency symbol in a couple of places. Does anybody have access to the original correct formula?

      I believe that line is supposed to be s/(('|").*?\2)/ ($£ = $1) =~ s!\s+!\cA!g; $£ /ge;. AFAICT, the nonstandard $£ is just supposed to be a scratch variable, so you can replace it with e.g. $a (assuming there's no sort in the call stack) or a lexical of your choosing.
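For anyone who wants something they can paste, here is a sketch of the same idea with a lexical scratch variable instead of the punctuation variable. The names $line and $quoted are my own choices, the prototype is dropped, and the sample input is shortened; otherwise it is meant to follow the original logic, not to replace it authoritatively:

```perl
use strict;
use warnings;

sub tokenize {
    my ($line) = @_;

    # Protect whitespace inside quotes: each run becomes a single \cA.
    # $quoted is a lexical scratch variable standing in for the original one.
    $line =~ s{(('|").*?\2)}{ (my $quoted = $1) =~ s/\s+/\cA/g; $quoted }ge;

    # Split on the remaining whitespace, then turn \cA back into spaces.
    return map { s/\cA/ /g; $_ } split /\s+/, $line;
}

my @bits = tokenize(q/a "b c d" e f 'g h' ijk "l m n " op u'v w'x yz/);
print join('|', @bits), "\n";
# prints: a|"b c d"|e|f|'g h'|ijk|"l m n "|op|u'v w'x|yz
```

As in the original, a run of several spaces inside quotes comes back as a single space, and a string that already contains \cA would confuse it.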

      However, note BrowserUk's words: "I probably deserve hate mail for this one but..." - see e.g. Regexp::Common::delimited or Text::Balanced.

Re: Re: split on spaces, except those within quotes?
by John M. Dlugosz (Monsignor) on Nov 12, 2002 at 17:09 UTC
    Why did you use grep instead of map?

    —John

      Good point John. map works just as well and is a better fit. Thanks.
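One way to see why map fits better, as a small sketch with made-up data (the arrays @raw, @copy1 and @copy2 are just for illustration): grep keeps an element only when the block's final value is true, so a token that is the string "0" is dropped, while map returns whatever the block yields.

```perl
use strict;
use warnings;

# Hypothetical tokens after the split; one still carries a protected space.
my @raw = ('a', '0', "b\cAc");

# grep: the block's last value is $_, and "0" is false, so it is filtered out.
my @copy1    = @raw;
my @via_grep = grep { s/\cA/ /g, $_ } @copy1;

# map: every element is passed through after the substitution.
my @copy2   = @raw;
my @via_map = map { s/\cA/ /g; $_ } @copy2;

print join('|', @via_grep), "\n";    # prints: a|b c
print join('|', @via_map),  "\n";    # prints: a|0|b c
```

Note the semicolon in the map block: with a comma, map would also return the substitution's result for each element.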

