Regex - unordered lookaround syntax

shemp has asked for the wisdom of the Perl Monks concerning the following question:

•Re: Regex - unordered lookaround syntax by merlyn (Sage) on Apr 28, 2003 at 22:57 UTC
Sounds to me like you just want this, and have overcomplicated it tremendously: `my $string = "A BC 1 23DEF45 6"; my @parts = $string =~ /([a-zA-Z]+|\d+)/g;` Split is wrong when it's easier to talk about what you want to keep rather than what you want to throw away. For that, use m//g in list context. It looks like what you wanted to keep was any run of digits, or any run of letters. Hence, mine. -- Randal L. Schwartz, Perl hacker Be sure to read my standard disclaimer if this is a reply.
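A minimal sketch of the "keep, don't throw away" approach, using the sample string from the post and an unescaped `|` for alternation. The `print` line is only there to make the result visible and is not part of the original reply.

```perl
use strict;
use warnings;

my $string = "A BC 1 23DEF45 6";

# In list context, m//g returns every capture from every match:
# here, each run of ASCII letters or each run of digits.
my @parts = $string =~ /([a-zA-Z]+|\d+)/g;

print join(",", @parts), "\n";   # prints: A,BC,1,23,DEF,45,6
```

Whitespace never appears in the result because the pattern simply never matches it; nothing has to be split away.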
Re: •Re: Regex - unordered lookaround syntax by halley (Prior) on Apr 29, 2003 at 00:28 UTC
"Split is wrong when it's easier to talk about what you want to keep rather than what you want to throw away." --merlyn I had always understood when I needed split, and when I needed match. But my brain kept these two concepts completely separate for a while. Then I had one of those eureka moments when something explained the magical syntax `split //` to split the string into solo characters. What a weird special case, I had thought before. Now it seems so logical. It makes sense that if `s//-/g` would insert dashes between each character, and `m//g` would happily return an array of nothings for each character, that `split //` should return the array of each character between all those nothings. -- `[ e d @ h a l l e y . c c ]`
Re: Re: •Re: Regex - unordered lookaround syntax by Anonymous Monk on Apr 29, 2003 at 04:56 UTC
But have you ever stopped to wonder why you don't get an infinite loop at the first empty space? (This is explained in perlre, but virtually nobody understands the explanation.)
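The empty-pattern behaviour discussed in the two posts above can be checked with a small sketch. One assumption to flag: it writes the empty pattern as `(?:)` for the substitution and the matches, because a literal empty `//` in `m//` and `s///` means "reuse the last successfully matched pattern", whereas `split //` is documented as a special case that always matches the empty string. The variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "abc";

# Substitute at every empty match: one before each character, one at the end.
(my $dashed = $text) =~ s/(?:)/-/g;

# List-context /g collects every (empty) match.
my @nothings = $text =~ /(?:)/g;

# split // returns what lies between those empty matches.
my @chars = split //, $text;

# Scalar-context /g: after a zero-length match, Perl will not accept another
# zero-length match at the same position, so pos() keeps moving forward and
# the loop terminates instead of spinning at offset 0.
my @positions;
while ($text =~ /(?:)/g) {
    push @positions, pos($text);
}

print "dashed:    $dashed\n";                # -a-b-c-
print "nothings:  ", scalar(@nothings), "\n"; # 4
print "chars:     @chars\n";                 # a b c
print "positions: @positions\n";             # 0 1 2 3
```

Note that the substitution also places a dash before the first and after the last character, which is the same set of four empty matches that `m//g` reports.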
Re: Re: Re: •Re: Regex - unordered lookaround syntax by halley (Prior) on Apr 29, 2003 at 15:31 UTC
Re: •Re: Regex - unordered lookaround syntax by shemp (Deacon) on Apr 28, 2003 at 23:04 UTC
I think you are correct. I need to leave right now, but that looks much better. You always help me with what I consider to be pretty bizarre problems. Thanks much. BTW: screw intel
Re: Regex - unordered lookaround syntax by runrig (Abbot) on Apr 28, 2003 at 23:27 UTC
This loses on your golf requirement, but it's more correct in locale-specific environments: `my $str = " abc def123abc def"; print "[$_]\n" for split / \s+ | (?<=[[:alpha:]])(?=[[:digit:]]) | (?<=[[:digit:]])(?=[[:alpha:]]) /ix, $str;` I think doing what you ask is somewhat doable, but it would just be more of a mess. And turning the problem inside-out as merlyn suggests is a much better answer anyway :-)
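For comparison, here is a sketch of the lookaround-based `split` laid out over several lines with `/x`, using the same sample string. The `@fields` and `@nonempty` names and the `grep` step are additions for illustration; they are not in the original reply.

```perl
use strict;
use warnings;

my $str = " abc def123abc def";

my @fields = split /
      \s+                                  # a run of whitespace, or
    | (?<=[[:alpha:]]) (?=[[:digit:]])     # the gap between a letter and a digit, or
    | (?<=[[:digit:]]) (?=[[:alpha:]])     # the gap between a digit and a letter
/x, $str;

# The leading space is a positive-width match at the start of the string,
# so split produces an empty first field; drop it if it is unwanted.
my @nonempty = grep { length } @fields;

print "[$_]\n" for @nonempty;   # [abc] [def] [123] [abc] [def], one per line
```

The two lookaround alternatives match zero characters, so nothing is consumed at a letter/digit boundary; only the whitespace is thrown away.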
Re: Regex - unordered lookaround syntax by The Mad Hatter (Priest) on Apr 28, 2003 at 22:55 UTC
I am nowhere near being a regex guru and therefore can't answer your question, but I think I've read somewhere that using the /i modifier slows things down considerably. Apparently, using `[A-Za-z]` is faster than `[A-Z]` with the /i modifier. Just wanted to point this out in case the code is being run where performance really matters.
Re: Re: Regex - unordered lookaround syntax by diotalevi (Canon) on Apr 28, 2003 at 23:20 UTC
Nah, take a look at that via `use re 'debug'`. You'll see that `/(?i:[A-Z])/` is `/[A-Z]/i` is `/[A-Za-z]/`. All three are interpreted identically. If you run the actual example through it you'll see it's the same. I didn't know the answer to this prior to running these through `re 'debug'`, so I'm suggesting that great debug tools like this should be used more often, especially when making assertions regarding relative performance. In this case all you win by spelling out `[A-Za-z]` is some source code obfuscation, since I hold that it's easier to look at either `(?i:[A-Z])` (which is really nice because it restricts the effects of /i to just that section) or a tacked-on /i. Having to be extra specific just makes it easier to type another bug.
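A sketch of how such a comparison might be set up. The `re` pragma's `debug` mode writes the compiled form of each regex to STDERR; the exact dump format differs between Perl versions, so no dump is reproduced here and the three programs should be compared by eye on your own installation. The variable names are made up for illustration.

```perl
use strict;
use warnings;

my ($scoped, $flagged, $explicit);
{
    # The pragma is lexically scoped, so only the regexes written inside
    # this block have their compiled programs dumped to STDERR.
    use re 'debug';
    $scoped   = qr/(?i:[A-Z])/;   # /i limited to one group
    $flagged  = qr/[A-Z]/i;       # /i on the whole pattern
    $explicit = qr/[A-Za-z]/;     # both cases spelled out
}

# Sanity check on STDOUT: all three accept a lowercase letter.
my @hits = map { 'q' =~ $_ ? 1 : 0 } ($scoped, $flagged, $explicit);
print "@hits\n";   # prints: 1 1 1
```

Redirecting STDERR to a file (for example `perl script.pl 2> dump.txt`) makes the three dumps easier to compare side by side.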
Re: Re: Re: Regex - unordered lookaround syntax by Your Mother (Archbishop) on Apr 29, 2003 at 00:29 UTC
I think the assertion came from Jeff Friedl in this case (Mastering Regular Expressions, 1st ed, not sure about second). He says that the i modifier can be up to 20 times slower than a case specific match, if memory serves. I believe this was during Perl 5.4 though and right before a major regex overhaul. Nice to know it's no longer true.
Re: Regex - unordered lookaround syntax by aquarium (Curate) on Apr 29, 2003 at 01:48 UTC
I think this may be one of those times when a traversal of the string would be better than a regex: a lot simpler than that regex, and you can code some other cases to split on as well, without getting a headache from the regex. Chris
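One possible shape for such a traversal, as a rough sketch only. The `split_runs` name is hypothetical, and the choice to treat anything that is not an ASCII letter or digit as a separator is an assumption, since the original question is not shown. Single-character classes are still used to classify each character, but the splitting logic itself is a plain loop.

```perl
use strict;
use warnings;

# Hypothetical helper: walk the string one character at a time and start a
# new part whenever the character kind (letter vs. digit) changes.
sub split_runs {
    my ($text) = @_;
    my (@parts, $current, $kind);

    for my $ch (split //, $text) {
        my $ch_kind = $ch =~ /[A-Za-z]/ ? 'alpha'
                    : $ch =~ /[0-9]/    ? 'digit'
                    :                     undef;   # anything else separates

        if (!defined $ch_kind) {
            push @parts, $current if defined $current;
            ($current, $kind) = (undef, undef);
            next;
        }

        # A change of kind closes the current run.
        if (defined $kind && $kind ne $ch_kind) {
            push @parts, $current;
            $current = '';
        }

        $current = '' unless defined $current;
        $current .= $ch;
        $kind = $ch_kind;
    }

    push @parts, $current if defined $current;
    return @parts;
}

my @runs = split_runs("A BC 1 23DEF45 6");
print join(",", @runs), "\n";   # prints: A,BC,1,23,DEF,45,6
```

Extra cases, such as keeping punctuation as its own kind, would be added as further branches in the classification step.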