in reply to regex trouble

This is a pretty straightforward thing to do. I'd break up the construction of $pattern into steps:

[trwww@www misc]$ cat 618115.pl
use warnings;
use strict;

my @delimiters = (
    ',',
    '|',
    ':',
    '>',
    '][',
    '_|_',
);

my $pattern = join '|', map quotemeta, @delimiters;

my $text = "jojo,has|some:big>balls][nuts,sometimes_|_he,scratches";

foreach my $particle (split /$pattern/, $text) {
    print $particle."\n";
}

That gives the following output:

[trwww@www misc]$ perl 618115.pl
jojo
has
some
big
balls
nuts
sometimes
he
scratches

Hope this helps,

trwww

Re^2: regex trouble
by dewey (Pilgrim) on May 30, 2007 at 16:43 UTC
    I just discovered quotemeta as a result of the recent functional functions node, so I'm pleased to see it in use here.

    I have a question about your solution, though. When I print "$pattern\n", I get
    \,|\||\:|\>|\]\[|_\|_
    To me, it's odd that this works correctly with the comma, colon, and chevron escaped. I'll go look at perlre, but why doesn't it matter that these are escaped?

    ~dewey
      In a regex, "\," is exactly equivalent to "," -- and likewise for colon and angle brackets. Those characters do not have any "magical" force in the regex syntax when used without escapes (in contrast to period, asterisk, square brackets and so on), nor do they have any special meaning when preceded by backslash (in contrast to "n", "t", "b", "d" and so on).
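A quick way to convince yourself of this is to split the same string with both forms of the pattern. This is only a minimal sketch: the shortened $text and the array names are made up for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Shortened sample string (an assumption for this sketch).
my $text = "jojo,has|some:big>balls";

# Comma, colon, and '>' mean the same thing with or without a backslash.
my @escaped   = split /\,|\:|\>/, $text;
my @unescaped = split /,|:|>/,    $text;

print "escaped:   @escaped\n";     # escaped:   jojo has|some big balls
print "unescaped: @unescaped\n";   # unescaped: jojo has|some big balls

# The pipe is different: unescaped it means alternation, so the
# backslash is required to match a literal '|'.
my @on_pipe = split /\|/, $text;
print "on pipe:   @on_pipe\n";     # on pipe:   jojo,has some:big>balls
```

Both splits give the same four fields, while the pipe is a case where the backslash actually changes the meaning.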

      Meanwhile, quotemeta is a more-or-less general-purpose function -- according to the manual, it 'Returns the value of EXPR with all non-"word" characters backslashed. (That is, all characters not matching "/[A-Za-z_0-9]/" will be preceded by a backslash in the returned string, regardless of any locale settings.)'
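To see that rule in action, here is a small sketch that rebuilds the pattern from the delimiter list in the original script and prints it. The $word variable is a hypothetical extra, added only to show that word characters pass through untouched.

```perl
use strict;
use warnings;

# Delimiter list taken from the script at the top of the thread.
my @delimiters = (',', '|', ':', '>', '][', '_|_');

# quotemeta backslashes every non-word character, whether or not
# the regex engine treats it as special.
my $pattern = join '|', map { quotemeta } @delimiters;
print "$pattern\n";   # \,|\||\:|\>|\]\[|_\|_

# Word characters (letters, digits, underscore) are left alone.
my $word = quotemeta('abc_123');
print "$word\n";      # abc_123
```

The harmless extra backslashes on the comma, colon, and '>' are the price of not having to remember which characters are special.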

      (updated to fix display of square brackets in last paragraph)

        I see. Thanks.

        ~dewey