ckj has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

My string is something like this:

XYZ • Qno-483, sec-9a,ABS • P:(110) 53345345345 • F: (210) 123231231

and I want to extract the strings between the • separators so that the output looks like this:

$1 = XYZ
$2 = Qno-483, sec-9a,ABS
$3 = (110) 53345345345
$4 = (210) 123231231

Replies are listed 'Best First'.
Re: Need assistance in Regular expression
by davido (Cardinal) on Jun 02, 2012 at 07:53 UTC

    I'm kind of interested in seeing what you come up with on your own after spending an hour with perlretut and perlre.

    Update: If you're using plain old (non-utf8) strings, the dot character you used shows up as one or more raw bytes rather than as the single code point (U+2022) it has in a utf8 string. I'll assume you're dealing with a utf8 string, but you know your dataset better than I do, and can make your own determination in that regard. With that caveat, here's a regex solution for you. ...free of charge today, if you promise to read perlretut, perlrequick, perlre, and perlrecharclass before your next regular expression question. ;)

    use strict;
    use warnings;
    use utf8;
    use feature qw( unicode_strings );

    binmode STDOUT, ':utf8';

    my $string = 'XYZ • Qno-483, sec-9a,ABS • P:(110) 53345345345 • F: (210) 123231231';

    printf "Code point for '•': %X\n", ord('•');

    if (
        $string =~ m/
            ^
            ( [^\N{U+2022}]+ ) \s\N{U+2022}    # Matches XYZ.
            \s
            ( [^\N{U+2022}]+ ) \s\N{U+2022}    # Matches Qno-483, sec-9a,ABS
            \sP:\s*
            ( [^\N{U+2022}]+ ) \s\N{U+2022}    # Matches P:(110) 53345345345
            \sF:\s*
            ( [^\N{U+2022}]+ ) \s*$            # Matches F: (210) 123231231
        /x
    ) {
        print "Match:\n\$1: [$1]\n\$2: [$2]\n\$3: [$3]\n\$4: [$4]\n";
    }
    else {
        print "Your input string doesn't resemble the one posted to PerlMonks.\n";
    }

    ...the output...

    Code point for '•': 2022
    Match:
    $1: [XYZ]
    $2: [Qno-483, sec-9a,ABS]
    $3: [(110) 53345345345]
    $4: [(210) 123231231]
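To make the utf8 caveat above a bit more concrete, here is a small sketch (not part of the original reply) that compares the bullet as a decoded character with the same bullet as raw UTF-8 bytes. The variable names are made up for illustration, and it assumes the source file itself is saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;                      # source literals are decoded as UTF-8
use Encode qw( encode );

my $char  = '•';                        # one character, thanks to `use utf8`
my $bytes = encode( 'UTF-8', $char );   # the same bullet as undecoded bytes

my $char_len   = length $char;               # number of characters
my $bytes_len  = length $bytes;              # number of bytes
my $char_ord   = sprintf '%X', ord $char;    # code point of the character
my $first_byte = sprintf '%X', ord $bytes;   # value of the first byte only

print "characters: $char_len (U+$char_ord)\n";
print "bytes:      $bytes_len (first byte 0x$first_byte)\n";
```

If the input arrives undecoded, a pattern written with \N{U+2022} will not match it, which is why it matters to know which kind of string you have.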

    Yes, you could have just pasted the 'dot' into your regular expression, or used an input device that provides the character, but then the next person to look at your code will have to go through contortions to figure out what that thing is too.
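As a rough sketch of that readability point, the same character can also be written by name. The example below is only illustrative (the variable names are hypothetical); it counts the separators in the sample string using three equivalent spellings. The explicit `use charnames` line should only be needed on Perl older than 5.16.

```perl
use strict;
use warnings;
use utf8;
use charnames ':full';    # enables \N{BULLET}; loaded implicitly on Perl 5.16+

my $string = 'XYZ • Qno-483, sec-9a,ABS • P:(110) 53345345345 • F: (210) 123231231';

# Three spellings of the same one-character pattern.
my @patterns = (
    qr/\N{BULLET}/,    # by Unicode name: self-documenting
    qr/\N{U+2022}/,    # by code point, as in the regex above
    qr/\x{2022}/,      # by hex escape
);

# Count how many separators each spelling finds in the sample string.
my @counts = map { my $re = $_; scalar( () = $string =~ /$re/g ) } @patterns;

print "separators found: @counts\n";
```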


    Dave

Re: Need assistance in Regular expression
by choroba (Cardinal) on Jun 02, 2012 at 07:39 UTC
    Rather than a regular expression match, I would use split:

    use utf8;
    my $str = 'XYZ • Qno-483, sec-9a,ABS • P:(110) 53345345345 • F: (210) 123231231';
    my @array = split /\s*•(?:\s*.:)?\s*/, $str;
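For anyone who wants to see what the split approach yields, here is a minimal runnable sketch built around the same pattern. The `@fields` name and the printing loop are additions for illustration only.

```perl
use strict;
use warnings;
use utf8;

binmode STDOUT, ':encoding(UTF-8)';

my $str = 'XYZ • Qno-483, sec-9a,ABS • P:(110) 53345345345 • F: (210) 123231231';

# Split on a bullet with optional surrounding whitespace, also consuming an
# optional one-character label such as "P:" or "F:" that follows the bullet.
my @fields = split /\s*•(?:\s*.:)?\s*/, $str;

# Print the fields numbered from 1, in the style used in the question.
printf "\$%d = %s\n", $_ + 1, $fields[$_] for 0 .. $#fields;
```

Note that the `.` in `.:` accepts any single character as a label, so a field that happens to begin with something like "a:" would lose that prefix; a character class such as `[PF]` would be stricter if the labels are known in advance.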
Re: Need assistance in Regular expression
by Anonymous Monk on Jun 02, 2012 at 07:17 UTC