Need assistance in Regular expression

ckj has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

My string is something like this:

XYZ • Qno-483, sec-9a,ABS • P:(110) 53345345345 • F: (210) 123231231

and I want to extract the strings separated by •, so that the output looks like this:

$1 = XYZ
$2 = Qno-483, sec-9a,ABS
$3 = (110) 53345345345
$4 = (210) 123231231


Replies are listed 'Best First'.
Re: Need assistance in Regular expression by davido (Cardinal) on Jun 02, 2012 at 07:53 UTC
I'm kind of interested in seeing what you come up with on your own after spending an hour with perlretut and perlre.

Update: If you're using plain old (non-utf8) strings, the dot character you used has a different code point than if you are using utf8 strings. I'll assume you're dealing with a `utf8` string, but you know your dataset better than I do, and can make your own determination in that regard.

With that caveat, here's a regex solution for you. ...free of charge today, if you promise to read perlretut, perlrequick, perlre, and perlrecharclass before your next regular expression question. ;)

    use strict;
    use warnings;
    use utf8;
    use feature qw( unicode_strings );

    binmode STDOUT, ':utf8';

    my $string = 'XYZ • Qno-483, sec-9a,ABS • P:(110) 53345345345 • F: (210) 123231231';

    printf "Code point for '•': %X\n", ord('•');

    if ( $string =~ m/
            ^
            ( [^\N{U+2022}]+ ) \s\N{U+2022}          # Matches XYZ.
            \s ( [^\N{U+2022}]+ ) \s\N{U+2022}       # Matches Qno-483, sec-9a,ABS
            \sP:\s* ( [^\N{U+2022}]+ ) \s\N{U+2022}  # Matches P:(110) 53345345345
            \sF:\s* ( [^\N{U+2022}]+ ) \s*$          # Matches F: (210) 123231231
        /x
    ) {
        print "Match:\n\$1: [$1]\n\$2: [$2]\n\$3: [$3]\n\$4: [$4]\n";
    }
    else {
        print "Your input string doesn't resemble the one posted to PerlMonks.\n";
    }

...the output...

    Code point for '•': 2022
    Match:
    $1: [XYZ]
    $2: [Qno-483, sec-9a,ABS]
    $3: [(110) 53345345345]
    $4: [(210) 123231231]

Yes, you could have just pasted the 'dot' into your regular expression, or used an input device that provides the character, but then the next person to look at your code will have to go through contortions to figure out what that thing is too.

Dave
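To make the utf8 caveat concrete, here is a minimal sketch that is not part of the original reply. It assumes the input might arrive as undecoded UTF-8 octets; the variable names and the shortened sample string are made up for illustration. The bullet is built from its code point so the script itself stays pure ASCII.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# One character (U+2022 BULLET) versus the same bullet as raw UTF-8 octets.
my $char  = "\x{2022}";
my $bytes = encode( 'UTF-8', $char );

printf "characters: %d, code point: %X\n", length($char), ord($char);
printf "bytes:      %d, octets: %s\n", length($bytes),
    join ' ', map { sprintf '%02X', ord } split //, $bytes;

# Hypothetical input that was read without a decoding layer.
my $raw     = encode( 'UTF-8', "XYZ \x{2022} Qno-483" );
my $decoded = decode( 'UTF-8', $raw );

# A pattern written with \N{U+2022} only finds the bullet after decoding.
my $raw_hit     = $raw     =~ /\N{U+2022}/ ? 1 : 0;
my $decoded_hit = $decoded =~ /\N{U+2022}/ ? 1 : 0;
print "raw matches: $raw_hit, decoded matches: $decoded_hit\n";
```

If your data comes from a file, opening it with an `:encoding(UTF-8)` layer should give you decoded strings from the start, so the `\N{U+2022}` pattern applies as written.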
Re: Need assistance in Regular expression by choroba (Cardinal) on Jun 02, 2012 at 07:39 UTC
Rather than regular expression match, I would use split:

    use utf8;
    my $str = 'XYZ • Qno-483, sec-9a,ABS • P:(110) 53345345345 • F: (210) 123231231';
    my @array = split /\s•(?:\s.:)?\s*/, $str;
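For anyone who wants to run the split approach end to end, here is a small self-contained sketch. It is a variation for illustration, not the poster's exact code: the bullet is written as `\x{2022}` so the file stays ASCII and needs no `use utf8`, and the `@fields` name and the printing loop are additions.

```perl
use strict;
use warnings;

# Sample string from the question, with the bullet spelled as \x{2022}.
my $str = "XYZ \x{2022} Qno-483, sec-9a,ABS \x{2022} P:(110) 53345345345 \x{2022} F: (210) 123231231";

# Separator: one whitespace character, the bullet, an optional one-letter
# label such as "P:" or "F:", then any trailing whitespace.
my @fields = split /\s\x{2022}(?:\s.:)?\s*/, $str;

# Number the fields from 1 to mirror the $1..$4 layout in the question.
for my $i ( 0 .. $#fields ) {
    printf "\$%d = %s\n", $i + 1, $fields[$i];
}
```

With the sample string this prints the four lines requested in the question. Note that the optional `(?:\s.:)?` group strips any single-character label followed by a colon, so a field that legitimately starts with something like "A:" would lose that prefix.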
Re: Need assistance in Regular expression by Anonymous Monk on Jun 02, 2012 at 07:17 UTC
See How do I post a question effectively? and post some code.