Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Seems like a simple problem, but I cannot get it working. I have multiple sets of data contained within one string, and I need to get each set represented as a value in an array. The data sets in the string are variable but follow the format below:
data_element::item::value|item::value|item::value
This pattern could appear many times in the string; a multiple-element string would appear as:
data0_element::item::value|item::value|item::value|data1_element::item::value|item::value|item::value
The only constant is the placement of the "::" and the "|". Below is some code that I've tested with (without success):
#!/usr/bin/perl
$_ = "monsa::clear::1|red::23|blue::50|monsb::clear::80|red::90|blue::100|";
@instances = m/(\S+::\S+::\S+\|\S+::\S+\|\S+::\S+\b)/g;
foreach $inst (@instances) {
    print "found $inst\n";
}
exit;
What I would expect (and desire) would be separate values for each data set instance, as below:
monsa::clear::1|red::23|blue::50
monsb::clear::80|red::90|blue::100
But it seems to throw everything into one array value. Any thoughts?

Re: Regular Expression Trick
by sgifford (Prior) on Nov 24, 2003 at 06:07 UTC

    You're really close. The problem is that \S+ matches anything besides whitespace, including colons and pipes. By default, + is greedy, so it matches as much as it can. So the first \S+ matches monsa::clear::1|red::23|blue::50|monsb, which is quite a bit more than you want it to!

    The two solutions are to not match colons or pipes with your regex, by using [^:|]+ instead of \S+, or to tell the regex to match non-greedily, by saying \S+? instead of \S+. I've done the latter here, and it seems to give the output that you want:

    #!/usr/bin/perl
    $_ = "monsa::clear::1|red::23|blue::50|monsb::clear::80|red::90|blue::100|";
    @instances = m/(\S+?::\S+?::\S+?\|\S+?::\S+?\|\S+?::\S+?\b)/g;
    foreach $inst (@instances) {
        print "found $inst\n";
    }
    exit;
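
The other option mentioned above, excluding colons and pipes from each field, might look like the sketch below. It assumes every record has exactly one name and three item::value pairs, as in the sample string. Traced by hand, the non-greedy version appears to leave a leading | on the second capture, because \S+? can still begin on the pipe that separates two records; a negated character class cannot.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample string from the question (trailing pipe included).
my $data = "monsa::clear::1|red::23|blue::50|monsb::clear::80|red::90|blue::100|";

# [^:|]+ cannot cross a colon or a pipe, so every field stops at its
# delimiter and a match can never start on the pipe between two records.
my @instances = $data =~ m/
    (    [^:|]+ :: [^:|]+ :: [^:|]+   # name, first item, first value
      \| [^:|]+ :: [^:|]+             # second item and value
      \| [^:|]+ :: [^:|]+ )           # third item and value
/gx;

print "found $_\n" for @instances;
```

The /x flag only lets the pattern be spread over several lines; the character classes are what change the matching behaviour.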
Re: Regular Expression Trick
by Roger (Parson) on Nov 24, 2003 at 06:14 UTC
    How about this attempt - do not hardcode your regular expression, use a 'split' on record boundary instead.
    #!/usr/local/bin/perl -w
    use strict;

    chomp(my @lines = <DATA>);
    my @array;
    foreach (@lines) {
        foreach (split /(?<!\w)(?=\w+::\w+::\w+)/) {   # <- split on record boundary
            s/\|$//;    # get rid of the last | on the line
            push @array, $_;
        }
    }
    print "$_\n" for @array;

    __DATA__
    monsa::clear::1|red::23|blue::50|monsb::clear::80|red::90|blue::100|
    monsc::clear::1|red::23|blue::50|green::50|monsd::clear::80|red::90|
    And the output is -
    monsa::clear::1|red::23|blue::50
    monsb::clear::80|red::90|blue::100
    monsc::clear::1|red::23|blue::50|green::50
    monsd::clear::80|red::90
Re: Regular Expression Trick (Don't use one!)
by dragonchild (Archbishop) on Nov 24, 2003 at 13:29 UTC
    Why use a regex?!?
    foreach my $line (@lines) {
        my ($name, $values) = split '::', $line, 2;   # <-- This is the key - limit your split
        my @values = split /\|/, $values;             # <-- This is not limited

        print "$name\n";
        foreach my $value (@values) {
            my ($color, $number) = split '::', $value;
            print "\t$color => $number\n";
        }
        print $/;
    }

    Update: Fixed stupidity with '|' character (Thanks, duff!)


      Well, your method uses regular expressions too, they're just arguments to the split() routine. BTW, your second split won't work quite right as | is special in regular expressions and the first argument to split is always a regular expression (except for the one special case of ' ').

      Also, it appears from the problem description that a pipe symbol also separates individual records, so your solution won't quite do the right thing anyway. But, I suspect that the original poster left something out when he said

      The only constant is the placement of the "::" and the "|".
      as it looks like the number of things is also constant (otherwise why use such a restrictive RE?). So, it may be that a plain-jane pattern match with captures is the Right Way for this particular problem.
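
The point about | being special in split's first argument can be checked with a small sketch. The sample value string here is made up in the thread's format and is not taken from any poster's data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $values = "red::23|blue::50";   # hypothetical sample in the thread's format

# A quoted string as the first argument is still compiled as a pattern:
# a bare | is an empty alternation, which matches between every character.
my @chars = split '|', $values;

# Escaping the pipe makes it a literal delimiter.
my @fields = split /\|/, $values;

print scalar(@chars),  " pieces from the unescaped pipe\n";
print scalar(@fields), " pieces from the escaped pipe: @fields\n";
```

With the unescaped pipe the string falls apart into single characters, which is why the escaped form is the one to use when the pipe is a field separator.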

Re: Regular Expression Trick
by Anonymous Monk on Nov 24, 2003 at 06:03 UTC
    @instances = m/([^:]+::(?:[^:]+::[^:]+(?:\||$))+)/g;
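
A minimal sketch of how that one-liner might be used, run against the second sample line from Roger's reply so that the records have different numbers of items. Traced by hand, each capture keeps its trailing pipe, so the sketch strips it afterwards; that cleanup step is an addition and not part of the original one-liner.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Second sample line from Roger's reply: four items, then two items.
my $data = "monsc::clear::1|red::23|blue::50|green::50|monsd::clear::80|red::90|";

# A name, then one or more item::value pairs, each ended by a pipe or by
# the end of the string. A new record name is followed by a second "::",
# so the repeated group stops there and the next match starts on that name.
my @instances = $data =~ m/([^:]+::(?:[^:]+::[^:]+(?:\||$))+)/g;

# Each capture still ends with its closing pipe; remove it.
s/\|$// for @instances;

print "found $_\n" for @instances;
```

Unlike the fixed-width patterns earlier in the thread, this form does not assume a particular number of item::value pairs per record.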