rapide has asked for the wisdom of the Perl Monks concerning the following question:

I have a line that contains an unknown set of characters of arbitrary length.
I want to parse that line and collect all the characters between each pair of apostrophes (") into a hash, e.g.:
my $line = 'One animal="ap3!" and another one fish="s4lm%%on" can all be eaten. (So can the bird="sparr0w!$$")';
What I want is to have:
$something{animal} => ap3!
$something{fish} => s4lm%%on
$something{bird} => sparr0w!$$
The solution I have tried so far does not work:
while($line =~ s/([\w\d\_]+)="(.+)"//) { $something{$1} = $2; }

Any suggestions?

Replies are listed 'Best First'.
Re: Regular Expression: Matching arbitrary characters
by moritz (Cardinal) on Dec 01, 2008 at 14:11 UTC
    This works for me:
    use strict;
    use warnings;
    use Data::Dumper;

    my $line = 'One animal="ap3!" and another one fish="s4lm%%on" can all be eaten. (So can the bird="sparr0w!$$")';
    my %hash;
    while ($line =~ m/\b(\w+)="([^"]*)"/g) {
        $hash{$1} = $2;
    }
    print Dumper \%hash;
    __END__
    $VAR1 = {
              'animal' => 'ap3!',
              'bird' => 'sparr0w!$$',
              'fish' => 's4lm%%on'
            };

    \w includes digits and underscores, no need to list them separately.
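
A quick way to check that claim is to run both character classes over a few sample keys. This is only a sketch: the sample strings and the variable names (@samples, $all_agree) are made up for illustration and do not come from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample keys: letters only, digits only, underscore only, mixed.
my @samples = ('abc', '123', '_', 'key_2');

my $all_agree = 1;
for my $s (@samples) {
    # \w on its own versus the OP's longer class [\w\d\_]
    my $short = $s =~ /^\w+$/       ? 1 : 0;
    my $long  = $s =~ /^[\w\d\_]+$/ ? 1 : 0;
    $all_agree = 0 if $short != $long;
    printf "%-6s short=%d long=%d\n", $s, $short, $long;
}
print $all_agree ? "both classes agree\n" : "classes differ\n";
```

On these samples the two classes should accept exactly the same strings, so the shorter \w+ can stand in for [\w\d\_]+.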

Re: Regular Expression: Matching arbitrary characters
by JavaFan (Canon) on Dec 01, 2008 at 14:12 UTC
    If you define "does not work", you are 95% of the way to finding the answer.

    I'm guessing here, because "does not work" may very well be "there are monkeys flying out of my computer", but there's a chance you actually do not want to match "arbitrary characters" at all. What you want to match is "anything that isn't a double quote" (BTW, an apostrophe is a single quote, your example matches between double quotes). Now, changing from "arbitrary" to "anything that isn't a double quote" is easy in your regexp.

    And I leave that as an exercise.

Re: Regular Expression: Matching arbitrary characters
by lostjimmy (Chaplain) on Dec 01, 2008 at 15:01 UTC
    In addition to what everyone else has said, I would also like to note that you are using $line =~ s/...//, which is a substitution. That's not what you want. You want to use $line =~ m/.../g. The /g modifier is used for global matching, which means on each iteration of the loop, it will save the current position in the string and match from there on the next iteration.
      Actually, the OP's use of the s/// operator is a perfectly legitimate and sensible approach, assuming that there's no problem with obliterating the input string as you go. As you point out, using m//g is also a good approach.
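
To make the difference between the two loops concrete, here is a minimal sketch that runs both on copies of the OP's line. It assumes the corrected pattern with [^"]* from moritz's reply; the variable names ($original, $consumed, $scanned, %by_subst, %by_match) are invented for this example.

```perl
use strict;
use warnings;

my $original = 'One animal="ap3!" and another one fish="s4lm%%on" can all be eaten. (So can the bird="sparr0w!$$")';

# Destructive variant: each successful s/// deletes the pair it just matched,
# so the loop ends once no key="value" pair is left in the copy.
my $consumed = $original;
my %by_subst;
while ($consumed =~ s/\b(\w+)="([^"]*)"//) {
    $by_subst{$1} = $2;
}

# Non-destructive variant: m//g in scalar context remembers pos($scanned)
# and resumes from there on the next iteration; the string is left alone.
my $scanned = $original;
my %by_match;
while ($scanned =~ m/\b(\w+)="([^"]*)"/g) {
    $by_match{$1} = $2;
}

print "after s///: $consumed\n";
print "after m//g: $scanned\n";
print "$_ => $by_match{$_}\n" for sort keys %by_match;
```

Both loops should fill their hash with the same three pairs. The only difference is that the s/// copy has the matched pairs stripped out afterwards, which matters only if the line is needed again later.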
Re: Regular Expression: Matching arbitrary characters
by jethro (Monsignor) on Dec 01, 2008 at 14:38 UTC
    + and * match greedily, i.e. they match as many characters as they can get without failing the rest of the pattern. In your case that is nearly everything in your string. One solution has been given to you by the posters above, another is to make + non-greedy by appending a ? i.e. using (.+?)
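
A small sketch of the greedy versus non-greedy behaviour on the OP's line, for anyone who wants to see it rather than take it on faith. The hash names %greedy and %lazy are illustrative only, and m//g is used instead of s/// so the line stays intact for the second loop.

```perl
use strict;
use warnings;

my $line = 'One animal="ap3!" and another one fish="s4lm%%on" can all be eaten. (So can the bird="sparr0w!$$")';

# Greedy: .+ grabs as much as it can, so the value runs from the first
# opening quote all the way to the last double quote in the line.
my %greedy;
while ($line =~ m/(\w+)="(.+)"/g) {
    $greedy{$1} = $2;
}

# Non-greedy: .+? stops at the first double quote that lets the match succeed.
my %lazy;
while ($line =~ m/(\w+)="(.+?)"/g) {
    $lazy{$1} = $2;
}

printf "greedy: %d key(s)\n", scalar keys %greedy;
printf "lazy:   %d key(s)\n", scalar keys %lazy;
print "$_ => $lazy{$_}\n" for sort keys %lazy;
```

The greedy loop ends up with a single key (animal) whose value swallows the other two pairs, while the non-greedy loop finds all three. Note that .+? still requires at least one character, so an empty value such as key="" would behave differently from the [^"]* version.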