Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a file that contains many lines of text in the form of:

text one="1" two="2" three="3" ... x="y"

There can be any number of attribute="value" type pairs.

I'd like a regex that puts each pair into a $n variable, so one="1" would end up in $1 (or $2, etc.).

This is what I have so far, which doesn't work:
(.+?)\s+?(.+?\=.+?\s)+
The other option is to split each line on white space and then process each pair that way, but it seems like this should be possible via a regular expression as well.

Replies are listed 'Best First'.
Re: Regex to match "this=that" multiple times
by moritz (Cardinal) on Jul 16, 2008 at 16:29 UTC
    split is the way to go.

    Another option:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my $str = q{one="1" two="2" three="3" ... x="y" };
    my @list;
    while ($str =~ m/(\w+)="([^"]+)"/g){
        push @list, [$1, $2];
    }
    print Dumper \@list;

    The reason why you can't put them all into $1, $2, $3, ... is that the mapping from parenthesis groups to numbers is done at the compile time of the regex.
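
To illustrate the point about compile-time numbering, here is a minimal sketch. The anchored pattern and the variable name $last are assumptions made for illustration and are not part of the reply above. A single capture group inside a repeated group is still just $1, so after a successful match it holds only the text from the final repetition.

```perl
use strict;
use warnings;

my $str = 'one="1" two="2" three="3"';
my $last;

# The pattern contains exactly one capturing group, so only $1 exists,
# no matter how many times the enclosing (?: ... )+ repeats.
if ( $str =~ /^(?:(\w+="[^"]*")\s*)+$/ ) {
    $last = $1;    # text captured by the final repetition
}

print "last pair: $last\n";
```

Here $last ends up as three="3"; the earlier pairs were matched but overwritten, which is why a loop with //g (or split) is needed to collect all of them.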

    Update: fixed a small glitch in the regex; pc88mxer++ for pointing it out.

    Second update: There is another way involving just one regex, but it uses the evil and experimental (?{...}) code assertions. Read the warnings in perlre before using it.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my $str = q{one="1" two="2" three="3" ... x="y" };
    our @list;
    $str =~ m{
        (?>
            (\w+)       # key
            =
            "([^"]+)"   # value
            \W*         # anything inbetween
            (?{ push @list, [$1, $2] })
        )+
    }xs or die "No match";
    print Dumper \@list;
Re: Regex to match "this=that" multiple times
by kyle (Abbot) on Jul 16, 2008 at 16:47 UTC

    Here's another one...

    use Data::Dumper;

    my $text = 'text one="1" two="2" three="3" ... x="y"';

    my $attribute_r = qr{ \w+ }xms;
    my $equals_r    = qr{ \s* = \s* }xms;
    my $value_r     = qr{ \" [^"]+ \" }xms;

    my %value_of;

    while ( $text =~ s{ ( $attribute_r ) $equals_r ( $value_r ) }{}xms ) {
        my ( $name, $value ) = ( $1, $2 );
        $value =~ tr/"//d;
        $value_of{ $name } = $value;
    }

    print Dumper \%value_of;

    __END__

    $VAR1 = {
              'three' => '3',
              'one' => '1',
              'x' => 'y',
              'two' => '2'
            };

    Note the limits, of course:

    • Attribute names match \w+.
    • There's no escaping of quotes supported.
    • Two attributes with the same name (yuck) won't work.
    • It destroys the text as it works, so make a copy if you'll want it later.
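
Some of the limits listed above can be relaxed. The following is only a sketch under stated assumptions: the input string, the hash name %values_of, and the choice of backslash as the escape character are all hypothetical and do not come from the reply. It uses m//g instead of s///, so the text is left intact, keeps every value for a repeated name in an array, and accepts backslash-escaped quotes inside a value.

```perl
use strict;
use warnings;

# Hypothetical input with a repeated name and a backslash-escaped quote.
my $text = 'text one="1" one="uno" say="he said \"hi\""';

my %values_of;    # name => array ref of every value seen for that name

# m//g only reads $text. Inside the quotes, the value is a run of either
# a character that is neither a quote nor a backslash, or a backslash
# followed by any single character.
while ( $text =~ m{ (\w+) \s* = \s* " ( (?: [^"\\] | \\. )* ) " }xmsg ) {
    my ( $name, $value ) = ( $1, $2 );
    $value =~ s{ \\ (.) }{$1}xmsg;    # drop the escaping backslashes
    push @{ $values_of{$name} }, $value;
}

for my $name ( sort keys %values_of ) {
    print "$name: ", join( ' | ', @{ $values_of{$name} } ), "\n";
}
```

Attribute names are still limited to \w+ in this sketch, and an empty value ("") is accepted, which differs from the [^"]+ used above.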
Re: Regex to match "this=that" multiple times
by toolic (Bishop) on Jul 16, 2008 at 16:36 UTC
    use strict;
    use warnings;
    use Data::Dumper;

    while (<DATA>) {
        my @pairs;
        while (/(\w+="\w+")/g) {push @pairs, $1}
        print Dumper(\@pairs);
    }

    __DATA__
    text one="1" two="2" three="3" x="y"

    prints:

    $VAR1 = [
              'one="1"',
              'two="2"',
              'three="3"',
              'x="y"'
            ];

    Update: The key is to use the regex global modifier //g and a while loop. See also perlrequick.

Re: Regex to match "this=that" multiple times
by GrandFather (Saint) on Jul 16, 2008 at 21:09 UTC

    Are you looking for something like:

    use strict;
    use warnings;

    my $line = 'one="1" two="2" three="3" x="y"';
    my @parts = $line =~ /([^=]* = \s* "[^"]*" )\s*/gx;

    print "$_\n" for @parts;

    Prints:

    one="1"
    two="2"
    three="3"
    x="y"

    Perl is environmentally friendly - it saves trees
Re: Regex to match "this=that" multiple times
by smokemachine (Hermit) on Jul 16, 2008 at 19:18 UTC
    > perl -MData::Dumper -e '%hash = "lala=lele lili=lolo lulu = lala\naeiou =abcde" =~ /(\S+)\s*=\s*(\S+)/g; print Dumper \%hash'
    $VAR1 = {
              'lulu' => 'lala',
              'lili' => 'lolo',
              'lala' => 'lele',
              'aeiou' => 'abcde'
            };
    or
    > perl -MData::Dumper -e '@array = "lala=lele lili=lolo lulu = lala\naeiou =abcde" =~ /\S+\s*=\s*(\S+)/g; print Dumper \@array'
    $VAR1 = [
              'lele',
              'lolo',
              'lala',
              'abcde'
            ];
Re: Regex to match "this=that" multiple times
by pc88mxer (Vicar) on Jul 16, 2008 at 16:41 UTC
    the other option is to split each line on white space and then process each pair that way
    You won't be able to have spaces in your "values", but if that works for you, then go for it.
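
A small sketch of that limitation, assuming a hypothetical input line whose second value contains spaces (the line and the variable names are made up for illustration): splitting on white space cuts the quoted value apart, while a //g match on the quoted form keeps it whole.

```perl
use strict;
use warnings;

# Hypothetical input: the second value contains spaces.
my $line = 'text one="1" two="2 and a half"';

# Split on white space and keep only the fields that look like pairs.
my @by_split = grep { /=/ } split ' ', $line;

# Match name="value" repeatedly; list-context //g returns every capture.
my @by_regex = $line =~ /(\w+="[^"]*")/g;

print "split: $_\n" for @by_split;
print "regex: $_\n" for @by_regex;
```

With this input, the split version yields one="1" and the truncated two="2, while the regex version yields one="1" and two="2 and a half".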