japhmi has asked for the wisdom of the Perl Monks concerning the following question:

I have a file where there are various lines, some of which are in the format:
foo = bar
foo = bar bar
There are other lines with other formats, but I don't care about them. What I want is to make a hash that looks like this:
%hash = ( "foo" => "bar", "foo" => "bar bar" )
The expression I have looks like this, but it's not working like I expected (this is within a while loop going through the file):
if( $_ =~ /(\w*) = (\w*)/ ){ $filehash{ $1 } = $2; }
Could the Monks please help me find a better regex? Thanks.

Re: Determining the regex for foo=bar
by BrowserUk (Patriarch) on Mar 11, 2010 at 22:20 UTC

    What I want is to make a hash that looks like this:

    %hash = ( "foo" => "bar", "foo" => "bar bar" )

    You can't! A hash cannot contain duplicate keys.
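To illustrate the point about duplicate keys, here is a minimal sketch using the key and values from the question. When the same key appears twice in a list assignment, the later value replaces the earlier one, so only one entry survives.

```perl
use strict;
use warnings;

# A hash keeps one value per key; assigning the same key twice overwrites.
my %hash = ( foo => 'bar', foo => 'bar bar' );

print scalar( keys %hash ), "\n";   # number of distinct keys
print $hash{foo}, "\n";             # value from the last assignment
```

If both values had to be kept, each key would need to hold an array reference instead of a plain string, but that is beyond what the question asks for.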


      Now I feel dumb for that in my example. Yes, there will be no duplicate keys.
Re: Determining the regex for foo=bar
by kennethk (Abbot) on Mar 11, 2010 at 22:24 UTC
    Neglecting the duplicate key issue so astutely pointed out by BrowserUk, I assume your lack of function is because you are capturing only 'word' characters, but your second value contains a space. Crafting regular expressions without a complete basis set is problematic at best, but perhaps you should consider using the $ metacharacter to anchor the second capture to the end of the line?

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my %filehash;
    while (<DATA>) {
        if ( /(\w*) = (.+$)/ ) {
            $filehash{ $1 } = $2;
        }
    }
    print Dumper(\%filehash);

    __DATA__
    foo1 = bar
    foo2 = bar bar

    See perlre or perlretut for more info.

      Your solution is the one I came up with...almost.

      My regex was /(\w+) = (.+)$/ where I chose not to include the end-of-line in the second capturing expression.

      Generally I prefer the regex approach to the split() approach because of the possibility that there might be extraneous info prior to the word to be used as the key.

      But of course, if everything before the = sign is to be used as the 'key', then either the split() or an adjusted regex /^(.+) = (.+)$/ would work fine for me. But then I'm not sure how the OP would capture in the hash only those keys that he's interested in (which seem to be simple words). Of course, I guess that wouldn't be a problem if the only thing before the = sign was always just a single-word key.
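The difference between the two patterns can be sketched as follows. The sample line is hypothetical, and the anchored pattern is assumed to be /^(.+) = (.+)$/, that is, "everything before the = sign".

```perl
use strict;
use warnings;

# Hypothetical sample line with extra text before the key.
my $line = 'note: foo = bar bar';

# Unanchored \w+ picks up only the word immediately before " = ".
my ($word_key, $word_val) = $line =~ /(\w+) = (.+)$/;

# Anchored .+ takes everything from the start of the line up to " = ".
my ($full_key, $full_val) = $line =~ /^(.+) = (.+)$/;

print "word key: '$word_key', value: '$word_val'\n";
print "full key: '$full_key', value: '$full_val'\n";
```

With this input the first pattern should yield the key foo and the second the key "note: foo", which is the trade-off described above: the \w+ form skips leading noise, while the anchored form keeps it as part of the key.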

      ack Albuquerque, NM
Re: Determining the regex for foo=bar
by toolic (Bishop) on Mar 11, 2010 at 22:24 UTC

      It might be safer to add a limit to the split just in case there are more equals signs later in the line.

      while ( <$inFH> ) {
          chomp;
          my ($k, $v) = split /\s*=\s*/, $_, 2;
          ...
      }
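As a small illustration of why the limit matters, the sketch below splits a hypothetical line whose value itself contains equals signs, once without a limit and once with a limit of 2.

```perl
use strict;
use warnings;

# Hypothetical line whose value itself contains an equals sign.
my $line = 'opts = a=1 b=2';

my @unlimited = split /\s*=\s*/, $line;      # splits at every "="
my ($k, $v)   = split /\s*=\s*/, $line, 2;   # splits at the first "=" only

print scalar(@unlimited), " fields without a limit\n";
print "key '$k', value '$v'\n";
```

Without the limit the value is broken into several pieces, and a two-variable assignment would silently keep only the first of them. With the limit, everything after the first = stays together as the value.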

      I hope this is of interest.

      Cheers,

      JohnGG