Re: Constructing a hash - why isn't my regex matching anything

There are two problems that are making your regex fail:

1. Your key regex isn't precise enough: it matches more than you intended.
2. As mentioned by Anonymous Monk, an assignment like $key = $line =~ /regex/ assigns whether the match succeeded (1 on success, an empty string on failure), not the value that was matched. Perl has a concept of "scalar context" and "list context" (sometimes called "array context"). Certain functions, operators and subroutines return different values depending on whether they think they are assigning a value to a scalar (i.e. a variable beginning with $) or to a list, that is, an array (a variable beginning with @) or a list of variables enclosed in parentheses.
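A minimal sketch of how context changes a return value, using the built-in wantarray. The sub context_of_call and the sample string are hypothetical, made up only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sub: wantarray is true when the caller wants a list.
sub context_of_call {
    return wantarray ? 'list' : 'scalar';
}

my ($in_list) = context_of_call();          # parentheses on the left: list context
my $in_scalar = context_of_call();          # bare scalar on the left: scalar context
my $forced    = scalar(context_of_call());  # scalar() forces scalar context

print "list: $in_list, scalar: $in_scalar, forced: $forced\n";

# The match operator (without /g) follows the same rule.
my $text = 'name=perl';
my $flag      = $text =~ /=(\w+)/;   # scalar context: 1 on success
my ($capture) = $text =~ /=(\w+)/;   # list context: the captured text
print "flag: $flag, capture: $capture\n";
```

Run as-is, this prints "list: list, scalar: scalar, forced: scalar" followed by "flag: 1, capture: perl".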

You can insert the following debugging code into your loop and you will see what I mean:

  my ($key,$value);

  # DEBUG - BEGIN
  # your original regex
  $key= $line =~/;(.*)\s-\s/;
  $value= $line =~/\.\\(.*)-\d+\;/;
  print STDERR "key=<$key> value=<$value>\n";

  #outputs:  key=<1> value=<1>

  # the right way to get the value of the matched string in a
  # one-liner: the parentheses around ($key) and ($value) put the
  # match in list context, so it returns the list of captured
  # groups, NOT the true/false success value.

  ($key) = $line =~/;(.*)\s-\s/;
  ($value) = $line =~/\.\\(.*)-\d+\;/;
  print STDERR "key=<$key> value=<$value>\n";

  # outputs: key=<perforcePLF.txt;//programfiles/documents/data/lookup/script_auth_pap.h>
  #          value=<\root\edit\perl\scripts\scripths\sec\inc\script_auth_pap.h>

  # The above still doesn't work because your key will include all
  # file names after the first ";" before " - ", not just the one
  # between the last ";" and " - ".  To get only the last one you need
  # a more restrictive regex, one that ensures that there
  # are no ";" in your key, e.g. ([^;]*).  You also probably
  # want a key with at least one character, so you should use
  # ([^;]+) rather than ([^;]*).

  ($key) = $line =~/;([^;]+)\s-\s/;
  ($value) = $line =~/\.\\(.*)-\d+\;/;

  # see comment below for why this is printed out to
  # STDERR and is followed by "last" 
  print STDERR "key=<$key> value=<$value>\n"; last;
  # DEBUG - END
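To try the three versions of the regex outside the loop, here is a self-contained sketch. The input line is hypothetical: it was made up only so that the regexes reproduce the output quoted in the debugging comments, and your real lines may look different.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input line (assumption, see above). A single-quoted
# here-doc keeps every backslash exactly as written.
my $line = <<'LINE';
depot;perforcePLF.txt;//programfiles/documents/data/lookup/script_auth_pap.h - .\\root\edit\perl\scripts\scripths\sec\inc\script_auth_pap.h-7;
LINE

# 1. Scalar context: $flag only records that the match succeeded.
my $flag = $line =~ /;(.*)\s-\s/;

# 2. List context: the capture is returned, but the greedy (.*)
#    swallows everything after the FIRST ";".
my ($greedy_key) = $line =~ /;(.*)\s-\s/;

# 3. List context with ([^;]+): the key cannot contain ";", so only
#    the last file name before " - " is captured.
my ($key)   = $line =~ /;([^;]+)\s-\s/;
my ($value) = $line =~ /\.\\(.*)-\d+\;/;

print STDERR "flag=<$flag>\n";
print STDERR "greedy key=<$greedy_key>\n";
print STDERR "key=<$key> value=<$value>\n";
```

The third version yields key=<//programfiles/documents/data/lookup/script_auth_pap.h> for this sample line.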

I put last; as the final statement in the debugging code because when a regex is bombing even on simple lines, the bug is usually visible in the first iteration and there is not much value in dumping and scanning the complete result of the process, let alone the end product hash. In fact, it can make the error harder to find and fix because of the excess detail. The #DEBUG - BEGIN and #DEBUG - END comments are there to make sure you can easily find a long stretch of debugging code. Leaving a stray "last" in your code would not be a good thing!

I printed the debugging messages to STDERR for two reasons. First, it makes it easier to find debugging code that should be commented out when you no longer need it. Second, if your debugging statements print to STDERR, they will still be visible if you run your code as part of a test suite using prove MyTest.t.

In addition to the links on array/scalar context posted above by Anonymous Monk, you might want to look at the following documentation: wantarray, scalar, and this blog article by Perl Monk chromatic, "From Novice to Adept: Scalar Context" at http://www.modernperlbooks.com/mt/2009/10/from-novice-to-adept-scalar-context-and-arrays.html

Update: added links to learn more about scalar and array context

Re^2: Constructing a hash - why isn't my regex matching anything by perl_mystery (Beadle) on Dec 19, 2010 at 10:41 UTC
Thanks a lot for the detailed explanation. I have one more question. I am trying to run the above script on a 125,000-line file and output the hash to a text file. I keep getting an "Out of memory!" message; is there something that can be done about it?
Re^3: Constructing a hash - why isn't my regex matching anything by Corion (Patriarch) on Dec 19, 2010 at 10:49 UTC
Yes. Use less memory. Look over your program for places where you are needlessly wasting memory. Maybe you are reading a complete file into memory instead of processing it line by line. Maybe you are doing something else that wastes memory. Even so, 125000 lines is not much, so most likely you are doing something that wastes a lot of memory.
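Processing the file line by line, rather than reading it into memory all at once, can be sketched as follows. This is a minimal sketch, not the poster's script: the two sample lines are hypothetical, and an in-memory filehandle (open on a reference to a scalar) stands in for the real input file so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real input file.
my $data = "a;first.h - .\\x\\first.h-1;\n"
         . "b;second.h - .\\x\\second.h-2;\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my %hash;
# while (<$fh>) reads one line per iteration, so only the current
# line (plus the hash being built) is held in memory.
while (my $line = <$fh>) {
    my ($key)   = $line =~ /;([^;]+)\s-\s/;
    my ($value) = $line =~ /\.\\(.*)-\d+\;/;
    next unless defined $key and defined $value;     # skip non-matching lines
    $hash{$key} = $value unless exists $hash{$key};  # keep the first value seen
}
close $fh;
```

By contrast, my @lines = <$fh>; evaluates the readline in list context and pulls the whole file into memory before any processing starts.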
Re^4: Constructing a hash - why isn't my regex matching anything by perl_mystery (Beadle) on Dec 19, 2010 at 10:56 UTC
I did look at my code (below); I am hardly doing anything other than constructing the hash:

  #!/usr/bin/perl
  use strict;
  use warnings;
  use Data::Dumper;

  my %hash;
  open my $fh, '<', $ARGV[0] or die "could not open '$ARGV[0]': $!";
  while (my $line = <$fh>) {
      #print "line:$line\n";
      my ($key) = $line =~ /;([^;]+)\s-\s/;
      #print "KEY:$key\n";
      my ($value) = $line =~ /\.\\(.*)-\d+\;/;
      #print "VALUE:$value\n";
      if (!($hash{$key})) {
          $hash{$key} = $value;
      }
  }

  open my $hash, '>', "hash_flf.txt" or die "could not open hash_flf.txt: $!";
  # print to the $hash filehandle; a bare print Dumper(...) goes to STDOUT
  print {$hash} Dumper(\%hash);
  close $hash;
Re^5: Constructing a hash - why isn't my regex matching anything by Anonyrnous Monk (Hermit) on Dec 19, 2010 at 11:36 UTC
Re^3: Constructing a hash - why isn't my regex matching anything by ELISHEVA (Prior) on Dec 19, 2010 at 11:24 UTC
The hash alone takes up between 12 and 13 megs (125,000 key-value pairs at about 100 characters per pair), but 13 megs isn't a great deal of memory on most machines these days. What sort of machine are you on? Are you by any chance running this script on a server or virtual machine with some sort of artificial per-process memory cap? Another possibility: how do you construct this file that you are extracting keys and values from? Earlier you posted a question about recursive extraction of file names. Is this part of the same script? Perhaps earlier or later in your script (above or below this loop) you have some leftover code that slurps in a very large file all at once? Or perhaps your recursion, rather than this loop, is eating up all of the memory?
Re^4: Constructing a hash - why isn't my regex matching anything by Anonymous Monk on Dec 19, 2010 at 11:40 UTC
I think it takes more than that :) The numbers are in Kbytes and the memory usage doubles due to Data::Dumper, from 78MB to 142MB.
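Since Data::Dumper roughly doubles memory use here, one way around it is to write the hash out one pair at a time instead of having Dumper build one big string. A minimal sketch under these assumptions: a plain tab-separated format is acceptable in place of Dumper's Perl syntax, the small %hash is hypothetical sample data, and a File::Temp temporary file stands in for hash_flf.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical hash standing in for the one built from the input file.
my %hash = (
    'first.h'  => 'x\first.h',
    'second.h' => 'x\second.h',
);

# A temporary file stands in for "hash_flf.txt" (assumption).
my ($out, $filename) = tempfile();

# each() walks the hash one pair at a time without first copying all
# keys into a list, and each line is printed as soon as it is built,
# so no single string holds the whole dump. Output order is unspecified.
while (my ($key, $value) = each %hash) {
    print {$out} "$key\t$value\n";
}
close $out or die "could not close $filename: $!";
```

If sorted output matters, sort keys %hash works too, at the cost of holding the full key list in memory.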
Re^5: Constructing a hash - why isn't my regex matching anything by ELISHEVA (Prior) on Dec 19, 2010 at 12:11 UTC
Re^6: Constructing a hash - why isn't my regex matching anything by Anonyrnous Monk (Hermit) on Dec 19, 2010 at 12:59 UTC
Re^6: Constructing a hash - why isn't my regex matching anything by Anonymous Monk on Dec 19, 2010 at 12:57 UTC