perl_mystery has asked for the wisdom of the Perl Monks concerning the following question:

I need to construct a hash from a file whose input looks like the sample below. A sample of the desired output is also shown.

1. How do I construct a hash from the input below?
2. I have written code to do this, but I am having trouble with the regexes for the key and value, and I am not sure what is wrong.
3. Printing the hash shows empty output. Why?

INPUT:-

.\root\edit\perl\scripts\scripths\sec\inc\script_auth_pap.h-113115;perforcePLF.txt;//programfiles/documents/data/lookup/script_auth_pap.h - label_scriptHS_source.01.16.00 : 5

.\root\edit\perl\scripts\scripths\sec\inc\script_auth_peap.h-34348;perforcePLF.txt;//programfiles/documents/data/lookup/script_auth_peap.h - label : 5

.\root\edit\perl\scripts\scripths\sec\inc\script_auth_peap.h-113116;perforcePLF.txt;//depot/old/text/data/script_auth_peap.h - label_scriptHS_source.01.16.00 : 5

.\root\edit\perl\scripts\scripths\sec\inc\script_auth_ttls.h-34349;perforcePLF.txt;//source/new/text/files/data/script_auth_ttls.h - label : 5

OUTPUT:-

The hash should look like this:

//programfiles/documents/data/lookup/script_auth_pap.h=>root\edit\perl\scripts\scripths\sec\inc\script_auth_pap.h

//programfiles/documents/data/lookup/script_auth_peap.h=>root\edit\perl\scripts\scripths\sec\inc\script_auth_peap.h

//depot/old/text/data/script_auth_peap.h=>root\edit\perl\scripts\scripths\sec\inc\script_auth_peap.h

//source/new/text/files/data/script_auth_ttls.h=>root\edit\perl\scripts\scripths\sec\inc\script_auth_ttls.h

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my %hash;
    open my $fh, '<', $ARGV[0] or die "could not open $ARGV[0]'' $!";
    while (my $line = <$fh>) {
        my $key= $line =~/;(.*)\s-\s/;          #match anything between ; and - is key
        my $value= $line =~/\.\\(.*)-\d+\;/;    #match anything between .\ and - is value
        $hash{$key}=$value;
    }
    print Dumper(\%hash);

Replies are listed 'Best First'.
Re: Constructing a hash - why isn't my regex matching anything
by ELISHEVA (Prior) on Dec 19, 2010 at 09:35 UTC

    There are two problems that are making your regex fail:

    • your key regex isn't precise enough - it matches more than you intended.
    • as mentioned by Anonymous Monk, an assignment like $key = $line =~ regex assigns the success flag of the match (1 when it matches), not the text that was captured. Perl has a concept of "scalar context" and "list context" (often loosely called "array context"). Certain functions, operators and routines return different values depending on whether the result is being assigned to a scalar (i.e. a variable beginning with $) or to a list, that is an array variable beginning with @ or a list of variables enclosed in parentheses.

    You can insert the following debugging code into your loop and you will see what I mean:

        my ($key,$value);

        # DEBUG - BEGIN

        # your original regex
        $key= $line =~/;(.*)\s-\s/;
        $value= $line =~/\.\\(.*)-\d+\;/;
        print STDERR "key=<$key> value=<$value>\n";
        # outputs: key=<1> value=<1>

        # the right way to get the value of the matched string in a
        # one liner - the parentheses around ($key) and ($value) tell
        # perl that you want the list of captured strings, NOT the
        # success flag of the match.
        ($key) = $line =~/;(.*)\s-\s/;
        ($value) = $line =~/\.\\(.*)-\d+\;/;
        print STDERR "key=<$key> value=<$value>\n";
        # outputs: key=<perforcePLF.txt;//programfiles/documents/data/lookup/script_auth_pap.h> value=<root\edit\perl\scripts\scripths\sec\inc\script_auth_pap.h>

        # The above still doesn't work because your key will include all
        # file names after the first ";" before " - " and not just the one
        # between the last ";" and " - ". To get only the last one you need
        # a more restrictive regex, one that ensures that there
        # are no ";" in your key, e.g. ([^;]*). You also probably
        # want to have a key with at least one character, so you should use
        # ([^;]+) rather than ([^;]*).
        ($key) = $line =~/;([^;]+)\s-\s/;
        ($value) = $line =~/\.\\(.*)-\d+\;/;

        # see comment below for why this is printed out to
        # STDERR and is followed by "last"
        print STDERR "key=<$key> value=<$value>\n";
        last;

        # DEBUG - END

    I put last; as the final statement in the debugging code because when a regex is bombing even on simple lines, the bug is usually visible in the first iteration and there is not much value in dumping and scanning the complete result of the process, let alone the end product hash. In fact, it can make the error harder to find and fix because of the excess detail. The #DEBUG - BEGIN and #DEBUG - END comments are there to make sure you can easily find a long stretch of debugging code. Leaving a stray "last" in your code would not be a good thing!

    I printed the debugging messages to STDERR for two reasons. First, it makes it easier to find debugging code that should be commented out when you no longer need it. Second, if your debugging statements print to STDERR, they will still be visible if you run your code as part of a test suite using prove MyTest.t.

    In addition to the links on array/scalar context posted by Anonymous Monk, you might want to look at the following documentation: wantarray, scalar and this blog article by Perl Monk chromatic, "From Novice to Adept: Scalar Context" at http://www.modernperlbooks.com/mt/2009/10/from-novice-to-adept-scalar-context-and-arrays.html

    Update: added links to learn more about scalar and array context

      Thanks a lot for detailed explanation

      I have one more question. I am trying to run the above script on a file of 125000 lines and output the hash to a text file. I keep getting an "Out of memory!" message. Is there something that can be done about it?

        Yes. Use less memory.

        Look over your program where you are needlessly wasting memory. Maybe you are reading a complete file into memory instead of processing it line by line. Maybe you are doing something else that wastes memory.

        Even still, 125000 lines is not much, so most likely you are doing something that wastes a lot of memory.

        The hash alone takes up between 12 and 13 megs (125,000 * 100 chars per key-value pair), but 13 megs isn't a great deal of memory on most machines these days. What sort of machine are you on? Are you by any chance running this script on a server or virtual machine with some sort of artificial per-process memory cap?

        Another possibility: How do you construct this file that you are extracting keys and values from? Earlier you posted a question about recursive extraction of file names. Is this part of the same script? Perhaps earlier or later in your script (above or below this loop) you have some left over code that slurped in a very large file all at once? Or perhaps your recursion rather than this loop is eating up all of the memory?
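If the hash is only needed in order to write the pairs to a text file, one option is to skip the hash and write each pair as soon as it is parsed. The following is only a sketch under that assumption: the helper name write_pairs and the tab-separated output format are hypothetical choices, and the sample lines are taken from the question. It reuses the corrected patterns from the debugging code earlier in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy "key<TAB>value" pairs from $in to $out one line
# at a time, so memory use does not grow with the size of the input.
sub write_pairs {
    my ($in, $out) = @_;
    my $count = 0;
    while (my $line = <$in>) {
        my ($key)   = $line =~ /;([^;]+)\s-\s/;
        my ($value) = $line =~ /\.\\(.*)-\d+;/;
        unless (defined $key && defined $value) {
            warn "could not parse input line $.\n";
            next;
        }
        print {$out} "$key\t$value\n";
        $count++;
    }
    return $count;
}

# Demo on in-memory filehandles; a real script would open files instead.
my $sample = <<'END';
.\root\edit\perl\scripts\scripths\sec\inc\script_auth_pap.h-113115;perforcePLF.txt;//programfiles/documents/data/lookup/script_auth_pap.h - label_scriptHS_source.01.16.00 : 5
.\root\edit\perl\scripts\scripths\sec\inc\script_auth_ttls.h-34349;perforcePLF.txt;//source/new/text/files/data/script_auth_ttls.h - label : 5
END

open my $in, '<', \$sample or die "cannot read sample: $!";
my $result = '';
open my $out, '>', \$result or die "cannot open output buffer: $!";
my $n = write_pairs($in, $out);
close $out;
close $in;

print $result;
```

Note that this does not collapse duplicate keys the way a hash would, so it only fits if duplicates are acceptable or are removed in a later step.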

Re: Constructing a hash - why isn't my regex matching anything
by PeterPeiGuo (Hermit) on Dec 19, 2010 at 08:27 UTC

    One way to understand the result of the matching operation is to view it as a true/false value that indicates whether there is a match, and that's not what you wanted. To get what you want, do the following (you may have to further tweak your regexp, but that's a different story):

        $line =~/;(.*)\s-\s/;
        my $key = $1;
        $line =~/\.\\(.*)-\d+\;/;
        my $value = $1;

    Peter (Guo) Pei

      I keep getting the below warning

          Use of uninitialized value in hash element at hash_construct.pl line 15, <$fh> line 909.
          Use of uninitialized value in concatenation (.) or string at hash_construct.pl line 11, <$fh> line 910.

          #!/usr/bin/perl
          use strict;
          use warnings;
          use Data::Dumper;

          my %hash;
          open my $fh, '<', $ARGV[0] or die "could not open $ARGV[0]'' $!";
          while (my $line = <$fh>) {
              $line =~/;(.*)\s-\s/;
              my $key = $1;
              print "KEY:$key\n";
              $line =~/\.\\(.*)-/;
              my $value = $1;
              print "VALUE:$value\n";
              $hash{$key}=$value;
          }
          print Dumper(\%hash);

        Yes, I know, do you have a question?
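The warnings reported in this sub-thread are consistent with input lines (around lines 909 and 910 of the file) on which the key regex did not match, so $1 was read without a successful match behind it. When a match fails, $1 is not reset: it may be undefined, or it may still hold the capture from an earlier successful match. A minimal sketch of guarding each match before reading $1 follows; the helper name parse_line and the second, malformed sample line are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return (key, value) for one input line, or an
# empty list if either pattern fails to match.
sub parse_line {
    my ($line) = @_;

    # Only read $1 after the match it belongs to has succeeded.
    return unless $line =~ /;([^;]+)\s-\s/;
    my $key = $1;

    return unless $line =~ /\.\\(.*)-\d+;/;
    my $value = $1;

    return ($key, $value);
}

my %hash;
my @lines = (
    '.\root\edit\perl\scripts\scripths\sec\inc\script_auth_ttls.h-34349;perforcePLF.txt;//source/new/text/files/data/script_auth_ttls.h - label : 5',
    'a line that does not follow the expected format',
);

for my $line (@lines) {
    # A list assignment is false in a condition when parse_line returns nothing.
    if (my ($key, $value) = parse_line($line)) {
        $hash{$key} = $value;
    }
    else {
        warn "skipping unparsable line: $line\n";
    }
}

print "$_ => $hash{$_}\n" for sort keys %hash;
```

Printing the skipped lines is usually the quickest way to see what the unexpected input actually looks like.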
Re: Constructing a hash - why isn't my regex matching anything
by Anonymous Monk on Dec 19, 2010 at 08:32 UTC
    The match operator (m//), in scalar context ( my $foo = m//; ), returns whether the match succeeded (in your code/data, 1 for key, 1 for value), not the captured text.

    The solution is either to test the match and assign from $1 ( if ( // ) { $key = $1; ... } ) or to use list context ( my ( $foo ) = //; ).
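To make the difference concrete, here is a small self-contained sketch of the three forms side by side. The input string and variable names are made up for illustration and are not from the original data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = 'name=alice;id=42';    # made-up sample input

# Scalar context: the result only says whether the match succeeded.
my $matched = $line =~ /id=(\d+)/;

# List context: the parentheses on the left make the match return its captures.
my ($id) = $line =~ /id=(\d+)/;

# Explicit test first, then read $1.
my $name;
if ($line =~ /name=(\w+)/) {
    $name = $1;
}

print "matched=<$matched> id=<$id> name=<$name>\n";
# prints: matched=<1> id=<42> name=<alice>
```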

    Tutorials: Context in Perl: Context tutorial

    FYI, when posting

    • Use code tags for all code and data
    • show actual program output (in code tags) because it was not empty
Re: Constructing a hash - why isn't my regex matching anything
by Anonymous Monk on Dec 19, 2010 at 08:55 UTC
    I would have written the following:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Data::Dumper;

        my %hash;
        while (<>) {
            # First capture: the local path after ".\" (the hash value).
            # Second capture: the depot path after the last ";" (the hash key).
            if (my ($value, $key) = /\.\\(.+?)-\d+;[^;]*;([^;]+?)\s-\s/) {
                $hash{$key} = $value;
            }
            else {
                warn "couldn't match key and value on line $.";
            }
        }
        print Dumper(\%hash);