country1 has asked for the wisdom of the Perl Monks concerning the following question:


I have two Perl scripts. The first script reads a CSV file and builds a hash (%User) from two values in each CSV record; the hash is written to an output file. I want to use the hash in the second script for a table lookup.


The first script to produce the hash is as follows:

#!/usr/bin/perl
use strict;
use warnings;

##########################################################
#
# Read UserTable.csv and Create Hash for Lookup Table
#
##########################################################

for my $file ("UserTable.csv") {
    open( my $IN, "<", $file ) or die "Can't open file $file: $!";
    open( my $OUT, ">", "UserLookup.dat" )
        or die "Can't open file UserLookup.dat: $!";

    # Write the lookup table out as Perl source code.
    print $OUT "my \%User =\n";
    print $OUT "(\n";

    while ( my $line = <$IN> ) {
        chomp($line);
        my ( $loginuser, $appl ) = ( split( ",", $line ) )[ 0, 1 ];
        next if !defined $appl or $appl =~ /^\s*$/;
        print $OUT "'$loginuser' => '$appl',\n";
    }

    print $OUT ");\n";
    close $IN  or die "Can't close input file: $!";
    close $OUT or die "Can't close result file: $!";
}

My question is: how do I now include this hash code (in UserLookup.dat) in my 2nd script so that I can do the table lookup?

Re: Use of Hash For Table Lookup
by McDarren (Abbot) on Aug 17, 2007 at 12:22 UTC
    I'd probably use Storable for something like this.
    Instead of printing your hash structure to a 2nd file, just build the hash directly, eg:
    $User{$loginuser} = $appl;
    And then just store() it when you are done.
    Your second script can then simply retrieve() the hash when it needs it.
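    A minimal sketch of that approach, assuming the CSV layout from the question (the 'jsmith' lookup at the end is just a made-up example):

    use strict;
    use warnings;
    use Storable qw(store retrieve);

    # First script: build the hash directly while reading the CSV.
    my %User;
    open my $IN, '<', 'UserTable.csv' or die "Can't open UserTable.csv: $!";
    while ( my $line = <$IN> ) {
        chomp $line;
        my ( $loginuser, $appl ) = ( split /,/, $line )[ 0, 1 ];
        next if !defined $appl or $appl =~ /^\s*$/;
        $User{$loginuser} = $appl;
    }
    close $IN;
    store( \%User, 'UserLookup.dat' ) or die "Couldn't store hash: $!";

    # Second script: retrieve() hands back a hash reference.
    my $user = retrieve('UserLookup.dat');
    print "jsmith uses $user->{jsmith}\n" if exists $user->{jsmith};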

    (minor nit): I'm wondering about the point of the for loop, since you only process one file.

    Hope this helps,
    Darren :)

Re: Use of Hash For Table Lookup
by moritz (Cardinal) on Aug 17, 2007 at 12:25 UTC

    If you build one of the scripts as a module, you can just pass the hash natively in Perl.

    If you don't want to do that, you can use a serialization format, such as Data::Dumper, Storable, YAML or XML. And many others ;-)

    (Update:) If you want to stick to your current solution (and reinvent the Data::Dumper wheel), you can read the file and eval it.

    But I wouldn't recommend that, because it breaks if, for example, some of your data contains quotes.

    Suppose a malicious user knows how your code works, and inserts something like this into your logfile:

    someApp'}; system("Do something malicious here");{'

    If you eval() that, you've lost.

    I don't know what data will end up in your logfile, but if you use a module that does the job for you, you are probably not so vulnerable.
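    For instance, a minimal sketch with YAML (assuming YAML.pm's DumpFile/LoadFile exports; the .yml file name is made up):

    use strict;
    use warnings;
    use YAML qw(DumpFile LoadFile);

    # Script 1: serialise the lookup table; quoting is the module's problem.
    my %User = ( jdoe => 'payroll', asmith => 'billing' );   # example data
    DumpFile( 'UserLookup.yml', \%User );

    # Script 2: read it back as a hash reference -- no eval() involved.
    my $user = LoadFile('UserLookup.yml');
    print "$_ => $user->{$_}\n" for keys %$user;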

Re: Use of Hash For Table Lookup
by FunkyMonk (Bishop) on Aug 17, 2007 at 12:46 UTC
    Data::Dumper does everything that you're trying to achieve:

    use Data::Dumper;

    my %users = ( me => 1, you => 2, others => 3 );

    # dump hash to file
    open my $file, ">", "temp~" or die $!;
    print $file Dumper \%users;
    close $file;

    # read hash data from file
    open $file, "<", "temp~" or die $!;
    my $hash_data = do { local $/; <$file> };
    close $file;

    # eval it into %hash
    my %hash = do { no strict 'vars'; %{ eval $hash_data } };

    print Dumper \%hash;

    Output:

    $VAR1 = {
              'you' => 2,
              'others' => 3,
              'me' => 1
            };

    Have a good look at Data::Dumper. Other people have other favorites, but I think it's the third most useful module there is, with strict and warnings being the top two, of course.

    update: s/my $VAR1/no strict 'vars'/


      My hash will have up to 1000 entries, so I need to read the CSV to create the %User hash. Can you give me any ideas or code samples that would show me how to do this with the code I included in my question?
        My hash will have up to 1000 entries,

        This is hardly relevant

        so I need to read the CSV to create the %User hash.

        This hardly follows from the previous remark.

        I still can't understand whether you have two separate programs because you have to, or just as a matter of circumstance. Anyway, reading from the CSV versus reading from the file created by your poorly reinvented D::D code wouldn't make much difference, so do you really need that intermediate step?

        Can you give me any ideas or code samples that would show my how to do this with the code I included in my question?

        Well, it can't be done with the code you included in your question; otherwise you wouldn't be asking here. OTOH, several suitable suggestions have already been given to you; perhaps you should comment on them, explaining why you don't find them satisfactory.

        If you insist on using the CSV file instead of the good solutions already provided, I'd suggest looking into Text::xSV to read back the data in the CSV file and generate your hash in the 2nd script. Heck, if you're going to write the file, I'd use it there as well.
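        Something along these lines, perhaps, in the 2nd script (method names are from my recollection of the Text::xSV docs, so double-check them there):

        use strict;
        use warnings;
        use Text::xSV;

        # Rebuild %User straight from the CSV -- no intermediate file needed.
        my $csv = Text::xSV->new();
        $csv->open_file('UserTable.csv');
        $csv->bind_fields(qw(loginuser appl));   # the CSV has no header row

        my %User;
        while ( $csv->get_row() ) {
            my ( $loginuser, $appl ) = $csv->extract(qw(loginuser appl));
            next if !defined $appl or $appl =~ /^\s*$/;
            $User{$loginuser} = $appl;
        }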
Re: Use of Hash For Table Lookup
by RMGir (Prior) on Aug 17, 2007 at 13:13 UTC
    You could just slurp up the whole file into a string and then do
    my %lookupTable = eval $userLookup_dat_contents;
    That's got security issues if you can't trust the file (if someone hostile could overwrite it between passes, for example), but if you set your umask correctly, that seems unlikely.
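    In other words, something like this sketch (assuming UserLookup.dat contains Perl whose last expression yields the key/value pairs):

    use strict;
    use warnings;

    # Slurp the generated file into one string.
    open my $fh, '<', 'UserLookup.dat' or die "Can't open UserLookup.dat: $!";
    my $userLookup_dat_contents = do { local $/; <$fh> };
    close $fh;

    # eval() runs whatever Perl is in the file -- hence the security caveat above.
    my %lookupTable = eval $userLookup_dat_contents;
    die "Couldn't eval lookup table: $@" if $@;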

    Of course, if you've only got one "2nd program" and one "1st program", it doesn't make much sense to have them separated like this; just encapsulate your .csv parsing logic into a module, use it in your 2nd script, and ditch the first script.

    This kind of separation would only make sense if you're going to be parsing data from a lot of different input formats, and using the same "2nd pass" to analyze it, I think.


    Mike
      ++, but you could go one step further.

      Not only might the same second pass make this useful, but different front ends feeding different back ends through the same intermediate format make sense, too.

      Lots of data formatting tool chains go, for instance, from TeX, POD, PostScript, or SVG to some annotated intermediate format. Then, programs read that standardized (internally standard, anyway, to the tool chain) intermediate format to produce HTML 3, HTML 4, XHTML 1, DocBook, info, man, PostScript, PDF, etc.

      Some compilers do the same sort of thing, actually. Things such as PCode, Java bytecode, Parrot code, or C are produced from any number of front-end translators. Then, the Java bytecode could be run or recompiled to a static executable, the C could be compiled to any number of architectures (so long as it's done with care for portability), etc.

      In theory one could, if demented enough, write parts of a program in Ada, Awk, some Basic dialects, C, C++, Eiffel, Euphoria, Fortran, Haskell, Intercal, Java, Pascal, Prolog, Scheme, and Simula then translate them to C since all those languages have translators available to target C. Then, changes can be made at the C level to tie them closer together or to optimize a bit, the compiler can optimize the C, and all of the languages can work together without worrying about one another. Many C compilers actually do produce assembly then call the assembler, or have an option to produce assembly for the target platform instead of making an object file. The C can even be translated to other languages, like Fortran, Java, Java bytecode, C#, or Ada. Don't expect Ada-to-C-to-Fortran-to-C-to-Ada to look like the original program, of course.