tbone has asked for the wisdom of the Perl Monks concerning the following question:

I'm a newbie, and all I know is that what I'm doing can't be the best way. I have a complex hash, and if an entry matches something I print the person's name to one of 30 files. This is how I did it, but there must be a more efficient way! I'm also trying to create a directory for all the files, but that doesn't work. Thanks for any tips.

    mkdir "reports", 0755 or warn "Cannot make reports directory: $!";
    chdir ("/reports");
    open(A, ">groupA.txt");
    open(B, ">groupB.txt");
    # etc.
    foreach my $n (keys %emp){
        print A $emp{$n}{'Emp'} if $emp{$n}{'Org'}=~/ABC/;
        print B $emp{$n}{'Emp'} if $emp{$n}{'Org'}=~/DEF/;
        # etc.
    }

Replies are listed 'Best First'.
Re: efficiently printing to 30 files
by talexb (Chancellor) on Mar 06, 2003 at 22:11 UTC

    Is there any reason that you want to print as soon as you see someone's name? Why not instead just push to a hash of arrays, then dump the information out at the end? That way you won't have 30 file handles open at once (that used to be limited to 20 handles for C programs but Perl no doubt handles that nicely).

        foreach my $n (keys %emp) {
            push(@{$data{'A'}},$emp{$n}{'Emp'}) if $emp{$n}{'Org'}=~/ABC/;
            push(@{$data{'B'}},$emp{$n}{'Emp'}) if $emp{$n}{'Org'}=~/DEF/;
        }

    You may want to initialize the hash of arrays to start with so as to avoid those ugly warnings. Afterwards you can dump the data to files one by one.

        foreach (keys %data) {
            open(my $out, '>', "group$_.txt") || die "Couldn't open file for $_: $!";
            print $out join("\n", @{$data{$_}});
            close($out);
        }
    --t. alex
    Life is short: get busy!
      Maybe this idiom could save some keystrokes and improve readability. I still don't like the repetitions of $emp{ $_ ...: ideas?
          push @{
              $emp{ $_ }{ 'Org' } =~ /ABC/ ? $data{ A }
            : $emp{ $_ }{ 'Org' } =~ /DEF/ ? $data{ B }
            :                                []
          }, $emp{ $_ }{ 'Emp' } for keys %emp;

        Your approach is workable, but I really wouldn't want to commit myself without having a much better idea of what problem we're trying to solve. In any case, I prefer to use the ternary operator '?' for single cases or occasionally double cases. Beyond that is getting a little bit too clever (but don't take that personally).

        I have a feeling that either a series of if statements or a loop would solve this particular problem, but if there's a way to find out which values are more likely to occur, I would want to test for those values first.

        All this arm-waving is good fun, but as I said, without having a good understanding of what the original poster's problem is, we can't offer a totally effective solution.

        --t. alex
        Life is short: get busy!
Re: efficiently printing to 30 files
by LanceDeeply (Chaplain) on Mar 06, 2003 at 22:11 UTC
    I was thinking you could use a case statement.
    Here's an example from perlfaq7.
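
    The linked example did not survive in this copy of the thread. As a stand-in, here is a minimal sketch of one common bare-block "switch" idiom in Perl. It is not a quote from perlfaq7, and the sub name group_for and the 'none' default are made up for illustration.

```perl
use strict;
use warnings;

# Return a group letter for an Org value; 'none' is a made-up default.
sub group_for {
    my ($org) = @_;
    my $group;
    SWITCH: for ($org) {    # aliases $_ to $org for the tests below
        /ABC/ and do { $group = 'A'; last SWITCH };
        /DEF/ and do { $group = 'B'; last SWITCH };
        $group = 'none';    # reached only when nothing above matched
    }
    return $group;
}

print group_for('ABC001'), "\n";
```

    Each added case is one more line inside the block, but the patterns still live in the code rather than in data.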

    BUT... is there a relationship between the search pattern and the group files?
    because then you can make a hash of that relationship like...
        my %groupMap = (
            'ABC' => 'groupA.txt',
            'DEF' => 'groupB.txt',
            'GHI' => 'groupC.txt',
            # and so on and so on
        );

    then you can check each key in %emp against the keys in %groupMap...

        foreach my $n (keys %emp) {
            my ($pattern, $file);
            while (($pattern, $file) = each %groupMap) {
                if ( $emp{$n}{'Org'} =~ /$pattern/ ) {
                    saveEmpToFile($emp{$n}{'Emp'}, $file);
                }
            }
        }

        sub saveEmpToFile {
            my $emp = shift;
            my $file = shift;
            # note: opening for append
            open(OUTFILE, ">>$file") or die "Cannot open $file: $!";
            print OUTFILE $emp;
            close OUTFILE;
        }

    this way- you can just add more mappings to %groupMap without adding more lookup code in your for loop.

    HTH
Re: efficiently printing to 30 files
by BrowserUk (Patriarch) on Mar 07, 2003 at 01:37 UTC

    There are a couple of ways you could possibly improve the efficiency of this process, but whether they are useful will depend on if I have read between the lines of your post correctly.

    First, you give a definitive number of 30 files, which suggests to me that the /ABC/ and /DEF/ are placeholders for Department or Division names or codes, and it is possible that the value of $emp{$n}{Org} is the entire thing, i.e. 'ABC' or 'DEF' from your example. Of course, if this is the case, then you would (should) be using eq not =~ for your comparison, so if I've reached a step too far, hit the -- and ignore the rest of this post.

    You're still here? Phew! Okay, I am guessing that your hash looks something like
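
    A minimal sketch of why the choice between eq and =~ matters here. The sample Org values and the sub name compare are made up for illustration.

```perl
use strict;
use warnings;

# Return (regex_matched, string_equal) as 1/0 flags for one Org value.
sub compare {
    my ($org) = @_;
    return ( ( $org =~ /ABC/ ? 1 : 0 ), ( $org eq 'ABC' ? 1 : 0 ) );
}

for my $org ( 'ABC', 'ABC001', 'XABCX' ) {
    my ( $by_regex, $by_eq ) = compare($org);
    print "${org}: regex=$by_regex eq=$by_eq\n";
}
```

    The unanchored regex accepts any value that merely contains ABC, while eq accepts only the exact code, which is also what makes a plain hash lookup on the Org value possible.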

        my %emp = (
            987654 => { Emp=>'A. Employee', Org=>'ABC', '...'=> },
            987653 => { Emp=>'A.N.Other',   Org=>'DEF', '...'=> },
            # .....
        );

    If this is anything like close to the real situation, you could probably save time by creating a (temporary?) HoA that maps Org => Emp. Something like

        my %OrgEmp;
        push @{ $OrgEmp{ $emp{$_}{Org} } }, $emp{$_}{Emp} for keys %emp;

    Or if your world doesn't fit with my neat assumptions and your Org ids are more like 'ABC001' & 'DEF34', then you would still need to use regexes, but you could do it when building the temp HoA's

        # Build one alternation from the Org codes; "..." stands for the remaining codes.
        my $alternation = join '|', qw[ABC DEF ...];
        my $re_org = qr/($alternation)/;
        my %OrgEmp;
        for my $empNo (keys %emp) {
            push @{ $OrgEmp{$1} }, $emp{$empNo}{Emp}
                if $emp{$empNo}{Org} =~ $re_org;
        }

    In either case, the processes of writing out the names of the employees in each organisation becomes trivial

        for my $org (keys %OrgEmp) {
            open OUT, '>', 'Group' . $org or die $!;
            print OUT "$_\n" for @{ $OrgEmp{$org} };
            close OUT;
        }

    However, if your data doesn't allow either method and you still need to hold all 30 files open and write to them simultaneously, then you could at least reduce the need for all that repeated print A .... if ....; by building another data structure to hold your filehandles and associate them with the regexes.

        my %groups = map {
            open my $fh, '>', "Group$_" or die $!;
            $_ => $fh;
        } qw[ABC DEF ...];

        for my $empNo (keys %emp) {
            for my $grp (keys %groups) {
                #! Note the {} (bare block) around the filehandle are necessary.
                print { $groups{$grp} } $emp{$empNo}{Emp}
                    if $emp{$empNo}{Org} =~ m[$grp];
            }
        }

    This latter method won't save any processing time, but it could save you some typing and it is scalable when the number of Org's changes. Just add or delete them from the qw[...] on the map and the rest of the program still works.

    As shown, the filenames would end up as GroupABC, GroupDEF, etc., which may not fit with your needs, but it is probably easier to rename them afterwards than to build the mapping of regex to GroupA, GroupB, though even that's not so hard.
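
    A minimal sketch of that mapping from pattern to groupA.txt, groupB.txt and so on. The %letter_for table, the sample records, and the temporary directory are all assumptions made for illustration, not part of the original poster's data.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical mapping from Org pattern to the letter used in the filename.
my %letter_for = ( ABC => 'A', DEF => 'B' );

# Made-up sample data in the shape guessed at earlier in this reply.
my %emp = (
    987654 => { Emp => 'A. Employee', Org => 'ABC001' },
    987653 => { Emp => 'A.N.Other',   Org => 'DEF34' },
);

# A temporary directory stands in for the real output directory.
my $dir = tempdir( CLEANUP => 1 );

# Open each group file once, keyed by the pattern that selects it.
my %fh_for;
for my $pattern ( keys %letter_for ) {
    my $path = "$dir/group$letter_for{$pattern}.txt";
    open my $fh, '>', $path or die "Cannot open $path: $!";
    $fh_for{$pattern} = $fh;
}

# Write each employee to every group whose pattern matches the Org value.
for my $empNo ( sort keys %emp ) {
    for my $pattern ( keys %fh_for ) {
        print { $fh_for{$pattern} } "$emp{$empNo}{Emp}\n"
            if $emp{$empNo}{Org} =~ /\Q$pattern\E/;
    }
}

close($_) or die "Cannot close: $!" for values %fh_for;
```

    Adding an organisation then means adding one pair to %letter_for; the loops stay unchanged.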


    Examine what is said, not who speaks.
    1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
    2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
    3) Any sufficiently advanced technology is indistinguishable from magic.
    Arthur C. Clarke.
Re: efficiently printing to 30 files
by zengargoyle (Deacon) on Mar 06, 2003 at 22:14 UTC

    FileCache handles this for you quite nicely, but i've read that it won't work with 'use strict;'

        $ perl -e 'use FileCache; for("a".."z"){cacheout $_; print $_ $_,$/;}'
        $ ls
        a  d  g  j  m  p  s  v  y
        b  e  h  k  n  q  t  w  z
        c  f  i  l  o  r  u  x
        $ cat a
        a
        $ cat b
        b
        $
      This was fixed in perl 5.8. I ought to upload that version (1.02) to CPAN separately, but until then you can get it at ftp://pthbb.org/pub/pm/FileCache/.

      UPDATE: I may have misunderstood what you said. Versions of FileCache prior to 5.8 would, themselves, not compile under strict. This is what I thought you meant. However, it seems likely you were referring to the fact that the current FileCache implementation uses symbolic filehandles and hence requires no strict 'refs'. I have thought of a solution, which has other benefits: namely, to use IO::Handle. This leads to some difficult decisions though. See my recent posts to perl5-porters for more information.

      --
      I'm not belgian but I play one on TV.

        cool. i caught it as 'Perl Recipe of the Day' on http://www.perl.com a few days ago. the perldoc for the version i have doesn't have much in way of examples. i saw the 'strict' warning on some message board somewhere that google gave me. nice to know it's strict safe now.

Re: efficiently printing to 30 files
by perlguy (Deacon) on Mar 06, 2003 at 22:17 UTC
    Though I don't particularly care for printing as you go, and like the hash idea above, this was my attempt at exactly what you proposed (tested lightly, meaning it compiles, and may not work as you wish):

        mkdir "reports", 0755 or warn "Cannot make reports directory: $!";
        chdir("reports") or die "couldn't change to directory: $!";

        # tried to quickly put these together into a hash,
        # but made for more code than necessary
        # the way it stands now, @patterns and @files should have
        # the same number of (corresponding) elements
        my @patterns = (qr/ABC/, qr/DEF/, qr/GHI/);
        my @files = map {
            open(my $fh, '>', 'group' . $_) or die "couldn't open group$_: $!";
            $fh;
        } qw(A B C);

        foreach my $n (keys %emp) {
            for my $index (0..$#patterns) {
                if ($emp{$n}{'Org'} =~ $patterns[$index]) {
                    print { $files[$index] } $emp{$n}{'Emp'} . "\n";
                }
            }
        }
Re: efficiently printing to 30 files
by cchampion (Curate) on Mar 07, 2003 at 01:24 UTC
Re: efficiently printing to 30 files
by kelan (Deacon) on Mar 06, 2003 at 22:23 UTC

    Well first, the directory might be getting created, but you aren't changing into it. You're trying to change to a directory called reports directly under / (root). You should change that to chdir 'reports'; (i.e., leave out the leading slash). For the second part, printing to 30 files, how about this:

        use IO::File;
        my %files;
        $files{A} = $files{B} = $files{C} = IO::File->new('> groupA.txt');
        $files{D} = $files{E} = $files{F} = IO::File->new('> groupB.txt');
        # etc...
        for my $n (keys %emp) {
            print { $files{ $emp{$n}{Org} } } $emp{$n}{Emp};
            #       ^^^^^^^^^^^^^^^^^^^^^^^^   ^^^^^^^^^^^^^
            #       This gets the FH           And prints this to it
        }
        # Now close the FH's
        close($files{$_}) for keys %files;
    You might also want to print some kind of separator between the employee info that you print.
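
    The directory point above can be sketched without chdir at all, by building each output path relative to the reports directory. This is only an illustration: the sub name report_path is made up, and a temporary directory stands in for the script's real working directory.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Create $dir if it does not exist yet and return the path of $name inside it.
sub report_path {
    my ( $dir, $name ) = @_;
    unless ( -d $dir ) {
        mkdir $dir, 0755 or die "Cannot make directory $dir: $!";
    }
    return File::Spec->catfile( $dir, $name );
}

# Stand-in for the current directory, so the sketch leaves nothing behind.
my $base = tempdir( CLEANUP => 1 );
my $dir  = File::Spec->catdir( $base, 'reports' );    # relative name, no leading slash

my $path = report_path( $dir, 'groupA.txt' );
open my $fh, '>', $path or die "Cannot open $path: $!";
print {$fh} "A. Employee\n";    # the newline is the separator between names
close $fh or die "Cannot close $path: $!";
```

    Because every open names the directory explicitly, a failed chdir can no longer send the files somewhere unexpected.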

    kelan


    Perl6 Grammar Student

      Ugh. I know that I wouldn't want to type in 8 open statements. In the spirit of TMTOWTDI:
          #!perl -w
          use strict;

          my %hash;
          my $i = "A";
          foreach ("A".."Z") {
              if ( (ord() - ord("A")) % 3 == 0) {
                  $i++;
              }
              $hash{$_} = chr(ord($i) - 1);
          }
          foreach my $key (sort keys %hash) {
              print "$key => $hash{$key}\n";
          }
          __END__
          A => A
          B => A
          C => A
          D => B
          E => B
          F => B
          G => C
          H => C
          I => C
          J => D
          K => D
          L => D
          M => E
          N => E
          O => E
          P => F
          Q => F
          R => F
          S => G
          T => G
          U => G
          V => H
          W => H
          X => H
          Y => I
          Z => I
      Of course, you can insert your IO::File statement in pretty easily. Also, I assume that the pattern of 3 letters to a file holds. YMMV

      thor

Re: efficiently printing to 30 files
by Limbic~Region (Chancellor) on Mar 06, 2003 at 22:12 UTC
    tbone,
    There is a command in Unix called tee which allows you to pipe your output to two different locations. IO::Tee is a CPAN module that allows you to select multiple locations. I do not know if it is any more efficient from a speed perspective as I haven't looked under the hood, but it certainly would be more efficient in programming time.

    Hope this helps - cheers - L~R

    Update: Ok so I am sick with the flu and I misread "one of 30 files" to be 30 files. So now what can I offer since all kinds of people have replied with applicable solutions?

    allolex suggested using subs for repeated pieces of code. I would suggest using a sub for a different reason.
    You could use one sub to open all the files and another sub to determine which file handle to select based off the data and then just use print in the main code.

        #!/usr/bin/perl -w
        use strict;

        # Some code to build your complex hash

        OpenFiles();
        foreach my $n (keys %emp) {
            my $print = CheckData($n);
            if ($print) {
                select $print;
                print $emp{$n}{'Emp'};
            }
        }

        sub OpenFiles {
            mkdir "reports", 0755 or warn "Cannot make reports directory: $!";
            chdir("reports") or die "couldn't change to directory: $!";
            # All the open statements
        }

        sub CheckData {
            return "A" if $emp{$_[0]}{'Org'} =~ /ABC/;
            # etc
        }

    Obviously this code could be cleaned up a bit, but I felt bad for having misread your post and not providing anything useful. With a little work - you pass two arguments to your sub, one is the value to check and the other determines which type of check to perform. This way you could do more than just check the 'Org' value.
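
    A minimal sketch of that two-argument version, assuming a made-up %checks table that maps a check type to an ordered list of pattern and group pairs. Only the Org entry comes from this thread; anything else you add to the table is your own.

```perl
use strict;
use warnings;

# Hypothetical table of checks: each type lists [pattern, group] pairs
# that are tried in order.
my %checks = (
    Org => [ [ qr/ABC/ => 'A' ], [ qr/DEF/ => 'B' ] ],
);

# First argument: the value to check. Second: which type of check to run.
# Returns the group of the first matching pattern, or undef if none matches.
sub CheckData {
    my ( $value, $type ) = @_;
    my $rules = $checks{$type} or die "Unknown check type '$type'\n";
    for my $rule (@$rules) {
        my ( $pattern, $group ) = @$rule;
        return $group if $value =~ $pattern;
    }
    return;
}

print CheckData( 'ABC001', 'Org' ), "\n";
```

    The main loop would then call CheckData($emp{$n}{'Org'}, 'Org') and print only when a group comes back.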

    Cheers - L~R

Re: efficiently printing to 30 files
by allolex (Curate) on Mar 06, 2003 at 22:16 UTC

    One major thing you could do is put those repeated bits of code into subroutines. Check out perldoc perlsub for more information.

    But I see that there have been a lot of answers in the last few minutes, all better than mine. Maybe I can learn from this experience?

    --
    Allolex

Re: efficiently printing to 30 files
by jonadab (Parson) on Mar 07, 2003 at 13:57 UTC

    Your approach (the way you've written your loop and so forth) seems ideally suited for using symbolic references. Bear in mind that symbolic references must be used with considerable caution. Using them carelessly can lead to hard-to-track-down bugs. But your code can be greatly shortened:

        mkdir "reports", 0755 or warn "Cannot make reports directory: $!";
        chdir ("reports"); # As someone else pointed out, / makes the path absolute.
        my %group = (
            ABC => 'A',
            DEF => 'B',
            # You can have whatever mappings you want here.
            # They don't have to be single letters, either.
            # But don't have one called STDOUT et cetera.
            DEFAULT => 'DEFAULT',
        );
        no strict 'refs'; # WARNING: This will annoy the 'use strict or die' people.
        foreach my $f (values %group) {
            open $f, ">group$f.txt";
        }
        foreach my $n (keys %emp) {
            if (not defined $group{$emp{$n}{'Org'}}) {
                warn "Group $emp{$n}{Org} missing from \%group, "
                   . "using DEFAULT group.\n";
                $group{$emp{$n}{'Org'}} = $group{'DEFAULT'};
            }
            # The block around the filehandle expression is required by print.
            print { $group{$emp{$n}{'Org'}} } $emp{$n}{'Emp'};
        }

    This does pretty much exactly what you were doing, in pretty much exactly the same way, but with less code, because you only have to enumerate all your cases once instead of twice.

    If you aren't comfortable with references in general then you should avoid this solution and instead rewrite your whole approach, using one of the other suggestions. You don't want to use symbolic refs unless you can follow what they're doing.
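
    For comparison, here is a minimal sketch of the same exact-match lookup with a DEFAULT fallback that keeps lexical filehandles in a hash, so it runs under use strict without symbolic references. The sample records and the temporary directory are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my %group = ( ABC => 'A', DEF => 'B', DEFAULT => 'DEFAULT' );

# Made-up sample data; the third record has an Org missing from %group.
my %emp = (
    1 => { Emp => 'A. Employee', Org => 'ABC' },
    2 => { Emp => 'A.N.Other',   Org => 'DEF' },
    3 => { Emp => 'Nobody',      Org => 'XYZ' },
);

# A temporary directory stands in for "reports".
my $dir = tempdir( CLEANUP => 1 );

# One lexical filehandle per group name, stored in an ordinary hash.
my %fh_for;
for my $name ( values %group ) {
    my $path = "$dir/group$name.txt";
    open my $fh, '>', $path or die "Cannot open $path: $!";
    $fh_for{$name} = $fh;
}

for my $n ( sort keys %emp ) {
    my $org  = $emp{$n}{Org};
    my $name = exists $group{$org} ? $group{$org} : $group{DEFAULT};
    print { $fh_for{$name} } "$emp{$n}{Emp}\n";
}

close($_) or die "Cannot close: $!" for values %fh_for;
```

    The lookup logic is the same; the only change is that the hash values are real handles instead of names that Perl resolves at run time.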


    for(unpack("C*",'GGGG?GGGG?O__\?WccW?{GCw?Wcc{?Wcc~?Wcc{?~cc' .'W?')){$j=$_-63;++$a;for$p(0..7){$h[$p][$a]=$j%2;$j/=2}}for$ p(0..7){for$a(1..45){$_=($h[$p-1][$a])?'#':' ';print}print$/}
      You guys are the best! Thanks so much for the advice. Now I just have to figure out which one is the best. Thanks again.