bivouac has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

There are times when I love Perl, but there are times when I hate it. I love the easy questions but once things get a little complex, I get completely lost.

I think most of my hate stems from the fact that I'm not a programmer by trade and approach things incorrectly in the Perl and programming worlds - I've tried everything, it seems (pseudocode, Llama, Camel, Cookbook, Perldocs, Perl Monks), and I still get stuck at the stupidest places. Case in point: the problem I'm currently having.

I have three sets of data that are comprised of IDs that overlap. I have another set of data that contains a bunch of fields including the ID from the original sets of data. How can I sort the data using IF/ELSIF statements into different OUTPUT files? Should I be using hashes and not arrays for the original data sets? Why do I suck so bad? Any other tips for someone still struggling with Perl after years of tinkering?

    #!/usr/bin/perl -w
    use strict;

    my @a = qw( 60622 60516 60201 );
    my @b = qw( 90210 60622 12345 );
    my @c = qw( 11412 32134 60201 );

    open OUTPUT_A, ">a.txt" or die "Can't open OUTPUT_A: $!\n";
    open OUTPUT_B, ">b.txt" or die "Can't open OUTPUT_B: $!\n";
    open OUTPUT_C, ">c.txt" or die "Can't open OUTPUT_C: $!\n";

    while (<DATA>) {
        chomp;
        my($fn, $ln, $id) = split(",", $_);
        # print $fn, $ln and $id to OUTPUT_A if $id = @a[0,1,2...n]
        # elsif print $fn, $ln and $id to OUTPUT_B if $id = @b[0,1,2...n]
        # elsif print $fn, $ln and $id to OUTPUT_C if $id = @c[0,1,2...n]
    }

    close OUTPUT_A;
    close OUTPUT_B;
    close OUTPUT_C;

    __END__
    Homer,Simpson,60622
    Clark,Kent,90210
    Fred,Flintstone,00987

you can't be what you were...so you better start being just what you are....

Replies are listed 'Best First'.
Re: A Love/Hate Relationship - A Long Time Newbie's Current Block
by Abigail-II (Bishop) on Nov 05, 2003 at 17:44 UTC
    I would make a hash, keyed on the id's, with the open filehandles as values:

        #!/usr/bin/perl
        use strict;
        use warnings;

        open my $fh_a => "> a.txt" or die "open a.txt: $!";
        open my $fh_b => "> b.txt" or die "open b.txt: $!";
        open my $fh_c => "> c.txt" or die "open c.txt: $!";

        my @a = qw /60622 60516 60201/;
        my @b = qw /90210 60622 12345/;
        my @c = qw /11412 32134 60201/;

        my %data;
        @data {@a} = ($fh_a) x @a;
        @data {@b} = ($fh_b) x @b;
        @data {@c} = ($fh_c) x @c;

        while (<DATA>) {
            chomp;
            my ($fn, $ln, $id) = split /,/;
            next unless $data {$id};
            print {$data {$id}} "$_\n";
        }

        close $fh_a or die "close a.txt: $!";
        close $fh_b or die "close b.txt: $!";
        close $fh_c or die "close c.txt: $!";

        __DATA__
        Homer,Simpson,60622
        Clark,Kent,90210
        Fred,Flintstone,00987

    Abigail

      Cool idea, but if you read the question carefully:

          # print $fn, $ln and $id to OUTPUT_A if $id = @a[0,1,2...n]
          # elsif print $fn, $ln and $id to OUTPUT_B if $id = @b[0,1,2...n]
          # elsif print $fn, $ln and $id to OUTPUT_C if $id = @c[0,1,2...n]

      The way you are doing it, both Homer and Clark end up in "b.txt". The way the question was written, Homer should wind up in "a.txt" and Clark in "b.txt". All you have to do is reverse the order in which you fill your %data hash.

          @data {@c} = ($fh_c) x @c;
          @data {@b} = ($fh_b) x @b;
          @data {@a} = ($fh_a) x @a;

      Now the code works as the question was asked.
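
      The effect of the fill order can be seen in a minimal sketch, assuming plain strings such as 'a.txt' stand in for the open filehandles; the names %abc_order and %cba_order are made up for illustration. In a hash slice assignment, a later assignment overwrites any key an earlier one already set, so the list assigned last takes precedence for overlapping IDs.

      ```perl
      use strict;
      use warnings;

      my @a = qw(60622 60516 60201);
      my @b = qw(90210 60622 12345);
      my @c = qw(11412 32134 60201);

      # Filled in a, b, c order: later slices overwrite earlier ones,
      # so for overlapping IDs c beats b, and b beats a.
      my %abc_order;
      @abc_order{@a} = ('a.txt') x @a;
      @abc_order{@b} = ('b.txt') x @b;
      @abc_order{@c} = ('c.txt') x @c;

      # Filled in c, b, a order: a is assigned last and therefore wins,
      # which matches the if/elsif precedence in the question.
      my %cba_order;
      @cba_order{@c} = ('c.txt') x @c;
      @cba_order{@b} = ('b.txt') x @b;
      @cba_order{@a} = ('a.txt') x @a;

      print "60622 goes to $abc_order{60622} (a, b, c order)\n";
      print "60622 goes to $cba_order{60622} (c, b, a order)\n";
      ```

      ID 60622 appears in both @a and @b, so only the second hash routes Homer to a.txt.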

      --

      flounder

Re: A Love/Hate Relationship - A Long Time Newbie's Current Block
by EvdB (Deacon) on Nov 05, 2003 at 18:13 UTC
    Not a comment on the code - just a bit of a pick-me-up for you.
    • You are not crap; if you were crap, you would not have use strict; or the '-w' flag set.
    • We all have trouble at times with things that others take for granted. I, for example, have a Spoonerism problem. This is where you swap the first letters of two words as you are speaking. The more important it is that I don't appear stupid, the more I do it, e.g. "I'm as jober as a sudge, officer".
    • Even the best get it wrong and can't fix it. I have a degree in physics so a friend calls me to get help with a simple maths problem. Can't do it - don't even know where to start. Brain is a complete blank. Friend hangs up and I crack it in seconds. Lods saw, as it were.

    Go for a walk, kick a ball, have a cup of tea. The worst way to crack a problem is to keep trying to crack it while you know you are stuck. This is why I'm here at the monks for a bit - I too am stuck on a problem.

    You're fine.

    --tidiness is the memory loss of environmental mnemonics

•Re: A Love/Hate Relationship - A Long Time Newbie's Current Block
by merlyn (Sage) on Nov 05, 2003 at 17:37 UTC
Re: A Love/Hate Relationship - A Long Time Newbie's Current Block
by bluto (Curate) on Nov 05, 2003 at 19:05 UTC
    When I'm lost designing code, I generally take a step back. One thing to try is to write the design in pseudocode...

        read in each user record (first,last,id)
            lookup id in a list of id's to determine output file
            print user record to correct output file
        end read loop

    At this point I don't clutter the design with implementation details, like when to open or close files. Since the design is not too complex, I'd start to think about how to implement parts of it. Things like "lookup id in a list" bring to mind using hashes, since they are designed for fast lookup of single items (see Abigail-II's example). You could of course use arrays or "if" statements for the "list", and if you have learned other programming languages before, this might be your first inclination. This is why, when learning a new language, I try to examine examples of (good) code written in that language. It just makes this kind of mental association much easier when it comes time to implement code.
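
    One way the pseudocode above might translate into Perl is sketched below. It is only an illustration under stated assumptions: the %file_for table, the in-memory @records list, and the %lines_for collector are hypothetical names, and records are gathered per file name instead of being written to real files.

    ```perl
    use strict;
    use warnings;

    # Hypothetical lookup table: ID => name of the output file it belongs to.
    my %file_for = (
        60622 => 'a.txt',
        90210 => 'b.txt',
        11412 => 'c.txt',
    );

    my @records = (
        'Homer,Simpson,60622',
        'Clark,Kent,90210',
        'Fred,Flintstone,00987',
    );

    my %lines_for;    # output file name => records routed to it

    for my $record (@records) {                     # read in each user record
        my ($first, $last, $id) = split /,/, $record;
        my $file = $file_for{$id};                  # look up id to pick the output
        next unless defined $file;                  # unknown ID: skip the record
        push @{ $lines_for{$file} }, $record;       # "print" record to that output
    }

    for my $file (sort keys %lines_for) {
        print "$file: $_\n" for @{ $lines_for{$file} };
    }
    ```

    Each line of the loop corresponds to one line of the pseudocode, which is the point of keeping file handling out of the design until later.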
Re: A Love/Hate Relationship - A Long Time Newbie's Current Block
by ChrisS (Monk) on Nov 05, 2003 at 21:41 UTC
    I like the answers I've seen, but I thought you might like to see one that stays very close to the original code, and uses the conditional logic you posted in your Perl comments.

        #!/usr/bin/perl -w
        use strict;

        my %a = qw( 60622 1 60516 1 60201 1 );
        my %b = qw( 90210 1 60622 1 12345 1 );
        my %c = qw( 11412 1 32134 1 60201 1 );

        open OUTPUT_A, ">a.txt" or die "Can't open OUTPUT_A: $!\n";
        open OUTPUT_B, ">b.txt" or die "Can't open OUTPUT_B: $!\n";
        open OUTPUT_C, ">c.txt" or die "Can't open OUTPUT_C: $!\n";

        while (<DATA>) {
            chomp;
            my($fn, $ln, $id) = split(",", $_);
            if (exists $a{$id}) {
                print OUTPUT_A $fn, $ln, $id;
            } elsif (exists $b{$id}) {
                print OUTPUT_B $fn, $ln, $id;
            } elsif (exists $c{$id}) {
                print OUTPUT_C $fn, $ln, $id;
            }
        }

        close OUTPUT_A;
        close OUTPUT_B;
        close OUTPUT_C;

        __END__
        Homer,Simpson,60622
        Clark,Kent,90210
        Fred,Flintstone,00987

    This code creates an associative array for each output file, and checks to see if each $id value exists as a key in the table.
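
    If the IDs already live in an array, the lookup hash can be built from it instead of being typed out by hand. The following is a minimal sketch, not part of the code above; %in_a is a made-up name, and the two input lines are taken from the sample data.

    ```perl
    use strict;
    use warnings;

    my @a = qw(60622 60516 60201);

    # Turn the list into a lookup hash: every ID becomes a key with the value 1.
    my %in_a = map { $_ => 1 } @a;

    for my $line ('Homer,Simpson,60622', 'Fred,Flintstone,00987') {
        my ($fn, $ln, $id) = split /,/, $line;
        if (exists $in_a{$id}) {
            # join puts the commas back; "\n" ends the output line.
            print join(',', $fn, $ln, $id), "\n";
        }
    }
    ```

    The join and "\n" are there because print, given a plain list, writes the items back to back with no separator or line ending unless $, and $\ are set.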

    As to your other comments... most of us go through such frustrations at times. The other monks have given you great counsel about relaxing.

    One recommendation I would add for improving your Perl proficiency: check out the various perlfaq sections in the documentation. Especially perlfaq4 about handling different kinds of data.

    You could also browse the Monastery's "Categorized Q&A" and "Tutorials" sections.

      Chris - This is exactly what my code looks like! And, it works like a champ! Thanks again everyone for the help - super appreciated.
Re: A Love/Hate Relationship - A Long Time Newbie's Current Block
by Jaap (Curate) on Nov 05, 2003 at 17:34 UTC
    If $id were 60201 would you really only want to print it to OUTPUT_A?
    If so, you could put @a in a hash (as you mention) and then test the key like so:

        my %a = (
            60622 => 1,
            60516 => 1,
            60201 => 1,
        );
        ...
        if ($a{$id}) {
            print OUTPUT_A "$fn, $ln and $id";
        } elsif ...
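
    The difference the opening question points at can be sketched as follows, assuming hypothetical lookup hashes %in_a, %in_b and %in_c built from the three ID lists. An if/elsif chain stops at the first set containing the ID, while independent tests report every set that contains it.

    ```perl
    use strict;
    use warnings;

    my %in_a = map { $_ => 1 } qw(60622 60516 60201);
    my %in_b = map { $_ => 1 } qw(90210 60622 12345);
    my %in_c = map { $_ => 1 } qw(11412 32134 60201);

    my $id = 60201;    # present in both the a and c lists

    # if/elsif: stops at the first set that contains the ID.
    my @first_only;
    if    ($in_a{$id}) { push @first_only, 'a' }
    elsif ($in_b{$id}) { push @first_only, 'b' }
    elsif ($in_c{$id}) { push @first_only, 'c' }

    # Independent tests: collects every set that contains the ID.
    my @all_matches;
    push @all_matches, 'a' if $in_a{$id};
    push @all_matches, 'b' if $in_b{$id};
    push @all_matches, 'c' if $in_c{$id};

    print "if/elsif:     @first_only\n";
    print "separate ifs: @all_matches\n";
    ```

    Which behaviour is right depends on whether an overlapping ID should land in one output file or in all of them.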
Re: A Love/Hate Relationship - A Long Time Newbie's Current Block
by bivouac (Beadle) on Nov 05, 2003 at 20:00 UTC

    Monks, thanks so much for the pick-me-ups, suggestions and the answers. I know I'll still continue to struggle, but I also know that if I write a concise question with code or pseudo-code, some Monk will be able to help. I was also a bit tickled that Randal replied to the node - thanks for the link; I never knew about it even though it was right under my nose.

    I think I'm going to go with Jaap's solution because I really understand how that works. I do like Abigail's solution, but I don't understand 60% of it, and I don't think it's smart to use something you don't understand.

Re: A Love/Hate Relationship - A Long Time Newbie's Current Block
by Art_XIV (Hermit) on Nov 05, 2003 at 20:13 UTC

    Dude! You probably don't suck! I am a programmer by trade and I still manage to get hung up on things that seem like they should be obvious!

    When you do get seriously hung up on something, take a break. Your brain can then background its pattern_recognition.pl app while you browse fark, slashdot, or perlmonks. ;)

    Hanlon's Razor - "Never attribute to malice that which can be adequately explained by stupidity"
Re: A Love/Hate Relationship - A Long Time Newbie's Current Block
by atcroft (Abbot) on Nov 05, 2003 at 17:41 UTC

    First of all, you're very close - the function you may wish to look at, though, is grep(), because you can do something like

        print OUTPUT_A join ( ',', $fn, $ln, $id ), "\n" if ( grep( /$id/, @a ) );

    with it. I can't see a really good reason for changing to hashes for the IDs at the moment.

    Secondly, and more importantly, don't beat yourself up about this. You weren't that far off - you just needed a function you hadn't encountered yet. Yes, it can be flustrating sometimes, but never give up. Perhaps you may want to spend some time once in a while browsing here or the function listing. It sounds like you have a reasonably good starting point with the resources you currently have - just browse through them at times, or take one of the examples and see what you can do with it, or try something and, if it doesn't work, feel free to ask here or in the CB.

      First of all, you're very close - the function you may wish to look at, though, is grep(), because you can do something like

          print OUTPUT_A join ( ',', $fn, $ln, $id ), "\n" if ( grep( /$id/, @a ) );

      with it. I can't see a really good reason for changing to hashes for the IDs at the moment.

      Quite the opposite. grep performs really, really poorly if you want to do repeated searches (and that's what is happening here). On the other hand, building a hash takes a bit of overhead, but searches are really, really fast.

      I can't see any reason to use grep in this case.

      Abigail
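
      The contrast can be illustrated with a small sketch. It makes no timing claims; it only shows what each approach does per lookup. The %in_a hash and the fragment ID '606' are assumptions for illustration, and the substring false match is an extra observation about the unanchored /$id/ pattern.

      ```perl
      use strict;
      use warnings;

      my @a = qw(60622 60516 60201);
      my %in_a = map { $_ => 1 } @a;    # built once, before any lookups

      my $id = '606';    # hypothetical ID that is only a fragment of a real one

      # grep with an unanchored pattern walks the whole list on every call
      # and also matches substrings, so '606' "finds" 60622.
      my $grep_hits = grep { /$id/ } @a;

      # An exact string comparison avoids the false match but still walks the list.
      my $exact_hits = grep { $_ eq $id } @a;

      # The hash lookup is one exact-match probe, however long the list is.
      my $hash_hit = exists $in_a{$id} ? 1 : 0;

      print "grep /\$id/: $grep_hits, grep eq: $exact_hits, hash: $hash_hit\n";
      ```

      With three IDs the cost difference is negligible; it grows with the number of IDs and the number of records checked against them.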