de2425 has asked for the wisdom of the Perl Monks concerning the following question:

I'm sorry to bother everyone again but you've all been so helpful. I hope you don't mind my asking for help again.

I am having some difficulty thinking of a way to do something that I'm sure is simple. I, however, just cannot get my brain around a way to get Perl to do it. What I have is a file with multiple records. Each record contains an ID, name, description and application, in that order. In some cases the same ID is listed several times because one ID may have more than one application. What I want to do is print the same data to another file, but with each ID listed only once and all of its applications listed on the one line. So basically the data looks like this:

ID, name, description, application

1, XYZ, desc, PDF

1, XYZ, desc, QFZ

2, YGH, desc, LMN

What I'm looking for is:

1, XYZ, desc, PDF,QFZ

2, YGH, desc, LMN

The code I have been working on is pasted below. I am getting results but I am getting duplicate values in the last column and cannot figure out how to get them to print different values for one ID on the same line.

    #!/usr/bin/perl -w
    open (IN, "c:/work/Abnova_Grouping/Abnova_App_and_Key.txt");
    while(<IN>){
        chomp;
        @t=split(/\t/,$_);
        $AB{$t[0]}.= ",$t[2]";
        $AB{$t[0]}=$t[2].",$t[2]";
    }
    close IN;
    open (OUT, ">c:/work/Abnova_Grouping/Abnova_Products.txt");
    open (IN, "c:/work/Abnova_Grouping/Abnova_Product_List.txt");
    while (<IN>){
        chomp;
        @Products=split(/\t/,$_);
        if ($Products[0]=~/\d/ and exists $AB{$Products[0]}){
            print OUT "$Products[0]\t$Products[2]\t$AB{$Products[0]}\n";
        }
    }
    close IN;
    close OUT;

I have been trying to think of a good way to do this for a while and haven't come up with anything. If anyone could help, I'd very much appreciate it.

P.S. I do know what I've been told about using the "my" before the various scalars, arrays, hashes, etc. The person that I'm writing these for doesn't like when I use those. That is why I don't. Please don't think me belligerent for not using them.

Thank you all once again for helping to bail me out of a spot. I finally got it going and I appreciate all of your help. Here is the code that I finally got to generate the output that I needed:

    #!/usr/bin/perl -w
    open (IN, "c:/work/Abnova_Grouping/Abnova_App_and_Key.txt");
    while(<IN>){
        chomp;
        @t=split(/\t/,$_);
        $AB{$t[0]}=$AB{$t[0]}.",$t[2]";
    }
    print %AB;
    close IN;
    close OUT;
    open (OUT, ">c:/work/Abnova_Grouping/Abnova_Products.txt");
    open (IN, "c:/work/Abnova_Grouping/Abnova_Product_List.txt");
    while (<IN>){
        chomp;
        @Products=split(/\t/,$_);
        if ($Products[0] and exists $AB{$Products[0]}){
            print OUT "$Products[0]\t$Products[2]\t$AB{$Products[0]}\n";
        }
    }
    close IN;
    close OUT;

Thank you all again so much!! I only hope to get to the point where I can be of some assistance to you. :-)

Re: Hashes with Multiple Keys and Combining Them
by johngg (Canon) on Sep 10, 2008 at 20:16 UTC
    If you use the ID, name and description parts, including the last comma, as the key to a hash entry and make the value an anonymous array onto which you push the application then you can print the key and then the joined elements quite simply.

        use strict;
        use warnings;

        open my $inFH, q{<}, \ <<'EOD' or die qq{open: << HEREDOC: $!\n};
        1,XYZ,desc,PDF
        2,YGH,desc,KMN
        1,XYZ,desc,QFZ
        EOD

        my %ids;
        while ( <$inFH> ) {
            chomp;
            my ( $key, $value ) = m{^(.*,)(.*)$};
            push @{ $ids{ $key } }, $value;
        }

        print qq{$_@{ [ join q{,}, @{ $ids{ $_ } } ] }\n} for keys %ids;

        close $inFH or die qq{close: << HEREDOC: $!\n};

    The results.

        1,XYZ,desc,PDF,QFZ
        2,YGH,desc,KMN

    Note that hashes are not ordered so if you want to impose a particular order in your output file you will need to sort the keys in some fashion. This exercise is left to the reader.
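
    One way the sorting might be done, as a minimal sketch: it assumes the ID at the front of each key is a plain integer, and both the helper id_of and the third record are made up for illustration.

```perl
use strict;
use warnings;

# Same shape as the hash above: "ID,name,desc," => [ applications ].
# Sample data only; nothing is read from disk.
my %ids = (
    '2,YGH,desc,'  => ['KMN'],
    '1,XYZ,desc,'  => [ 'PDF', 'QFZ' ],
    '10,ABC,desc,' => ['RST'],    # hypothetical record, shows numeric ordering
);

# Hypothetical helper: pull the leading ID out of a key.
sub id_of {
    my ($key) = @_;
    my ($id) = $key =~ m{^(\d+),};
    return $id;
}

# Sort numerically by ID; a plain string sort would put 10 before 2.
my @lines;
for my $key ( sort { id_of($a) <=> id_of($b) } keys %ids ) {
    push @lines, $key . join( q{,}, @{ $ids{$key} } );
}

print "$_\n" for @lines;
```

    If the IDs are not purely numeric, comparing them as strings with cmp would be the safer choice.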

    I hope this is useful.

    Cheers,

    JohnGG

    Update: Corrected typo, s/nor/not/

Re: Hashes with Multiple Keys and Combining Them
by toolic (Bishop) on Sep 10, 2008 at 19:03 UTC
    You could read your data into a Hash-of-Hashes data structure:
        use strict;
        use warnings;

        my %AB;
        while (<DATA>) {
            chomp;
            my ($id, $name, $desc, $app) = split /\s*,\s*/;
            $AB{$id}{app} .= ",$app";
            $AB{$id}{name_desc} = "$name, $desc";
        }
        for (keys %AB) { $AB{$_}{app} =~ s/^,// } # remove leading comma
        for my $id (sort keys %AB) {
            print "$id, $AB{$id}{name_desc}, $AB{$id}{app} \n";
        }
        __DATA__
        1, XYZ, desc, PDF
        1, XYZ, desc, QFZ
        2, YGH, desc, LMN

    Update: On second thought, this version is cleaner because it does not unnecessarily inject that comma:
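
    A minimal sketch of such a version, assuming the applications are pushed onto an array and joined only when printing. The apps key and the @records list are illustrative names, and the sample records are the ones from the question.

```perl
use strict;
use warnings;

# Sample records copied from the question.
my @records = (
    '1, XYZ, desc, PDF',
    '1, XYZ, desc, QFZ',
    '2, YGH, desc, LMN',
);

my %AB;
for my $record (@records) {
    my ( $id, $name, $desc, $app ) = split /\s*,\s*/, $record;
    $AB{$id}{name_desc} = "$name, $desc";
    push @{ $AB{$id}{apps} }, $app;    # collect applications in an array
}

my @output;
for my $id ( sort keys %AB ) {
    # join puts commas only between applications, so there is nothing to strip
    push @output, "$id, $AB{$id}{name_desc}, " . join( ',', @{ $AB{$id}{apps} } );
}

print "$_\n" for @output;
```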

    This prints:

        1, XYZ, desc, PDF,QFZ
        2, YGH, desc, LMN
    I do know what I've been told about using the "my" before the various scalars, arrays, hashes, etc. The person that I'm writing these for doesn't like when I use those.
    Sorry, but my solution uses my and use strict. I refuse to code without them. If this person is your boss, look for another job; if this person is your spouse, get a divorce; otherwise, tell this person to become an educated Perl programmer :) (see Use strict and warnings)

      Thank you so much for your reply. The first one printed out some results but I don't seem to be getting it to group the IDs together. Maybe I did something wrong when I was importing the different code? I'm not sure. I so much appreciate your looking at this for me. This is the code I just tried to use:

          #!/usr/bin/perl -w
          open (IN, "c:/work/Abnova_Grouping/Abnova_App_and_Key.txt");
          open(OUT, ">c:/work/Abnova_Grouping/Abnova_App_and_Key_Sorted.txt");
          my %AB;
          while(<IN>){
              chomp;
              my($id, $prod_num, $app, $note)=split /\s^,\s^/;
              $AB{$id}{app} .= ",$app";
              $AB{$id}{prod_num}="$prod_num, $note";
          }
          for (keys %AB) { $AB{$_}{app}=~s/^,//}
          for my $id (sort keys %AB){
              print OUT "$id, $AB{$id}{prod_num}, $AB{$id}{app} \n";
          }
          close IN;
          close OUT;

      Thank you again for your help!!
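
      A note on the attempt above: the split pattern reads /\s^,\s^/ where the earlier reply had /\s*,\s*/. Without the /m modifier, ^ matches only at the very start of the string, so \s^ can never match and split hands back the whole line as a single field. That alone may explain why nothing is grouped. The original script in this thread also splits its files on tabs, so the delimiter may need to be \t rather than a comma; that part is a guess about the data file. A small sketch with a made-up sample line:

```perl
use strict;
use warnings;

my $line = "1, XYZ, desc, PDF";    # made-up, comma-separated sample line

# Pattern as typed in the attempt: it never matches, so the line is not split.
my @broken = split /\s^,\s^/, $line;

# Pattern from the earlier reply: optional whitespace around each comma.
my @fixed = split /\s*,\s*/, $line;

printf "broken: %d field(s)\n", scalar @broken;
printf "fixed:  %d field(s)\n", scalar @fixed;
```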

Re: Hashes with Multiple Keys and Combining Them
by Tanktalus (Canon) on Sep 11, 2008 at 02:19 UTC

    I'm not sure what you're going to do with the output, though. Using the same delimiter for multiple items does make things more ... entertaining. What you have to start with is something you can probably feed into DBD::CSV and do queries against. What you're transforming it into will only be parsable by custom code (possibly based on Text::CSV_XS).

    If you want to reduce duplication yet still maintain some orthogonality, you could normalise your file into two files: one with ID/name/description, the second with ID and product. IDs would still be listed multiple times, but the name and description wouldn't be.
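
    A rough sketch of that split, under the assumption that the records are comma-separated as in the question. The two arrays stand in for the two output files, and the names @items, @apps and %seen are made up for illustration.

```perl
use strict;
use warnings;

# Sample records from the question; a real script would read them from a file.
my @records = (
    '1, XYZ, desc, PDF',
    '1, XYZ, desc, QFZ',
    '2, YGH, desc, LMN',
);

my ( @items, @apps );    # lines for the two normalised "files"
my %seen;                # IDs already added to @items

for my $record (@records) {
    my ( $id, $name, $desc, $app ) = split /\s*,\s*/, $record;

    # ID/name/description goes out once per ID ...
    push @items, join( ',', $id, $name, $desc ) unless $seen{$id}++;

    # ... while every ID/application pair is kept.
    push @apps, join( ',', $id, $app );
}

print "items:\n";
print "  $_\n" for @items;
print "applications:\n";
print "  $_\n" for @apps;
```

    Each array could then be written to its own file, and the shared ID column is what ties the two back together.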

    This would also lend itself to feeding into a more full-featured data store (such as SQLite, MySQL, PostgreSQL, or even DB2 or Oracle) at a later date for more advanced handling. (See my latest database-oriented project. The output is basically pure SQL calls against nearly normalised data.)

Re: Hashes with Multiple Keys and Combining Them
by didess (Sexton) on Sep 10, 2008 at 21:10 UTC
    Hi, this code should do what you need.
    The data are held in a hash whose key is the ID and whose value is an array: the first two elements are the name and the description, and the remaining ones are the application names.
        # ------------------------------------------------------------
        %Struct = ();
        open (IN, "Abnova_Product_List.txt");
        while(<IN>) {
            chomp($_);
            ($Key,$Name,$Desc,$OneApp) = split(/\t/,$_);
            if (! exists $Struct{$Key}) {
                $refArray = [$Name,$Desc,$OneApp];
                $Struct{$Key} = $refArray;
            }
            else {
                $refArray = $Struct{$Key};
                push (@$refArray,$OneApp);
            }
        }
        close(IN);
        open (OUT, ">Abnova_Products.txt");
        foreach $Key (keys %Struct) {
            $refArray = $Struct{$Key};
            print OUT "K=$Key\t".join("\t",@$refArray)."\n";
        }
        close (OUT);
    There are no "my" are they yours ????
    Enjoy ;-)
Re: Hashes with Multiple Keys and Combining Them
by milarepa (Beadle) on Sep 10, 2008 at 18:46 UTC
    Hi,

    I tried your code, but I got the following error: Use of uninitialized value in concatenation (.) or string at 710425.pl line 8, <IN> line 1.

    Would it be possible to have a sample data file so I can try the code again?

    Thank you.