sanahmed has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I am trying to concatenate all the files in a directory.

The file format is the following:

file 1:

>A

Amber, Amy

>B

Barbie, Bambie

file 2:

>A

Austin, Aurora

>C

Cathy, Candy

file 3:

>B

Bob, Barbara

>C

Cane, Carter

I want to concatenate the files so that the final file reads:

>A

Amber, Amy, Austin, Aurora, -, -

>B

Barbie, Bambie, -, -, Bob, Barbara

>C

-, -, Cathy, Candy, Cane, Carter.

I tried to use hash tables but I am finding it very hard. Is there an easy way to do this?

Thanks!

Re: hash of hashes?
by ww (Archbishop) on Jul 08, 2011 at 23:18 UTC
Re: hash of hashes?
by muba (Priest) on Jul 09, 2011 at 02:17 UTC

    This seems to do what you want (your specification is a bit vague, so I followed your description as closely to the letter as I could). Note that I wrote this up while tired (worked over 9 hours during a hectic night at the restaurant) and inebriated (had a couple of drinks afterwards), so it can't be that hard :) Then again, I admit this may not be the most elegant way of going about it, but it does what you want and it should get you on the right track, even if it's an XY problem.

    use strict;
    use warnings;

    my $file1 = << "EOF";   # I'm too lazy to create the actual files...
    >A
    Amber, Amy
    >B
    Barbie, Bambie
    EOF

    my $file2 = << "EOF";   # So I'll just define them as heredocs here...
    >A
    Austin, Aurora
    >C
    Cathy, Candy
    EOF

    my $file3 = << "EOF";   # and read from them in the next part of the code.
    >B
    Bob, Barbara
    >C
    Cane, Carter
    EOF

    my @files = (\$file1, \$file2, \$file3);        # References to the scalars above.
    my %names = ();                                 # Oh, you know... a hash.

    for my $f (@files) {                            # Iterate through the references
        open my $fh, "<", $f;                       # "Open" them for reading.
        my $key = "";                               # We don't have a key yet.
        my %tmp_names = ();                         # Just a temporary hash...
        $tmp_names{$_} = ["-","-"] for qw(A B C);   # ... for the blanks.

        while (<$fh>) {                             # Read lines.
            chomp;                                  # I don't care about newlines.
            if (m/^>([ABC])$/) {                    # Is it ">" + A, B, or C?
                $key = $1;                          # Then it's a key for the hash.
                next;
            } elsif ($key                           # Do we have a key yet?
                     && $_) {                       # And it isn't a blank line?
                my @names = split(/, /, $_);        # Then it's a list of names!
                $tmp_names{$key} = \@names;         # Store them in the temporary hash
            }
        }
        close $fh;

        for (qw(A B C)) {
            push @{$names{$_}}, @{$tmp_names{$_}};  # Move tmp hash to real hash.
        }
    }

    for (qw(A B C)) {
        print ">$_\n\n";
        print join(", ", @{$names{$_}}), "\n\n";
    }
Re: hash of hashes?
by txixco (Novice) on Jul 09, 2011 at 04:25 UTC

    I don't know if this is a better or worse solution, but I'm showing you mine (the filenames are only an example; change them if you have to):

    #!/usr/bin/perl

    use strict;
    use warnings;

    # Constants
    my $BLANK_LINE = qr/^\s+$/;
    my $KEY = qr/>([A-Z]{1})/;

    # Variables
    my %content;

    # Take the data
    for (<*.db>) {
        open(FILEDB, $_) or die "Cannot open file $_: $!\n";

        my $key;
        for (<FILEDB>) {
            if (/$BLANK_LINE/) {
                next;
            }
            elsif (/$KEY/) {
                $key = $1;
                if (not $content{$key}) {
                    $content{$key} = ();
                }
            }
            else {
                chomp;
                push (@{$content{$key}}, $_);
            }
        }

        close(FILEDB);
    }

    # Write the new file
    open(FILEDB, ">result.db") or die "Cannot open file result.db: $!\n";
    for my $key (sort keys %content) {
        print FILEDB ">$key\n\n" . join(", ", @{$content{$key}}) . "\n\n";
    }
    close(FILEDB);

      Two comments to this, though.

      The {1} quantifier in my $KEY = qr/>([A-Z]{1})/; doesn't serve any purpose.

      But more importantly, your solution doesn't offer the dashes for those cases where no records were found for a given letter - something which is part of the original specification.
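To illustrate the first comment: a minimal sketch, assuming a handful of made-up sample lines, that runs the pattern with and without `{1}` and prints what each one captures. The helper name `capture` and the sample strings are hypothetical and not part of the script above.

```perl
use strict;
use warnings;

# Two versions of the header pattern: with and without the {1} quantifier.
my $with_quantifier    = qr/>([A-Z]{1})/;
my $without_quantifier = qr/>([A-Z])/;

# Hypothetical sample lines, chosen only for this illustration.
my @samples = (">A", ">BC", "> A", "Amber, Amy", ">z");

# Returns the captured key, or "no match" when the pattern does not match.
sub capture {
    my ($re, $line) = @_;
    return $line =~ $re ? $1 : "no match";
}

# Print the line and what each pattern captured, side by side.
for my $line (@samples) {
    my $first  = capture($with_quantifier, $line);
    my $second = capture($without_quantifier, $line);
    printf "%-14s %-10s %-10s\n", "'$line'", $first, $second;
}
```

Both columns should come out identical for every line, since a character class already matches exactly one character unless a different quantifier says otherwise.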

        doesn't serve any purpose.

        Think of it as a comment, like

        my $Count = scalar @sausages;

        Here scalar isn't required to force scalar context, since the context is already scalar (as opposed to list context, as in

        my($Count) = scalar @sausages;

        where it is required), but it helps the programmer remember.
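The difference between the two assignments can be checked with a short sketch. The contents of `@sausages` are made up for the illustration; only the context behaviour matters.

```perl
use strict;
use warnings;

my @sausages = ("bratwurst", "chorizo", "merguez");   # hypothetical data

# Scalar assignment: the array is evaluated in scalar context and
# yields its element count, with or without an explicit scalar().
my $implicit = @sausages;
my $explicit = scalar @sausages;

# List assignment: without scalar(), the first element is copied instead.
my ($first_element) = @sausages;
my ($forced_count)  = scalar @sausages;

print "implicit: $implicit\n";
print "explicit: $explicit\n";
print "list, no scalar(): $first_element\n";
print "list, scalar(): $forced_count\n";
```

The first, second and fourth lines print 3, while the third prints the first element, which is why the explicit `scalar` only changes anything in the parenthesised form.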

        The {1} quantifier in my $KEY = qr/>([A-Z]{1})/; doesn't serve any purpose.

        Mmmm... Yes, I know. I promise I had a good reason to do this, but I cannot remember it right now. Maybe for a moment I misunderstood how greediness works; I was tired :-).

        ...your solution doesn't offer the dashes for those cases where no records were found for a given letter

        I thought the dashes were a way to indicate that there could be more data not shown in the example, like an ellipsis. Moreover, I saw a similar effect with spaces and I "fixed" it; it's not hard to add the dashes if they're needed.
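For what it's worth, the two approaches in this thread could be combined roughly as follows: discover the keys instead of hard-coding A, B and C, and pad with dashes whenever a file lacks a key. This is only a sketch under stated assumptions. The sub names `parse_records` and `merge_records` and the `$width` parameter (how many dashes a missing record contributes, 2 in the original example) are hypothetical, and the input is read from in-memory strings so the example runs without any files.

```perl
use strict;
use warnings;

# Parse one source (a filename or a reference to an in-memory string)
# into a hash mapping each ">X" key to an array ref of names.
sub parse_records {
    my ($source) = @_;
    open my $fh, "<", $source or die "Cannot open source: $!\n";
    my (%records, $key);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;               # trim surrounding whitespace
        next if $line eq "";                   # skip blank lines
        if ($line =~ /^>(\S+)$/) {             # header line such as ">A"
            $key = $1;
            $records{$key} ||= [];
        }
        elsif (defined $key) {                 # names belonging to the current key
            push @{ $records{$key} }, split /\s*,\s*/, $line;
        }
    }
    close $fh;
    return \%records;
}

# Merge all sources in order; a source lacking a key contributes
# $width dashes (an assumption: every record holds $width names).
sub merge_records {
    my ($width, @sources) = @_;
    my @parsed = map { parse_records($_) } @sources;
    my %seen   = map { $_ => 1 } map { keys %$_ } @parsed;
    my %merged;
    for my $key (sort keys %seen) {
        for my $records (@parsed) {
            my $names = $records->{$key};
            push @{ $merged{$key} },
                $names && @$names ? @$names : ("-") x $width;
        }
    }
    return \%merged;
}

# The three example files from the question, as in-memory strings.
my $file1 = ">A\nAmber, Amy\n>B\nBarbie, Bambie\n";
my $file2 = ">A\nAustin, Aurora\n>C\nCathy, Candy\n";
my $file3 = ">B\nBob, Barbara\n>C\nCane, Carter\n";

my $merged = merge_records(2, \$file1, \$file2, \$file3);
for my $key (sort keys %$merged) {
    print ">$key\n\n", join(", ", @{ $merged->{$key} }), "\n\n";
}
```

To run it against real files, something like `merge_records(2, sort glob("*.db"))` should work, since `open` accepts a filename as well as a scalar reference. Note that an output file named `result.db` in the same directory would match that glob on the next run.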