mel1rose has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to read a text file that is broken up into several profiles. Each profile is separated by a blank line. It looks something like this:
pos = 004
2574 2633 2571 2566
0088,1491,2365
0088,1514,2134
0088,1531,2100
0088,1568,2081
0088,1631,2070
0088,1716,2013
0088,1818,1954
0088,1921,1917
0088,2019,1889
0088,2108,1859
0088,2193,1881
0088,2266,1907
0088,2339,1843
0088,2398,1876
0088,2453,1822
0088,2488,1849
AB: 01000050 T%: 0456 INR: 018017 ID: 00004115

pos = 007
2470 2505 2484 2573
0088,1468,2406
0088,1473,2275
0088,1474,2143
0088,1482,2083
0088,1489,2009
0088,1510,1924
0088,1535,1965
0088,1573,1958
0088,1615,1935
0089,1667,1935
0088,1718,1935
0089,1770,1936
0088,1822,1943
0088,1869,1954
0089,1915,1965
0089,1961,1935
0089,1999,1932
0089,2034,1966
0089,2068,1942
0089,2098,1959
0089,2127,1976
0089,2152,1945
AB: 01000050 T%: 0725 INR: 018034 ID: 00004115
Sometimes these text files contain several copies of the same profile. So I want to collapse this data into an associative array (hash) where inputting the same key erases the original value and replaces it with the current value, so there's only one unique profile for each key. Here's the code that I've written thus far, but it isn't working at all.
open(IN, "testB.txt") or die "Can't open testB.txt";
open(OUT, ">>testB8.txt") or die "Can't open the file for writing.\n";

%Array = ();   #Create an empty array
$/ = "";       #Enable paragraph mode
$* = 1;        #Enable multi-line strings

#Read each paragraph in as a string ($Profile). Split the
#first line of the string so that the first line is the
#$Key and the rest of the $Profile is the $Value. Then
#input these into the %Array.
while (<IN>) {
    $_ = $Profile;
    ($Key, $Value) = split(/\r/, $Profile, 2);
    %Array = ("$Key" => "$Value");
}

#For each $Key of the %Array, first sort the keys, then
#print their associated values in the textfile OUT.
foreach $Key (sort(keys(%Array))) {
    print (OUT "$Array{$Key} \n");
}
Any and all help is MUCH appreciated!

Re: Arrays and regular expressions
by Sifmole (Chaplain) on Apr 12, 2001 at 23:19 UTC
    Just real quick I notice this...
    while (<IN>) {
        $_ = $Profile;
        ($Key, $Value) = split(/\r/, $Profile, 2);
        %Array = ("$Key" => "$Value");
    }

    The file is being read into $_, and you are then setting $_ = $Profile, which I don't see defined or set anywhere, so the line you just read is thrown away. You then split $Profile, which as far as I can tell is empty.

    As always, "use strict;"

    Later,
    Sifmole
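    Beyond the $_ = $Profile problem, the loop has two more issues: %Array = (...) replaces the whole hash on every pass, so at most one profile survives, and $* is obsolete (later Perls reject it outright; the /m regex modifier replaced it, and no multi-line matching is needed here anyway). A minimal corrected sketch follows. It is untested against the real testB.txt, it assumes the file uses the platform's native newlines (so it splits on \n rather than \r), and it reads a small made-up sample built from fragments of the question's data through an in-memory filehandle. The sub name dedupe_profiles is hypothetical.

```perl
use strict;
use warnings;

# Read blank-line-separated profiles from a filehandle and keep only
# the last copy of each, keyed on the profile's first line.
sub dedupe_profiles {
    my ($in_fh) = @_;
    my %profiles;
    local $/ = "";                            # paragraph mode: one profile per read
    while (my $profile = <$in_fh>) {          # read into our own variable; nothing clobbers it
        $profile =~ s/\n+\z//;                # drop the trailing newline(s)
        my ($key, $value) = split /\n/, $profile, 2;
        next unless defined $value;           # skip paragraphs that have only a header line
        $profiles{$key} = $value;             # one slot per key: a repeat replaces the old copy
    }
    return \%profiles;
}

# Made-up sample (fragments of the question's data); for the real
# file use: open(my $in, '<', 'testB.txt') or die "Can't open testB.txt: $!";
my $sample = "pos = 004\n2574 2633 2571 2566\n\n"
           . "pos = 007\n2470 2505 2484 2573\n\n"
           . "pos = 004\n0088,1491,2365\n";
open(my $in, '<', \$sample) or die "Can't open in-memory file: $!";
my $profiles = dedupe_profiles($in);
close($in);

# Print each surviving profile, header line included, followed by a blank line.
for my $key (sort keys %$profiles) {
    print "$key\n$profiles->{$key}\n\n";
}
```

    The original printed only the values, which drops each profile's first line; this sketch prints the key as well. If you write to testB8.txt instead of STDOUT, open it with '>' rather than '>>', since append mode adds a fresh copy of every profile each time the script runs.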

Re: Arrays and regular expressions
by reyjrar (Hermit) on Apr 12, 2001 at 23:38 UTC
    Given what you've said, I'd try something like this:
    #!/usr/bin/perl
    use strict;
    use warnings;

    open(IN,  '<', 'file')    or die "Can't open file: $!";
    open(OUT, '>', 'outfile') or die "Can't open outfile: $!";

    my %HASH    = ();
    my @RUNNING = ();
    my $INSIDE  = "";                        # keep track of where we are

    while (<IN>) {
        chomp;
        if (/^\s*$/) {                       # we have an empty line
            $HASH{$INSIDE} = [ @RUNNING ] if $INSIDE;  # store a copy of the lines, not their count
            @RUNNING = ();                   # empty running now that it's copied
            $INSIDE  = "";                   # no longer inside anything
            next;                            # keep going
        } elsif (/^\s*(pos\s*=\s*\d+)$/i) {  # this should be the "header"
            $INSIDE = $1;
            next;
        } elsif ($INSIDE) {
            push @RUNNING, $_;
        }
    }
    $HASH{$INSIDE} = [ @RUNNING ] if $INSIDE;   # last profile, if the file has no trailing blank line
    close IN;

    for (sort keys %HASH) {
        print OUT "$_\n", (map { "$_\n" } @{$HASH{$_}}), "\n";
    }
    close OUT;
    *shrugs* It passes strict and warnings, but it's untested. Also, if you're opening a BIG text file (i.e. megabytes), you might run into problems, because every profile is held in memory until the end. Fixing the duplicate-records problem gets a tad more complex with a large file, but look at seek(), tell(), and keeping a record of where you are at the beginning of each "id" in the OUT file. Have fun..

    -brad..
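    For the large-file case mentioned above, one way to follow the seek()/tell() hint is to keep only byte offsets in memory. This sketch is a variation on that suggestion: it records where the last copy of each profile starts in the input file (not the OUT file), then seeks back to print each one. The sub names index_profiles and read_profile_at are hypothetical, the sample is made up from fragments of the question's data, and it assumes a seekable input (a regular file or in-memory handle, not a pipe).

```perl
use strict;
use warnings;

# Pass 1: remember the byte offset where the LAST copy of each
# profile starts. Only offsets are kept in memory, not profile text.
sub index_profiles {
    my ($fh) = @_;
    my %offset;
    local $/ = "";                            # paragraph mode: one profile per read
    while (1) {
        my $start   = tell($fh);              # position before this profile
        my $profile = <$fh>;
        last unless defined $profile;
        my ($key) = split /\n/, $profile, 2;  # first line is the key
        $offset{$key} = $start;               # later copies overwrite earlier ones
    }
    return \%offset;
}

# Pass 2: jump back to a stored offset and read that one profile.
sub read_profile_at {
    my ($fh, $pos) = @_;
    local $/ = "";
    seek($fh, $pos, 0) or die "seek failed: $!";
    my $profile = <$fh>;
    $profile =~ s/\n+\z/\n/;                  # end every profile with exactly one newline
    return $profile;
}

# Made-up sample; for a real file use:
#   open(my $in, '<', 'testB.txt') or die "Can't open testB.txt: $!";
my $sample = "pos = 004\n2574 2633 2571 2566\n\n"
           . "pos = 007\n2470 2505 2484 2573\n\n"
           . "pos = 004\n0088,1491,2365\n";
open(my $in, '<', \$sample) or die "Can't open in-memory file: $!";
my $offset = index_profiles($in);
for my $key (sort keys %$offset) {
    print read_profile_at($in, $offset->{$key}), "\n";
}
close($in);
```

    Memory now grows with the number of distinct keys rather than with the size of the profiles, at the cost of reading the file twice (once fully, then once per kept profile).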
Re: Arrays and regular expressions
by ton (Friar) on Apr 13, 2001 at 00:35 UTC
    Here's how I would do it, assuming that the 'pos' value in the first line is the key:
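    use strict;
    use warnings;

    my %hash;
    my $entry = "";
    my $line;
    my $key;

    open(INFILE, "data") || die "Can't open data: $!";
    while ($line = <INFILE>) {
        chomp($line);
        if ($line ne "") {                   # in an entry
            if (!$entry) {                   # this is the first line
                $entry .= $line . "\n";
                ($key = $line) =~ s/\D//g;   # key = digits of the first line, e.g. "004"
            } else {
                $entry .= $line . "\n";
            }
        } elsif ($entry) {                   # left the entry (extra blank lines are ignored)
            $hash{$key} = $entry;
            $entry = "";
        }
    }
    $hash{$key} = $entry if $entry;          # last entry, if the file has no trailing blank line
    close(INFILE);

    open(OUTFILE, ">datax") || die "Can't open datax: $!";
    foreach $key (sort(keys(%hash))) {
        print OUTFILE $hash{$key} . "\n";
    }
    close(OUTFILE);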
    Hope this helps...

    -Ton

    -----

    Be bloody, bold, and resolute; laugh to scorn
    The power of man...

      Thanks everyone! It's working now! ::mel1rose::