Re^5: Checking LInes in Text File

The following is fine for reasonably sized files, but it holds every distinct line in memory, so it may bog down when things get huge.

use strict;
use warnings;

my %firstLines;
my @lines;

while (<DATA>) {
    chomp;
    # Split the line into its data part and the trailing TYPE value.
    my ($data, $type) = /(.*)\s+TYPE:\s+(\w+)$/;

    next unless defined $type;    # ignore malformed lines
    if (exists $firstLines{$data}) {
        $lines[$firstLines{$data}] .= ", $type";
    } else {
        $firstLines{$data} = @lines;
        push @lines, $_;
    }
}

print join "\n", @lines;

__DATA__
MCAT: 0xf30cbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA0
MCAT: 0xcc3fbed1 PCAT: 0x000fb109 LMAT: 0x00000800 TYPE: KA1
MCAT: 0xeeccbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA1
MCAT: 0xf30cbe91 PCAT: 0xafaddd09 LMAT: 0x00040000 TYPE: KA0
MCAT: 0xeeecbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA0
MCAT: 0xcc000331 PCAT: 0x000fb109 LMAT: 0x00000800 TYPE: KA1
MCAT: 0xe554be01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA1
MCAT: 0xf30cbe91 PCAT: 0xafaddd09 LMAT: 0x00040000 TYPE: KA1

Prints:

MCAT: 0xf30cbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA0
MCAT: 0xcc3fbed1 PCAT: 0x000fb109 LMAT: 0x00000800 TYPE: KA1
MCAT: 0xeeccbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA1
MCAT: 0xf30cbe91 PCAT: 0xafaddd09 LMAT: 0x00040000 TYPE: KA0, KA1
MCAT: 0xeeecbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA0
MCAT: 0xcc000331 PCAT: 0x000fb109 LMAT: 0x00000800 TYPE: KA1
MCAT: 0xe554be01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA1

DWIM is Perl's answer to Gödel

Re^6: Checking LInes in Text File by Anonymous Monk on Jun 02, 2006 at 16:52 UTC
Grandfather, thank you very much!!! I really appreciate you solving that problem for me. I'm very new to Perl and have to admit that your code took a while to make sense to me. I didn't realize that you could access an array by the data value; I thought you had to access it by location (0, 1, 2, 3, etc.). Thanks again for your help!
Re^7: Checking LInes in Text File by GrandFather (Saint) on Jun 02, 2006 at 23:38 UTC
Just in case there is any confusion about the Perl tricks used, I had better go through some of that code and elaborate on what's happening. Note that I've taken the interesting lines in processing order rather than the order in which they are coded.

`$firstLines{$data} = @lines;` is a little tricksy. It creates a new entry in `%firstLines`, keyed by the unique part of the line contents, whose value is the index the new line is about to get in `@lines`. That works because `@lines` in scalar context returns the number of elements in the array, which is exactly the slot the following `push` fills. Note that `%firstLines` is a hash (looked up by key), while `@lines` is an ordinary array that is still accessed by position.

`if (exists $firstLines{$data})` checks whether we've already seen a line with the same data part.

`$lines[$firstLines{$data}] .= ", $type";` builds the multiple entries for duplicated lines. Note that `$firstLines{$data}` returns the index number that was stored earlier.
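To see the hash-to-index trick in isolation, here is a minimal sketch. The keys `alpha` and `beta` and the sample strings are made up for illustration and are not from the original data; only the variable names match the code above.

```perl
use strict;
use warnings;

my %firstLines;    # hash: key is the data part, value is an index into @lines
my @lines;         # array: accessed by position (0, 1, 2, ...)

# Assigning an array to a hash element puts the array in scalar context,
# so the element count is stored. That count is also the index the next
# pushed element will receive.
$firstLines{'alpha'} = @lines;    # stores 0, @lines is still empty
push @lines, 'alpha TYPE: KA0';

$firstLines{'beta'} = @lines;     # stores 1
push @lines, 'beta TYPE: KA1';

# A later duplicate of 'alpha': fetch its index through the hash, then
# append to the array element at that position.
if (exists $firstLines{'alpha'}) {
    $lines[$firstLines{'alpha'}] .= ', KA1';
}

print "$_\n" for @lines;
```

The hash only ever answers "where in `@lines` did this data first appear?"; the array itself is still indexed by number, which is what keeps the output in first-seen order.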
Re^8: Checking LInes in Text File by ibeneedinghelp (Initiate) on Jun 05, 2006 at 23:11 UTC
Thanks much for the added commentary! Now I think I actually understand what you did.