in reply to Re^3: Checking LInes in Text File
in thread Checking LInes in Text File

Thanks GrandFather!

My data looks like this:
MCAT: 0xf30cbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA0
MCAT: 0xcc3fbed1 PCAT: 0x000fb109 LMAT: 0x00000800 TYPE: KA1
MCAT: 0xeeccbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA1
MCAT: 0xf30cbe91 PCAT: 0xafaddd09 LMAT: 0x00040000 TYPE: KA0
MCAT: 0xeeecbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA0
MCAT: 0xcc000331 PCAT: 0x000fb109 LMAT: 0x00000800 TYPE: KA1
MCAT: 0xe554be01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA1
MCAT: 0xf30cbe91 PCAT: 0xafaddd09 LMAT: 0x00040000 TYPE: KA1
So every set of data (MCAT, PCAT, LMAT) is unique except for the fourth and eighth, which differ only by TYPE.

I would like to roll the fourth and eighth together so it would look like this:

MCAT: 0xf30cbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA0
MCAT: 0xcc3fbed1 PCAT: 0x000fb109 LMAT: 0x00000800 TYPE: KA1
MCAT: 0xeeccbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA1
MCAT: 0xf30cbe91 PCAT: 0xafaddd09 LMAT: 0x00040000 TYPE: KA0, KA1
MCAT: 0xeeecbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA0
MCAT: 0xcc000331 PCAT: 0x000fb109 LMAT: 0x00000800 TYPE: KA1
MCAT: 0xe554be01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA1

This leaves me with a total of 7 data groups instead of 8.

So far my code doesn't even come close to accomplishing this.

Re^5: Checking LInes in Text File
by GrandFather (Saint) on Jun 02, 2006 at 00:38 UTC

    The following is OK for reasonably sized files but may bog down when things get huge, since every line is held in memory.

    use strict;
    use warnings;

    my %firstLines;    # unique part of a line => index of that line in @lines
    my @lines;         # output lines, in order of first appearance

    while (<DATA>) {
        chomp;
        my ($data, $type) = /(.*)\s+TYPE:\s+(\w+)$/;
        next if ! defined $type; # ignore malformed line

        if (exists $firstLines{$data}) {
            $lines[$firstLines{$data}] .= ", $type";
        } else {
            $firstLines{$data} = @lines;
            push @lines, $_;
        }
    }

    print join "\n", @lines;

    __DATA__
    MCAT: 0xf30cbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA0
    MCAT: 0xcc3fbed1 PCAT: 0x000fb109 LMAT: 0x00000800 TYPE: KA1
    MCAT: 0xeeccbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA1
    MCAT: 0xf30cbe91 PCAT: 0xafaddd09 LMAT: 0x00040000 TYPE: KA0
    MCAT: 0xeeecbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA0
    MCAT: 0xcc000331 PCAT: 0x000fb109 LMAT: 0x00000800 TYPE: KA1
    MCAT: 0xe554be01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA1
    MCAT: 0xf30cbe91 PCAT: 0xafaddd09 LMAT: 0x00040000 TYPE: KA1

    Prints:

    MCAT: 0xf30cbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA0
    MCAT: 0xcc3fbed1 PCAT: 0x000fb109 LMAT: 0x00000800 TYPE: KA1
    MCAT: 0xeeccbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA1
    MCAT: 0xf30cbe91 PCAT: 0xafaddd09 LMAT: 0x00040000 TYPE: KA0, KA1
    MCAT: 0xeeecbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA0
    MCAT: 0xcc000331 PCAT: 0x000fb109 LMAT: 0x00000800 TYPE: KA1
    MCAT: 0xe554be01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA1

    DWIM is Perl's answer to Gödel
      Grandfather,

      Thank you very much!!! I really appreciate you solving that problem for me. I'm very new to Perl and have to admit that your code took a while to make sense to me. I didn't realize that you could access an array by the data value; I thought you had to access it by location (0, 1, 2, 3, etc.). Thanks again for your help!

        Just in case there is some confusion or misunderstanding about some of the Perl tricks used, I'd better go through some of that code and elaborate on what's happening. Note that I've taken the interesting lines in processing order rather than the order they are coded.

        $firstLines{$data} = @lines;

        This is a little tricksy. It creates a new entry in the hash %firstLines, keyed by the unique part of the line contents, whose value is the index of the new line. @lines in scalar context returns the number of elements in the array, and because the new line has not been pushed yet, that count is exactly the index it will occupy in @lines.
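
The scalar-context trick can be seen in isolation in a minimal sketch like the one below. The array contents and the variable name `$next_index` are made up for illustration and are not part of the original code.

```perl
use strict;
use warnings;

my @lines = ('first', 'second');

# An array evaluated in scalar context yields its element count.
# With two elements present, the next free index is 2.
my $next_index = @lines;

# The pushed element lands at the index recorded above.
push @lines, 'third';

print "$next_index: $lines[$next_index]\n";    # prints "2: third"
```

Recording the count before the push is what lets the hash entry point at the right slot without any extra bookkeeping.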

        if (exists $firstLines{$data})

        checks to see if we've already seen a specific line, that is, whether the unique part of the line is already a key in %firstLines.

        $lines[$firstLines{$data}] .= ", $type";

        builds the multiple entries for duplicated lines. Note that $firstLines{$data} returns the index number that was stored earlier.
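
Putting the three pieces together, the same idea can be sketched as a small subroutine. This is only an illustration under assumptions: the name `merge_types` is hypothetical, and the shortened sample values are invented rather than taken from the data above.

```perl
use strict;
use warnings;

# merge_types is a hypothetical helper name, not part of the original post.
# It takes a list of lines and returns them with duplicate data rolled together.
sub merge_types {
    my @input = @_;
    my %first_index;    # unique part of a line => index into @merged
    my @merged;         # output lines, in order of first appearance

    for my $line (@input) {
        my ($data, $type) = $line =~ /(.*)\s+TYPE:\s+(\w+)$/;
        next if !defined $type;    # skip lines that do not match

        if (exists $first_index{$data}) {
            # Seen before: append the new TYPE to the stored line.
            $merged[ $first_index{$data} ] .= ", $type";
        }
        else {
            # First sighting: remember where the line will live, then store it.
            $first_index{$data} = @merged;
            push @merged, $line;
        }
    }
    return @merged;
}

# Invented sample data: the first and third lines differ only by TYPE.
my @out = merge_types(
    'MCAT: 0x01 PCAT: 0x02 LMAT: 0x03 TYPE: KA0',
    'MCAT: 0x04 PCAT: 0x05 LMAT: 0x06 TYPE: KA1',
    'MCAT: 0x01 PCAT: 0x02 LMAT: 0x03 TYPE: KA1',
);
print "$_\n" for @out;
```

The hash is looked up by the line's data (a string key), while the array is still indexed by position; the hash simply stores which position belongs to which data.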

