in reply to Re: Checking LInes in Text File
in thread Checking LInes in Text File

GrandFather, I have a further question related to your suggestion. If I have hundreds of lines of data like this:

tag1:xxxxxxx tag2:xxxxxx tag3:xxxxxxx tag4:yyyyy

How can I remove all lines that share the same tags 1 through 3 and replace them with a single line that has a new tag4? Using your method I can already drop the excess lines with the same tags 1 through 3, but I cannot change tag4, because the hash method works by simply not writing later duplicates. By the time I discover a duplicate it is too late to change anything, since the first line has already been written. Any suggestions? Thanks!

Re^3: Checking LInes in Text File
by GrandFather (Saint) on Jun 01, 2006 at 23:13 UTC

    Provide half a dozen lines of sample data, the test code you are currently using, and a sample of the output you expect to see.

    For the test code it is easiest to use a __DATA__ section for the test data rather than an external file and simply print the result rather than generating an external file.


    DWIM is Perl's answer to Gödel
      Thanks GrandFather!

      My data looks like this:
      MCAT: 0xf30cbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA0
      MCAT: 0xcc3fbed1 PCAT: 0x000fb109 LMAT: 0x00000800 TYPE: KA1
      MCAT: 0xeeccbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA1
      MCAT: 0xf30cbe91 PCAT: 0xafaddd09 LMAT: 0x00040000 TYPE: KA0
      MCAT: 0xeeecbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA0
      MCAT: 0xcc000331 PCAT: 0x000fb109 LMAT: 0x00000800 TYPE: KA1
      MCAT: 0xe554be01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA1
      MCAT: 0xf30cbe91 PCAT: 0xafaddd09 LMAT: 0x00040000 TYPE: KA1
      So every set of data (MCAT, PCAT, LMAT) is unique except for the fourth and eighth lines, which differ only by TYPE.

      I would like to roll the fourth and eighth lines together so the output looks like this:

      MCAT: 0xf30cbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA0
      MCAT: 0xcc3fbed1 PCAT: 0x000fb109 LMAT: 0x00000800 TYPE: KA1
      MCAT: 0xeeccbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA1
      MCAT: 0xf30cbe91 PCAT: 0xafaddd09 LMAT: 0x00040000 TYPE: KA0, KA1
      MCAT: 0xeeecbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA0
      MCAT: 0xcc000331 PCAT: 0x000fb109 LMAT: 0x00000800 TYPE: KA1
      MCAT: 0xe554be01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA1

      This leaves me with a total of 7 data groups instead of 8.
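A quick way to confirm that count is to tally how often each MCAT/PCAT/LMAT prefix occurs while ignoring TYPE. This is only a minimal sketch: the names `%count` and `@dups` are made up for illustration, and the lines are simply the sample data above copied into a list.

```perl
use strict;
use warnings;

# Sample lines copied from the post above.
my @lines = (
    'MCAT: 0xf30cbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA0',
    'MCAT: 0xcc3fbed1 PCAT: 0x000fb109 LMAT: 0x00000800 TYPE: KA1',
    'MCAT: 0xeeccbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA1',
    'MCAT: 0xf30cbe91 PCAT: 0xafaddd09 LMAT: 0x00040000 TYPE: KA0',
    'MCAT: 0xeeecbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA0',
    'MCAT: 0xcc000331 PCAT: 0x000fb109 LMAT: 0x00000800 TYPE: KA1',
    'MCAT: 0xe554be01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA1',
    'MCAT: 0xf30cbe91 PCAT: 0xafaddd09 LMAT: 0x00040000 TYPE: KA1',
);

# Count occurrences of everything before "TYPE:" (hypothetical names).
my %count;
for my $line (@lines) {
    my ($key) = $line =~ /^(.*?)\s+TYPE:/ or next;    # skip malformed lines
    $count{$key}++;
}

# Prefixes that occur more than once are the groups to roll together.
my @dups = grep { $count{$_} > 1 } sort keys %count;

printf "%d lines, %d distinct groups\n", scalar @lines, scalar keys %count;
print "duplicated: $_\n" for @dups;
```

For the sample data this reports 8 lines and 7 distinct groups, with the `0xf30cbe91` group as the only duplicate.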

      So far my code doesn't even come close to accomplishing this....

        The following is OK for reasonably sized files, but it keeps every output line in memory, so it may bog down when things get huge.

        use strict;
        use warnings;

        my %firstLines;    # data prefix => index of its first line in @lines
        my @lines;         # output lines, in order of first appearance

        while (<DATA>) {
            chomp;
            my ($data, $type) = /(.*)\s+TYPE:\s+(\w+)$/;
            next if ! defined $type;    # ignore malformed line

            if (exists $firstLines{$data}) {
                # Duplicate prefix: append this TYPE to the stored first line
                $lines[$firstLines{$data}] .= ", $type";
            } else {
                # First sighting: remember where the line lands in @lines
                $firstLines{$data} = @lines;
                push @lines, $_;
            }
        }

        print join "\n", @lines;

        __DATA__
        MCAT: 0xf30cbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA0
        MCAT: 0xcc3fbed1 PCAT: 0x000fb109 LMAT: 0x00000800 TYPE: KA1
        MCAT: 0xeeccbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA1
        MCAT: 0xf30cbe91 PCAT: 0xafaddd09 LMAT: 0x00040000 TYPE: KA0
        MCAT: 0xeeecbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA0
        MCAT: 0xcc000331 PCAT: 0x000fb109 LMAT: 0x00000800 TYPE: KA1
        MCAT: 0xe554be01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA1
        MCAT: 0xf30cbe91 PCAT: 0xafaddd09 LMAT: 0x00040000 TYPE: KA1

        Prints:

        MCAT: 0xf30cbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA0
        MCAT: 0xcc3fbed1 PCAT: 0x000fb109 LMAT: 0x00000800 TYPE: KA1
        MCAT: 0xeeccbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA1
        MCAT: 0xf30cbe91 PCAT: 0xafaddd09 LMAT: 0x00040000 TYPE: KA0, KA1
        MCAT: 0xeeecbe01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA0
        MCAT: 0xcc000331 PCAT: 0x000fb109 LMAT: 0x00000800 TYPE: KA1
        MCAT: 0xe554be01 PCAT: 0xcda2b409 LMAT: 0x00100000 TYPE: KA1
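
If the real data can repeat the same TYPE for a given MCAT/PCAT/LMAT group, appending each TYPE as it arrives would list it twice. The sketch below is one possible variation, not code from this thread: `merge_types` is a hypothetical helper that collects the TYPE values per group and lists each one once, in first-seen order. It assumes the same line layout as the sample data and rebuilds each line from its prefix rather than keeping the original text.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): merge lines that share the same
# MCAT/PCAT/LMAT prefix, listing each TYPE once in first-seen order.
sub merge_types {
    my @input = @_;
    my (@order, %types, %seen);
    for my $line (@input) {
        my ($key, $type) = $line =~ /^(.*?)\s+TYPE:\s+(\w+)\s*$/
            or next;    # skip malformed lines
        push @order, $key unless exists $types{$key};               # first sighting
        push @{ $types{$key} }, $type unless $seen{$key}{$type}++;  # new TYPE only
    }
    return map { "$_ TYPE: " . join(', ', @{ $types{$_} }) } @order;
}

my @merged = merge_types(
    'MCAT: 0xf30cbe91 PCAT: 0xafaddd09 LMAT: 0x00040000 TYPE: KA0',
    'MCAT: 0xcc000331 PCAT: 0x000fb109 LMAT: 0x00000800 TYPE: KA1',
    'MCAT: 0xf30cbe91 PCAT: 0xafaddd09 LMAT: 0x00040000 TYPE: KA1',
    'MCAT: 0xf30cbe91 PCAT: 0xafaddd09 LMAT: 0x00040000 TYPE: KA0',
);
print "$_\n" for @merged;
```

With these four input lines the first group comes out once with `TYPE: KA0, KA1`, even though KA0 appears twice in the input. The trade-off is that nothing is printed until all input has been read.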
