Hi, I've been reading this site a lot for answers to many questions and have always found what I wanted, until now.

I am trying to read two (very) long files in order to compare them in a smart way: checking if some elements (such as value=3.14 vs value="3.14") are swapped on the same line.
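To make the kind of comparison I mean concrete, here is a minimal sketch. The helper names (normalize_line, token_key, same_tokens) are made up for illustration, and it assumes that tokens are separated by whitespace, that quoted values never contain spaces, and that "swapped" covers both quoted vs. unquoted values and a different token order on the line.

```perl
use strict;
use warnings;

# Hypothetical helper: drop optional double quotes around the value of
# each key=value token, so value="3.14" and value=3.14 look the same.
sub normalize_line {
    my ($line) = @_;
    $line =~ s/(\w+)="([^"]*)"/$1=$2/g;
    return $line;
}

# Hypothetical helper: build an order-independent key for a line by
# sorting its whitespace-separated tokens (assumes no spaces in values).
sub token_key {
    my ($line) = @_;
    my @tokens = split ' ', normalize_line($line);
    return join ' ', sort @tokens;
}

# True if both lines carry the same tokens, whatever their order or quoting.
sub same_tokens {
    my ($line_a, $line_b) = @_;
    return token_key($line_a) eq token_key($line_b);
}
```

With this, same_tokens('ABC a b c value=3.14', 'ABC a b c value="3.14"') would be true, while a line with a different value would not match.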

Also:

- There are a lot of lines that I want to discard as soon as I read them. Therefore, I am trying not to store them in memory, as each file can go way beyond 100 000 lines.

- I might append one or more following lines (each starting with a +) to the previous line starting with a letter, if that first line doesn't match the one in the other file or if one of its following lines doesn't match.

Lines can be such as:

ABC a b c value=3.14
+ value2="2.04"

or

ABC a b c value=3.14
+ value2="2.04"
+ value3=text
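One way I could imagine handling the + lines is to read one "logical record" at a time, keeping a single line of look-ahead per file instead of storing the whole file. This is only a sketch: next_record and its $pending argument are names I made up, and it assumes a + line always belongs to the closest preceding non-+ line.

```perl
use strict;
use warnings;

# Hypothetical helper: return the next logical record from $fh, i.e. one
# line plus every directly following line that starts with '+'.
# $pending is a reference to a scalar holding the one line read ahead
# for this file (keep one such scalar per filehandle).
sub next_record {
    my ($fh, $pending) = @_;

    # Start from the look-ahead line if there is one, else read a new line.
    my $record = defined($$pending) ? $$pending : <$fh>;
    $$pending = undef;
    return unless defined $record;    # end of file
    chomp $record;

    while (defined(my $line = <$fh>)) {
        chomp $line;
        if ($line =~ /^\+/) {
            $record .= " $line";      # continuation: append to the record
        } else {
            $$pending = $line;        # start of the next record: keep it
            last;
        }
    }
    return $record;
}
```

Each call returns one joined record, or undef at the end of the file, so only the current record and one look-ahead line per file are held in memory.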

Right now, I am reading them in this very simple way:

while (defined(my $lineA = <FILE_A>) && defined(my $lineB = <FILE_B>)) {
    ...
    compare_line(lineA => $lineA, lineB => $lineB);
    ...
}

When running small test cases, it works really well (swap comparison etc.). However, I have some glitches, and I guess that the longer file doesn't have its remaining lines read once the end of the shorter file is reached. The glitch is that a line starting with a + sometimes starts a new line in my printed result (while it should always be appended to my first line).

I tried changing && to ||, but it got all messed up. I am thinking of dealing with the remaining part of the longer file after the end of the shorter one is reached; however, that doesn't sound really clean.

Looking forward to reading your thoughts and suggestions!
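For reference, || short-circuits, so the read on its right-hand side is skipped whenever the left-hand read succeeds, which is probably part of why that attempt went wrong. A sketch of an alternative is to read from both handles unconditionally on every pass and stop only when both are exhausted. The walk_both name and the callback interface are my own invention here, and the callback has to cope with one side being undef.

```perl
use strict;
use warnings;

# Hypothetical driver: walk two filehandles in step and keep going until
# BOTH are exhausted. Once the shorter file ends, its side is undef.
sub walk_both {
    my ($fh_a, $fh_b, $callback) = @_;
    while (1) {
        # Read both sides on every pass, without short-circuiting.
        my $line_a = <$fh_a>;
        my $line_b = <$fh_b>;
        last unless defined($line_a) || defined($line_b);
        $callback->($line_a, $line_b);
    }
    return;
}
```

It could be called as walk_both($fh_a, $fh_b, sub { my ($a_line, $b_line) = @_; ... }), with the comparison done inside the anonymous sub.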

-F

P.S: running Perl 5.8.8

In reply to Reading concurrently two files with different number of lines by frogsausage
