G'day james28909,

"Anyway, let me start off by posting example code and files:"

For future reference, please post a short, representative sample of your data here. I tried to download the zip file you linked to, but the download failed:

$ wget https://dl.dropboxusercontent.com/u/64707444/monks/monks.zip
--2015-10-29 15:29:58--  https://dl.dropboxusercontent.com/u/64707444/monks/monks.zip
Resolving dl.dropboxusercontent.com (dl.dropboxusercontent.com)... 199.47.217.101
Connecting to dl.dropboxusercontent.com (dl.dropboxusercontent.com)|199.47.217.101|:443... connected.
ERROR: The certificate of `dl.dropboxusercontent.com' is not trusted.
ERROR: The certificate of `dl.dropboxusercontent.com' hasn't got a known issuer.

[Perhaps I could've tried harder to get this but I don't really have the time and I shouldn't have to, anyway.]

Here are some tips on the code you presented.

When opening files, always check for problems. Either use the autodie pragma or hand-craft messages (see open for examples).
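
As a minimal sketch of both approaches: the data here is invented and a temporary file stands in for your real files, so treat the names as assumptions rather than anything from your script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# ASSUMPTION: a temporary file with invented content stands in for a real data file.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "alpha\nbeta\n";
close $tmp_fh or die "Can't close '$path': $!";

# Option 1: hand-crafted message that names the file and includes the OS error ($!).
open my $in, '<', $path or die "Can't open '$path' for reading: $!";
my @lines = <$in>;
close $in or die "Can't close '$path': $!";

# Option 2: autodie (lexically scoped) throws an exception when open fails.
my $error = '';
{
    use autodie;
    eval { open my $missing, '<', "$path.does-not-exist"; 1 }
        or $error = "$@";
}

print scalar(@lines), " lines read\n";
print "autodie reported: $error\n" if $error;
```

With autodie in effect you can drop the "or die ..." clauses entirely; the eval above is only there so the sketch can show the generated message without terminating.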

Repeatedly opening files in a loop, and reading their entire contents multiple times, is rarely (if ever) a good idea. I see that you've done this in both a while and a for loop. Aim to open and read once. If you need to jump around in an opened file, consider seek and tell.
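
Here is a rough sketch of opening once and using tell and seek to revisit part of the file. The sample data and variable names are hypothetical; the point is only that a second pass does not need a second open.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use File::Temp qw(tempfile);

# ASSUMPTION: invented sample data written to a temporary file.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "header\nrow 1\nrow 2\n";
close $fh;

open my $in, '<', $path;

my $header     = <$in>;       # read the first line once
my $data_start = tell $in;    # remember the byte offset where the rows begin

my @first_pass = <$in>;       # read the remaining lines to end of file

seek $in, $data_start, 0;     # jump back to the saved offset (0 means SEEK_SET)
my @second_pass = <$in>;      # the same rows again, without reopening the file

close $in;

print "rows per pass: ", scalar(@first_pass), " and ", scalar(@second_pass), "\n";
```

In most cases a single read into a suitable data structure is simpler still; seek and tell are for when you genuinely need to move around in the file.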

When you read "file1" (for the first time), it may be better to store the data in a hash. For example, instead of

push( @original, $rightside );

perhaps something closer to

++$original{$rightside};

You can then lose the "for (@original) {...}" loop altogether, and change

if ( $last =~ $_ ) {

to something like

if ($original{$last}) {
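
Put together, the idea might look something like the sketch below. I couldn't see your data, so the formats are pure assumptions: "file1" is imagined as "key=value" lines and "file2" as whitespace-separated fields whose final field is $last. In-memory strings stand in for the real files.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# ASSUMPTION: invented sample data; the real file formats were not available.
my $file1 = "a=apple\nb=banana\nc=cherry\n";
my $file2 = "1 banana\n2 durian\n3 apple\n";

# Pass 1: read "file1" once and record each right-hand side as a hash key.
my %original;
open my $in1, '<', \$file1 or die "Can't open in-memory file1: $!";
while (my $line = <$in1>) {
    chomp $line;
    my (undef, $rightside) = split /=/, $line, 2;
    next unless defined $rightside;
    ++$original{$rightside};
}
close $in1;

# Pass 2: read "file2" once; a single hash lookup replaces the inner loop.
my @matches;
open my $in2, '<', \$file2 or die "Can't open in-memory file2: $!";
while (my $line = <$in2>) {
    chomp $line;
    my $last = (split ' ', $line)[-1];
    next unless defined $last;
    push @matches, $line if $original{$last};
}
close $in2;

print "$_\n" for @matches;
```

Each file is opened and read exactly once, and the lookup cost no longer grows with the number of entries taken from "file1".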

Also, your use of a regex match ($last =~ $_) seems questionable. I haven't delved too deeply into this, but a straight equality check ($last eq $_) looks like it might be a better idea.
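
To illustrate why the two tests can disagree, here is a small sketch with made-up values (none of them come from your data). When the right-hand side of =~ is a plain string, Perl compiles it as a pattern, so substrings and metacharacters can match where eq would not.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# ASSUMPTION: hypothetical values chosen to show where the two tests differ.
my @cases = (
    [ 'pineapple', 'apple' ],   # substring: the regex matches, eq does not
    [ 'abc',       'a.c'   ],   # '.' is a metacharacter in a pattern
    [ 'apple',     'apple' ],   # identical strings: both tests agree
);

my @results;
for my $case (@cases) {
    my ($last, $stored) = @$case;
    my $regex_match = $last =~ $stored ? 1 : 0;   # $stored is compiled as a pattern
    my $string_eq   = $last eq $stored ? 1 : 0;   # exact, character-for-character
    push @results, [ $regex_match, $string_eq ];
    printf "%-10s vs %-6s  =~ %d  eq %d\n", $last, $stored, $regex_match, $string_eq;
}
```

If you do need a pattern match for some reason, quoting and anchoring it (for example, /\A\Q$stored\E\z/) makes it behave like the equality check.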

These suggestions have been intentionally vague. Without any input and only erroneous expected output (you wrote: "EDIT: It seems there is indeed an error in the output"), I am somewhat loath to attempt to suggest anything more concrete with regard to the actual processing.

If you do provide sample input and real expected output, I (or another monk) might be able to provide a better answer.

— Ken


In reply to Re: Getting data from second file, based on first files contents; by kcott
in thread UPDATED - Getting data from second file, based on first files contents; by james28909
