1. Read a file into a scalar
2. Split scalar into an array using split(/\n{2}/, $scalar)

That's rather wasteful. If you're slick, you can probably do it all in one pass, iterating over the file line by line, and you can probably fold #3 and #4 in there too. For #5 (the phone numbers), maybe keep a temporary hash of phone numbers outside the loop: as you go along, if a number isn't in the hash yet, add it; if it is, drop the record. The biggest speed increase, I would imagine, would come from getting all of these steps into a single loop over the file, and I believe it can be done.
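To make the phone number idea concrete, here is a minimal sketch of the "seen" hash on its own. The sample numbers and their format are made up for illustration; the real data isn't shown in this thread.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real phone number format is an assumption.
my @phones = ('555-0100', '555-0101', '555-0100', '555-0102', '555-0101');

my %seen;   # phone numbers encountered so far
my @kept;   # first occurrences only

for my $phone (@phones) {
    next if $seen{$phone}++;   # seen before: drop it
    push @kept, $phone;        # first time: keep it
}

print join(',', @kept), "\n";   # prints 555-0100,555-0101,555-0102
```

The post-increment means the test is false the first time a number shows up and true every time after, so one hash lookup per number is all it costs.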

It's a lot of code, and I don't want to embarrass myself by coming up with something right now, but here are some ideas:

Iterate over each line of the file, of course. Keep a variable handy for the current serial number (if any) and another for whether a record is open or not (in case there's line noise between records; probably not important). You'll also need your records hash, and a hash of phone numbers that you're just going to throw away at the end.

When you get a "SERIAL NUMBER (\d+)" line, put $1 in your current serial number variable. When you get a { line, set open to true; when you get a } line, set it to false. Anything else, while a record is open, is stuff to shove into the records hash. And when you get a phone number, check to see if it already exists in your phone number hash; if it does, you can just delete() the current record out of the records hash (or do the check when you get the closing brace, so that anything after the phone number won't re-enter the record).
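A rough sketch of that loop, untested against the real file. The "NAME:" and "PHONE:" field lines, the sample data, and the variable names are all assumptions made for illustration; only the SERIAL NUMBER line and the braces come from the description above. It uses the closing-brace variant of the duplicate check.

```perl
use strict;
use warnings;

# Hypothetical input; the field lines inside the braces are assumed.
my $data = <<'END';
SERIAL NUMBER 1
{
NAME: alice
PHONE: 555-0100
}
stray noise between records
SERIAL NUMBER 2
{
NAME: bob
PHONE: 555-0100
}
SERIAL NUMBER 3
{
NAME: carol
PHONE: 555-0102
}
END

# Read from an in-memory filehandle so the sketch is self-contained.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my (%records, %seen_phone);
my ($serial, $open, $dupe) = (undef, 0, 0);

while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^SERIAL NUMBER (\d+)/) {
        $serial = $1;    # remember which record we are in
        $dupe   = 0;
    }
    elsif ($line =~ /^\s*\{\s*$/) {
        $open = 1;
    }
    elsif ($line =~ /^\s*\}\s*$/) {
        $open = 0;
        # Check at the closing brace so later lines cannot re-create the record.
        delete $records{$serial} if defined $serial && $dupe;
    }
    elsif ($open && defined $serial) {
        push @{ $records{$serial} }, $line;
        if ($line =~ /^PHONE:\s*(\S+)/) {
            $dupe = 1 if $seen_phone{$1}++;   # phone already seen earlier
        }
    }
    # Anything else is noise between records and is ignored.
}
close $fh;

print join(',', sort { $a <=> $b } keys %records), "\n";   # prints 1,3
```

Record 2 shares a phone number with record 1, so it gets deleted when its closing brace comes along; the stray line between records never reaches the hash because no record is open at that point.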

Oh, I forgot: when you get a SERIAL NUMBER line, you can check right there for a repeated number and figure out a new one. It doesn't matter much that you haven't read all the records yet, because if your new number is taken by a later record, that later record will get incremented too. However, if that's not the behavior you want, my entire suggestion goes out the window.
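Something like the following might do for the renumbering. "Bump to the next free number" is just one guess at how to figure out a new one, and the serial numbers here are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical serial numbers in file order; 7 shows up twice.
my @serials_in_file = (7, 8, 7, 9);

my %used;       # serial numbers already handed out
my @assigned;   # the number each record ends up with, in file order

for my $wanted (@serials_in_file) {
    my $serial = $wanted;                    # copy, so the input list is left alone
    $serial++ while exists $used{$serial};   # assumed policy: next free number
    $used{$serial} = 1;
    push @assigned, $serial;
}

print join(',', @assigned), "\n";   # prints 7,8,9,10
```

Note how the second 7 takes 9, and the record that really was 9 then gets pushed to 10. That is the "later record will get incremented too" behavior, so only use this if that's acceptable.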

I hope you could follow. I could send you some actual code but it would take me some time to write up and test, so /msg me or something if you want some code.

local $_ = "0A72656B636148206C72655020726568746F6E41207473754A"; while(s/..$//) { print chr(hex($&)) }


In reply to RE: Efficiency and Large Arrays by reptile
in thread Efficiency and Large Arrays by Kozz
