I hope the replies have been helpful so far. The common theme connecting them is that, since you are dealing with address records, you should parse the input so that you have one record per name. This simplifies the search regex: instead of running one match against some huge string, you iterate over the records and apply the search regex(es) to each record.

Presumably the search result will be a complete record, or a partial record. Do the record separation on input rather than in each regex search.
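Here is a minimal sketch of doing the record separation on input. The field names (Name, Street, City) and the sample addresses are made up for illustration, since I'm guessing at your exact format; the in-memory filehandle just stands in for your real file.

```perl
use strict;
use warnings;

# Hypothetical sample data; your real field names may differ.
my $input = <<'END';
Name: John Smith
Street: 12 Oak Lane
City: Springfield
Name: Mary King
Street: 400 Martin Luther King Blvd
City: Atlanta
END

# Read from a string as if it were a file (stand-in for the real file).
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = "Name:";    # each read now ends at the next "Name:"
    while ( my $chunk = <$fh> ) {
        chomp $chunk;                  # strips the trailing "Name:" separator
        next unless $chunk =~ /\S/;    # the first read is just the separator
        push @records, "Name:$chunk";  # put the label back on the record
    }
}
close $fh;

# The search regex now only has to cope with one record at a time.
my @hits = grep { /^Street:.*\bKing\b/m } @records;
print for @hits;
```

One assumption to watch: this only works if the text "Name:" never appears inside a record, for example as part of some other field.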

The records could be stored as an array of strings (a simple @record_as_string), where each element is one string representing the whole record. Or they could be stored as an Array of Hash (AoH). That's what TJPride did (or close to it); instead of the print, just do: push @AoH, \%data;
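To show how the two storage choices relate, here is a rough sketch that turns an array of record strings into an AoH. The "Field: value" line layout and the sample records are my assumptions, not your actual data, so adjust the regex to suit.

```perl
use strict;
use warnings;

# Hypothetical records, one string per address.
my @record_as_string = (
    "Name: John Smith\nStreet: 12 Oak Lane\nCity: Springfield\n",
    "Name: Mary King\nStreet: 400 Martin Luther King Blvd\nCity: Atlanta\n",
);

my @AoH;
for my $record (@record_as_string) {
    my %data;    # a fresh hash for every record
    for my $line ( split /\n/, $record ) {
        # Assumes each line looks like "Field: value"; skip anything else.
        my ( $field, $value ) = $line =~ /^(\w+):\s*(.*)$/ or next;
        $data{$field} = $value;
    }
    push @AoH, \%data;
}

# Each element is now a hash reference with one key per field.
print "$_->{Name} lives in $_->{City}\n" for @AoH;
```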

I think it's appropriate to mention that, in addition to redefining the input record separator to be "Name:", you can also set it ($/) to undef. If you do that, then the entire file can be "slurped" into one variable without doing all the concatenation stuff. But I don't think that is what you need.

    my $all_data;
    {
        local $/ = undef;    # no separator means whole file
        $all_data = <DATA>;
    }
    # now $/ is back to what it was before
    # that is what the local within a lexical scope did

I also really doubt that you are going to run into a memory problem. A 10MB file in the format that you have would be no problem at all - and would have a LOT of addresses! There are ways to solve any "memory problem", but I don't think that memory is even close to being an issue according to your description. If the data set is a 100MB file, then we probably ought to talk more.

If you are familiar with 'C', the Perl Array of Hash is very similar to the 'C' array of structures. This matters because lots of streets are named after people. Geez, how many "Martin Luther King" boulevards are there? With each field stored separately, a search on the name does not have to match the street as well. If you go this way, it will be easier to "fine tune" your search regexes to the data that is relevant.
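A small sketch of what that fine tuning might look like, again with made-up field names and addresses: searching only the Name field finds people called King, while searching every field also drags in the boulevards.

```perl
use strict;
use warnings;

# Hypothetical AoH; one hash per address record.
my @AoH = (
    { Name => 'John Smith', Street => '400 Martin Luther King Blvd', City => 'Atlanta' },
    { Name => 'Mary King',  Street => '12 Oak Lane',                 City => 'Springfield' },
);

# Search the Name field only: streets named after people are ignored.
my @by_name = grep { $_->{Name} =~ /\bKing\b/ } @AoH;

# Search every field: this also matches the street in the first record.
my @anywhere = grep {
    my $rec = $_;
    grep { $rec->{$_} =~ /\bKing\b/ } keys %$rec;
} @AoH;

print "Name match: $_->{Name}\n" for @by_name;
print "Matches in any field: ", scalar(@anywhere), "\n";
```

With an array of plain strings you would have to anchor the regex on the field label instead (something like /^Name:.*\bKing\b/m), which works but gets clumsier as the searches grow.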

There are two great parsing techniques. Both are fine.

If you need help on searching the array structure (however you choose to do it), ask again.


In reply to Re: proper way of matching multiple line patterns by Marshall
in thread proper way of matching multiple line patterns by perlperlperl
