
If your file is too large to slurp, and performance is an issue (when is it ever not :), you might consider this technique: read the file in fixed-size chunks with sysread and run the regex against a sliding buffer.

Demo

#! perl -slw
use strict;
$|++;                # Disable buffering on STDOUT for demo

my $re_multi = qr[    # Match 3 lines
    ^name\s*\n        # First starts with "name" (and maybe some whitespace?)
    ^-+\n             # Second consists entirely of '-'s
    (^.+)\n           # Third has the stuff we want, so capture it (group 1)
]mx;                  # /m lets ^ match at each line start, so the match
                      # can span lines. /x ignores incidental whitespace

my $buffer = '';    # Init our buffer to null

# Grab a manageable chunk of data.
# 16 bytes is silly and for demo only.
# My tests show that 32 or 64 KB seems about right on my system.
# The length() offset appends to the buffer, so any residual is retained

while (sysread( DATA, $buffer, 16, length $buffer)) {

    # Find all occurrences in the buffer.

    while($buffer =~ m[$re_multi]g ) {

        print $1;    # Do something with them

        # Stop the buffer growing bigger than necessary
        # by discarding everything up to the end of the last match

        $buffer = substr($buffer, pos($buffer) );
    }

    # This line defends against the buffer growing very large if two
    # occurrences of the pattern are a very long way apart in the file.
    # It works (in this case) by discarding stuff that cannot be part
    # of what we are looking for (and could probably be improved upon).
    # This must be tailored on a case-by-case basis.

    $buffer = substr( $buffer, pos($buffer) -1 ) if $buffer =~ m[\n(?!name)]gc;
}

__DATA__
Loads'a junk a garbage and irrelevent crap
Loads'a junk a garbage and irrelevent crap Loads'a junk a garbage and irrelevent crap
name
----------------------------------------
1 23 4.5 678e9
Loads'a junk a garbage and irrelevent crap
Loads'a junk a garbage and irrelevent crap Loads'a junk a garbage and irrelevent crap
Loads'a junk a garbage and irrelevent crap
Loads'a junk a garbage and irrelevent crap Loads'a junk a garbage and irrelevent crap
name
----------------------------------------
1 23 4.5 678e9
Loads'a junk a garbage and irrelevent crap Loads'a junk a garbage and irrelevent crap
Loads'a junk a garbage and irrelevent crap
Loads'a junk a garbage and irrelevent crap Loads'a junk a garbage and irrelevent crap
name
----------------------------------------
1 23 4.5 678e9
Loads'a junk a garbage and irrelevent crap
Loads'a junk a garbage and irrelevent crap Loads'a junk a garbage and irrelevent crap

Output

D:\Perl\test>258244
1 23 4.5 678e9
1 23 4.5 678e9
1 23 4.5 678e9
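
The guard line in the demo is tailored to that data, and its comments say it could probably be improved upon. Here is a minimal sketch of one alternative, under assumptions that are mine rather than the original's: a record is exactly three lines (so the pattern below forbids newlines in the whitespace after "name", unlike the \s* in the demo), and the helper names scan_records and keep_from are hypothetical. It uses read instead of sysread so that it also works on in-memory handles. After each chunk it drops everything through the last match, then keeps only the last two complete lines plus the trailing partial line, since an unfinished three-line record cannot start any earlier than that.

```perl
use strict;
use warnings;

# Three-line record: a "name" line, a rule of dashes, then the payload.
# Assumption of this sketch: blanks after "name" may not include a newline,
# so a record always spans exactly three lines.
my $re_record = qr{
    ^name[^\S\n]*\n
    ^-+\n
    ^(.+)\n
}mx;

# Offset of the first character worth keeping: the start of the line that
# follows the third-from-last newline. Returns 0 (keep everything) when the
# buffer holds fewer than three newlines.
sub keep_from {
    my ($buf) = @_;
    my $pos = length $buf;
    for (1 .. 3) {
        return 0 if $pos == 0;
        $pos = rindex($buf, "\n", $pos - 1);
        return 0 if $pos < 0;
    }
    return $pos + 1;
}

# Hypothetical helper: read $fh in $chunk_size pieces and call $on_match
# with the payload line of every record found.
sub scan_records {
    my ($fh, $chunk_size, $on_match) = @_;
    my $buffer = '';

    # The length() offset appends each chunk to whatever was retained.
    while (read($fh, $buffer, $chunk_size, length $buffer)) {
        my $done = 0;
        while ($buffer =~ /$re_record/g) {
            my $payload = $1;
            $done = pos $buffer;          # end of the most recent match
            $on_match->($payload);
        }

        # Drop everything through the last complete record.
        substr($buffer, 0, $done, '') if $done;

        # Drop old lines that can no longer start a pending record.
        my $keep = keep_from($buffer);
        substr($buffer, 0, $keep, '') if $keep;
    }
    return;
}
```

You might call it as scan_records($fh, 64 * 1024, sub { print "$_[0]\n" }). As long as the lines themselves are short, the buffer stays at roughly one chunk plus a few lines, however far apart the records are.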

Examine what is said, not who speaks.

"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

In reply to Re: RegEx on more than one line by BrowserUk
in thread RegEx on more than one line by jmaya
