Hi adrya407,

Personally, I like to implement this kind of thing using a state-machine approach. Although it certainly takes more lines of code than a single regex, it doesn't require you to read the entire file into memory. I also find that conditions (especially complex ones) are more easily expressed as Perl conditionals than in regexes, which I think makes this approach more easily extensible. It looks like you've got some variant of an INI file there, so I hope it's not too wild a thought that you may need to get more than just "my_variable" from the file in the future, or that you'll later need to add support for skipping comment lines, etc. Anyway, this is just One Way To Do It. In this example I'm using the definedness of $myvar to keep state; in a more complex situation I'd use a separate state variable. ~~The repeated code (printing $myvar) could be refactored into an (anonymous) sub.~~
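For comparison, here is a rough sketch of the single-regex style of solution mentioned above. This is only an illustration, not the regex from the original question: the sample data is a shortened, assumed version of the data further down. Note that it needs the whole input in one string (`$text`), which is exactly what the line-by-line approach avoids.

```perl
use warnings;
use strict;

# The whole input has to be in memory as one string for this approach.
my $text = <<'END';
unwanted_line1=blabla
my_variable=important_content_section1
important_content_section2
unwanted_line2=blabla
my_variable=important_content_section3
[stepxyz#xxxx]
unwanted_content1
END

my @values;
# /m: ^ and $ match at line boundaries; /s: . also matches newlines.
# The lazy (.*?) stops at the first following line that looks like
# "key=...", a "[section]" line, or the end of the string.
while ($text =~ /^my_variable=(.*?)(?=^\w+=|^\[[^\]\n]+\]$|\z)/msg) {
    (my $v = $1) =~ s/\n\z//;   # drop the trailing newline of the last line
    push @values, $v;
}
print "<<$_>>\n" for @values;
```

It is shorter, but every new condition (comments, more keys, other terminators) has to be folded into the lookahead, which gets hard to read quickly.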

Update: The previous version of the code didn't do anything when it encountered a "[...]" line, so "my_variable" would continue to accumulate after it. I've updated the code so that a "[...]" line now ends a "my_variable" definition, and I've also refactored the code that handles a completed $myvar into an anonymous sub.

```perl
use warnings;
use strict;

# Holds the "my_variable" value being accumulated; undef means "not in one"
my $myvar;
# Anonymous sub that handles a completed $myvar: print it and reset the state
my $take = sub {
    return unless defined $myvar;
    chomp($myvar);
    print "<<$myvar>>\n";
    undef $myvar;
};
while (<DATA>) {
    if (my ($k,$v) = /^(\w+)=(.*)$/s) {
        # Any "key=value" line ends the previous value...
        $take->();
        # ...and starts a new one only for the key we want
        $myvar = $v if $k eq 'my_variable';
    }
    elsif (/^\[.+\]$/) {
        # A "[section]" line also ends the previous value
        $take->();
    }
    else {
        # Continuation line: append it if we're inside a value
        $myvar .= $_ if defined $myvar;
    }
}
# Handle a value that runs to the end of the input
$take->();

__DATA__
unwanted_line1=blabla
unwanted_line2=blabla
my_variable=important_content_section1
important_content_section2
important_content_section3
unwanted_line3=blabla
unwanted_line4=blabla
unwanted_line5=blabla
my_variable=important_content_section4
important_content_section5
important_content_section6
[stepxyz#xxxx]
unwanted_content1
unwanted_line6=blabla
my_variable=important_content_section7
unwanted_line7=blabla
my_variable=important_content_section8
my_variable=important_content_section9
unwanted_line8=blabla
```

Output:

```
<<important_content_section1
important_content_section2
important_content_section3>>
<<important_content_section4
important_content_section5
important_content_section6>>
<<important_content_section7>>
<<important_content_section8>>
<<important_content_section9>>
```
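As mentioned above, in a more complex situation a separate state variable is clearer than relying on the definedness of $myvar. Here is a minimal sketch of what that could look like, which also collects more than one key and skips comment lines. The key names in `%wanted`, the `;`/`#` comment syntax, and the sample data are all assumptions for illustration; an in-memory filehandle stands in for the real file.

```perl
use warnings;
use strict;

# Hypothetical: which keys to collect, and the ';'/'#' comment syntax,
# are assumptions, not part of the original data format.
my %wanted = map { $_ => 1 } qw/ my_variable other_variable /;

my $text = <<'END';
; a comment
unwanted=blabla
my_variable=line1
line2
# another comment
line3
other_variable=xyz
[section]
ignored
my_variable=last
END
open my $fh, '<', \$text or die "open: $!";   # in-memory filehandle

my $state = 'IDLE';   # explicit state: 'IDLE' or 'IN_VALUE'
my ($key, $value);
my @results;          # [key, value] pairs, in file order

# Finish the value currently being collected (if any) and go back to IDLE
my $finish = sub {
    return unless $state eq 'IN_VALUE';
    chomp($value);
    push @results, [ $key, $value ];
    ($key, $value, $state) = (undef, undef, 'IDLE');
};

while (my $line = <$fh>) {
    next if $line =~ /^\s*[#;]/;   # skip comment lines entirely
    if (my ($k, $v) = $line =~ /^(\w+)=(.*)$/s) {
        $finish->();
        ($key, $value, $state) = ($k, $v, 'IN_VALUE') if $wanted{$k};
    }
    elsif ($line =~ /^\[.+\]$/) {
        $finish->();
    }
    elsif ($state eq 'IN_VALUE') {
        $value .= $line;
    }
}
$finish->();
close $fh;

print "$_->[0]: <<$_->[1]>>\n" for @results;
```

With this sample data it collects three values: "line1", "line2", "line3" joined by newlines for the first my_variable (the comment line is dropped), "xyz" for other_variable, and "last". Because comments are skipped with `next`, a comment inside a multi-line value is simply omitted rather than ending the value; if you'd rather have it end the value, call `$finish->()` there instead.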

Hope this helps,
-- Hauke D

In reply to Re: Multiline regex (Updated!) by haukex
in thread Multiline regex by adrya407
