in reply to Regular Expressions Challenge

Here is a regex solution for splitting a multiline string at its headers:

$slurp .= $_ while <DATA>;
@array = split /^=+([\w\s]+)=+$/ms, $slurp;
use Data::Dumper;
print Dumper \@array;
__DATA__
{{{your data}}}

You can easily extend it to split successively at different header levels. (Note that the first element of the result is always the text preceding the first header, and that the captured header titles are returned between the text chunks.)
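A minimal sketch of what the list returned by split looks like, assuming a small made-up sample in the same =Header= markup. The helper name split_headers and the sample text are illustrative only and not part of the original data:

```perl
use strict;
use warnings;

# Hypothetical helper: split a text at lines such as =Title= or ==Title==.
# Because the pattern has a capturing group, split returns the titles too.
sub split_headers {
    my ($text) = @_;
    return split /^=+([\w\s]+)=+$/m, $text;
}

# Made-up sample data, only for illustration.
my $sample = "intro\n=Title=\nbody one\n==Sub==\nbody two\n";

# The first element is whatever precedes the first header.
my ($preamble, @pairs) = split_headers($sample);

# @pairs now alternates title, text, title, text, ...
# Each text chunk starts with the newline that followed its header line.
while ( my ($title, $body) = splice @pairs, 0, 2 ) {
    print "[$title]$body";
}
```

Peeling off the preamble first keeps the remaining list a clean sequence of title/text pairs, which is easier to loop over.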

When processing large texts, consider parsing line by line with the flip-flop operator instead of slurping and splitting the whole text.
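A rough sketch of the line-by-line approach, assuming the same =Header= markup and a section header written as ===Comments===. The sample text, the in-memory filehandle and the variable names are made up for illustration:

```perl
use strict;
use warnings;

# Made-up sample in the same =Header= markup, read via an in-memory filehandle.
my $sample = <<'END';
=Title of Page=
==Literature==
===Comments===
first comment line
second comment line
==Microarray Data==
not a comment
===Comments===
third comment line
END

open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";

my @comments;
while ( my $line = <$fh> ) {
    chomp $line;
    # The range is true from a ===Comments=== header up to and including the
    # next header line. The three-dot form does not test the right-hand side
    # on the line that switched the range on, so the header itself cannot end it.
    if ( ($line =~ /^===Comments===$/) ... ($line =~ /^=/) ) {
        next if $line =~ /^=/;                  # drop the header lines themselves
        push @comments, $line if length $line;  # skip empty lines
    }
}
close $fh;

print "$_\n" for @comments;
```

Only one line is held in memory at a time. One limitation of this sketch: a ===Comments=== header that directly closes a preceding Comments section only ends the range and does not start a new one.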

Cheers Rolf

Re^2: Regular Expressions Challenge
by cdarke (Prior) on May 18, 2010 at 11:30 UTC
    So here is an example using the flip-flop operator - however I am not certain what the OP is actually looking for:
    use strict;
    use warnings;
    use Data::Dumper;

    my @sections;
    while (<DATA>) {
        chomp;
        if (/===Comments===/../^[^=]/) {
            if (/===Comments===/) {
                push @sections, [];
            }
            else {
                push @{$sections[-1]}, $_ if $_;
            }
        }
    }
    print Dumper(\@sections);
    Which for the supplied data gives:
    $VAR1 = [
              [
                'User comments are added here. A user may write whatever they may wish.'
              ],
              [
                'Comments are related to the microarray data here.'
              ],
              [
                'Comments related to the pathway information here.'
              ]
            ];
Re^2: Regular Expressions Challenge
by LanX (Saint) on May 18, 2010 at 11:08 UTC
    A generalized solution that parses the headers by level:
    $slurp .= $_ while <DATA>;
    @data = split /^(=+)([\w\s]+)\1$/m, $slurp;
    unshift @data, '', '<Filename>';
    while ( ($level, $header, $text, @data) = @data ) {
        print " " x length($level), $header, "\n";
        # print $text, "\n\n";
    }
    __DATA__
    OUTPUT
    <Filename> Title of Page Literature Comments Microarray Data Comments Pathway Information Comments Aditional Info

    Cheers Rolf