Re: Regular Expressions Challenge

Here is a regex solution for splitting a multiline string at its headers:

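```perl
use strict;
use warnings;
use Data::Dumper;

my $slurp = do { local $/; <DATA> };    # read the whole DATA section

# The capture group keeps each header's text in the result list.
# [\w ] instead of [\w\s]: \s also matches newlines, which would let
# a "header" run across several lines.
my @array = split /^=+([\w ]+)=+$/m, $slurp;

print Dumper \@array;

__DATA__
{{{your data}}}
```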

You can easily extend this to split successively at different header levels. Take care: the first element of the result is always the text preceding the first header (an empty string if the data starts with a header).
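A minimal sketch of the successive splitting, assuming top-level headers have exactly one `=` on each side and second-level headers exactly two. The sample text, the `Appendix` header and the `%outline` hash are hypothetical, and `[\w ]` is used instead of `[\w\s]` so a header cannot span lines. Notice that each `split` returns the text before its first header as the first element (`$preamble`, `$intro`), which is set aside before pairing headers with their text:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample text: '=' marks level 1, '==' marks level 2.
my $text = <<'END';
=Title of Page=
==Literature==
Some literature.
==Comments==
User comments.
=Appendix=
==Additional Info==
More info.
END

# First pass: split on level-1 headers only (exactly one '=' per side).
# $preamble is the text before the first header ('' here).
my ($preamble, @level1) = split /^=([\w ]+)=$/m, $text;

my %outline;    # level-1 header => { level-2 header => text }
while (my ($h1, $body) = splice @level1, 0, 2) {
    # Second pass: split each chunk on level-2 headers (exactly two '=').
    my ($intro, @level2) = split /^==([\w ]+)==$/m, $body;
    while (my ($h2, $sub) = splice @level2, 0, 2) {
        $sub =~ s/^\s+|\s+$//g;    # trim the surrounding newlines
        $outline{$h1}{$h2} = $sub;
    }
}

$Data::Dumper::Sortkeys = 1;
print Dumper \%outline;
```

Each extra header level adds one more nested `split`; for arbitrary depth, a single split that captures the run of `=` signs avoids the nesting.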

When processing large texts, you should consider parsing line by line with the flip-flop operator (`..`) instead of slurping and splitting the whole text.
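A minimal sketch of the line-by-line approach, assuming a hypothetical `==Literature==` section that runs until the next header. It uses the three-dot form (`...`) because the header line itself starts with `=`; with two dots the right operand would be tested on that same line and the range would end immediately. The sequence number the flip-flop returns (1 on the first line, with `E0` appended on the last) is used to drop both header lines. An in-memory filehandle stands in for a large input file:

```perl
use strict;
use warnings;

# Hypothetical sample input; a real large file would be opened instead.
my $text = <<'END';
=Title of Page=
==Literature==
First reference.

Second reference.
==Microarray Data==
===Comments===
Comments are related to the microarray data here.
END

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @literature;
while (my $line = <$fh>) {
    chomp $line;
    # Three-dot flip-flop: true from the '==Literature==' line up to and
    # including the next line starting with '='. With '...' the right
    # operand is not tested on the line that made the range true.
    my $seq = ($line =~ /^==Literature==$/ ... $line =~ /^=/);
    next unless $seq;
    next if $seq == 1;        # opening header line
    next if $seq =~ /E0$/;    # closing header line ("E0" marks the last one)
    push @literature, $line if length $line;    # skip blank lines
}
close $fh;

print "$_\n" for @literature;
```

Only the current line is held in memory, so this scales to inputs that are too large to slurp.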

Cheers Rolf


Re^2: Regular Expressions Challenge by cdarke (Prior) on May 18, 2010 at 11:30 UTC
So here is an example using the flip-flop operator, although I am not certain what the OP is actually looking for:

```perl
use strict;
use warnings;
use Data::Dumper;

my @sections;
while (<DATA>) {
    chomp;
    # True from a ===Comments=== header through the first line that starts
    # with a character other than '='. Blank lines do not end the range,
    # so only the first non-blank line of each section is captured.
    if (/===Comments===/ .. /^[^=]/) {
        if (/===Comments===/) {
            push @sections, [];                   # start a new section
        }
        else {
            push @{ $sections[-1] }, $_ if $_;    # skip blank lines
        }
    }
}
print Dumper(\@sections);

__DATA__
```

For the supplied data, this gives:

```
$VAR1 = [
          [
            'User comments are added here. A user may write whatever they may wish.'
          ],
          [
            'Comments are related to the microarray data here.'
          ],
          [
            'Comments related to the pathway information here.'
          ]
        ];
```
Re^2: Regular Expressions Challenge by LanX (Saint) on May 18, 2010 at 11:08 UTC
A generalized solution that parses headers of any level:

```perl
use strict;
use warnings;

my $slurp = do { local $/; <DATA> };    # slurp all of DATA
# (=+) captures the level; \1 requires the same number of '=' to close
my @data = split /^(=+)([\w ]+)\1$/m, $slurp;
# pseudo-header so the text before the first header forms a triple too
unshift @data, '', '<Filename>';
while (my ($level, $header, $text) = splice @data, 0, 3) {
    print ' ' x length($level), $header, "\n";    # indent by level
    # print $text, "\n\n";
}
__DATA__
```

Output:

```
<Filename>
Title of Page
Literature
Comments
Microarray Data
Comments
Pathway Information
Comments
Aditional Info
```

Cheers Rolf
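One way to go further, sketched under assumptions: instead of printing, the (level, header, text) triples can be folded into a tree using a stack of open headers, attaching each new header to the nearest preceding header with a lower level. The sample text, the hash keys (`header`, `level`, `text`, `children`) and the `show` helper are hypothetical, and `[\w ]` again replaces `[\w\s]`:

```perl
use strict;
use warnings;

# Hypothetical sample text with three header levels.
my $text = <<'END';
=Title of Page=
==Literature==
Some literature.
==Microarray Data==
===Comments===
Microarray comments.
END

my @data = split /^(=+)([\w ]+)\1$/m, $text;

# Root node holds the text before the first header (level 0).
my $root  = { header => '<Filename>', level => 0, text => shift @data, children => [] };
my @stack = ($root);    # path from the root to the most recent header

while (my ($eq, $header, $body) = splice @data, 0, 3) {
    my $node = {
        header   => $header,
        level    => length($eq),
        text     => $body // '',
        children => [],
    };
    # Pop until the top of the stack is a shallower header: that is the parent.
    pop @stack while $stack[-1]{level} >= $node->{level};
    push @{ $stack[-1]{children} }, $node;
    push @stack, $node;
}

# Print the outline, indenting each header by its level.
sub show {
    my ($node) = @_;
    print ' ' x $node->{level}, $node->{header}, "\n";
    show($_) for @{ $node->{children} };
}
show($root);
```

Because the root has level 0 and every real header has level 1 or more, the `pop` loop never removes the root.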