Split with data keep

Kage has asked for the wisdom of the Perl Monks concerning the following question:

Re: Split with data keep by fruiture (Curate) on Nov 30, 2002 at 11:39 UTC
I'm not sure whether I've understood your problem. Could it be solved by putting capturing parentheses in the split// regexp?

```perl
@stuff = split m{ <!--end--> <a \s name=" ( \d+ ) "></a> <!--start--> }x => $data;
```

See `perldoc -f split` for what happens with your capture $1: when the pattern contains capturing groups, each captured substring is returned as an extra element between the fields.

Another option is a parsing while(REGEXP) loop:

```perl
# /g continues each match where the last one ended (\G anchors there);
# /s lets .+? match across newlines.
while ( $data =~ m{ \G <a \s+ name = " (\d+) "></a> <!--start--> ( .+? ) <!--end--> }xgs ) {
    my $number  = $1;
    my $content = $2;
    # ...
}
```

-- http://fruiture.de
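A small runnable sketch of the split// approach, assuming hypothetical input in which articles are separated by `<!--end--><a name="N"></a><!--start-->` (the markup from the original question isn't shown here). Because the pattern has one capturing group, `split` hands back each anchor number between the texts it separates, so the list can be poured straight into a hash. Note that the outermost `<!--start-->` and `<!--end-->` stay attached to the first and last fields.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: an intro, then two articles introduced by numbered anchors.
my $data = '<!--start-->intro<!--end-->'
         . '<a name="1"></a><!--start-->first<!--end-->'
         . '<a name="2"></a><!--start-->second<!--end-->';

# The ( \d+ ) group makes split return each anchor number as an
# extra field between the texts it separates.
my @stuff = split m{ <!--end--> <a \s name=" ( \d+ ) "></a> <!--start--> }x => $data;

# First field is whatever precedes the first separator; after that the
# list alternates number, text, number, text, ...
my ( $head, %article ) = @stuff;

print "head: $head\n";
print "$_: $article{$_}\n" for sort { $a <=> $b } keys %article;
```

The hash assignment only works because the capture count is exactly one; with more groups, the fields would no longer alternate in pairs.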
Re: Split with data keep by dws (Chancellor) on Nov 30, 2002 at 18:22 UTC
`split()` is powerful, but it isn't the only tool in the bag. If what you're after is either "Data..." or the named anchor, a better approach might be to first isolate the text between the start and end tags, and then decide what to do with it. Assuming the text is in $text and can span several lines, something like the following should do the trick:

```perl
# /g resumes the search after the previous match (without it the loop
# never ends); /s lets .*? match across newlines.
while ( $text =~ m/<!--start-->(.*?)<!--end-->/gs ) {
    my $chunk = $1;
    # m{} avoids clashing with the / in </a>
    if ( $chunk =~ m{<a name="(.+?)"></a>} ) {
        # do something with $1
    }
    else {
        # do something with $chunk
    }
}
```
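To see the isolate-then-decide loop end to end, here is a self-contained sketch run against hypothetical sample text (one chunk spans two lines, two chunks hold named anchors). The `@names` and `@chunks` arrays are illustrative stand-ins for the "do something" comments.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample text; chunks may span several lines.
my $text = <<'END';
<!--start--><a name="intro"></a><!--end-->
<!--start-->Data line one
Data line two<!--end-->
<!--start--><a name="part2"></a><!--end-->
END

my ( @names, @chunks );

# /g resumes each match where the previous one stopped, so the
# loop terminates; /s lets .*? cross newlines.
while ( $text =~ m/<!--start-->(.*?)<!--end-->/gs ) {
    my $chunk = $1;
    if ( $chunk =~ m{<a name="(.+?)"></a>} ) {
        push @names, $1;         # chunk is a named anchor
    }
    else {
        push @chunks, $chunk;    # chunk is ordinary data
    }
}

print "anchor: $_\n" for @names;
print "data: $_\n"   for @chunks;
```

The inner match runs on `$chunk`, not `$text`, so it does not disturb `pos($text)` and the outer `//g` loop picks up where it left off.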
Re: Split with data keep by rir (Vicar) on Dec 01, 2002 at 06:12 UTC
It is not completely clear what you wish to extract. This will extract the variable parts. It assumes a name value may not contain a double quote. If that is not correct, match on the quote and the following tag instead, as the second half of the regex does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

$_ = q|<a name="a name"></a><!--start-->some stuff<!--end-->|
   . q|<a name="Mae B Arthur"></a><!--start-->various text<!--end-->|
   . q|<a name="36561357542"></a><!--start-->What kind of #'s that<!--end-->|
   . q|<a name="aafq0w4tyu89[ "></a><!--start-->aeo;utrq[134[ a<!--end-->|;

while ( m|<a name="([^"]+?)"></a>(?:<!--start-->(.*?)(<!--end-->)+?)|sgc ) {
    print "name: |$1|\n" . "art: |$2|\n\n";
}
__DATA__
name: |a name|
art: |some stuff|

name: |Mae B Arthur|
art: |various text|

name: |36561357542|
art: |What kind of #'s that|

name: |aafq0w4tyu89[ |
art: |aeo;utrq[134[ a|
```
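A sketch of the alternative mentioned above for names that may contain double quotes, under one reading of "match on the quote and following tag": let a non-greedy `(.+?)` run until it reaches `"></a><!--start-->` instead of stopping at the first quote. The sample input and the `@found` array are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input in which the first name contains double quotes.
my $input = q|<a name="say "hi""></a><!--start-->quoted<!--end-->|
          . q|<a name="plain"></a><!--start-->text<!--end-->|;

my @found;
# (.+?) stops at the first "></a><!--start--> rather than at the
# first double quote, so quotes inside the name survive.
while ( $input =~ m|<a name="(.+?)"></a><!--start-->(.*?)<!--end-->|sg ) {
    push @found, [ $1, $2 ];
}

printf "name: |%s|\nart: |%s|\n\n", @$_ for @found;
```

The trade-off: if an anchor is ever not followed by `<!--start-->`, the name capture can run on into the next anchor, whereas `[^"]+?` would simply fail to match there.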
Re: Split with data keep by dbp (Pilgrim) on Dec 01, 2002 at 00:53 UTC
Given that you've read your file into a scalar:

```perl
my (@articles) = ($text =~ /(<a name.*?<!--start-->.*?<!--end-->)/gs);
```

Of course, this does two stingy (non-greedy) matches in one pattern, which is probably godawful slow.

Update: Or use a hash instead:

```perl
my (%hash) = ($text =~ /(<a name.*?)(<!--start-->.*?<!--end-->)/gs);
```
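A runnable sketch of the hash variant on hypothetical slurped file contents. In list context `m//g` returns every capture from every match, so two groups yield an (anchor, article, anchor, article, ...) list that assigns cleanly to a hash. The keys are the whole `<a name="..."></a>` tags, not just the names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical file contents already slurped into a scalar.
my $text = qq|<a name="one"></a><!--start-->first<!--end-->\n|
         . qq|<a name="two"></a><!--start-->second<!--end-->\n|;

# In list context, m//g returns every capture from every match, so
# two groups produce (anchor, article, anchor, article, ...) pairs.
my %hash = ( $text =~ /(<a name.*?)(<!--start-->.*?<!--end-->)/gs );

for my $anchor ( sort keys %hash ) {
    print "$anchor => $hash{$anchor}\n";
}
```

One design consequence worth noting: if the same anchor appears twice, the later article silently overwrites the earlier one, which the `@articles` version avoids.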