vladb has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to get this string:
Foo bar <tag>#var#as</tag> rules.
and split it into this:
Foo bar <tag> #var# as </tag> rules

I've tried a few approaches using split(); the closest I've been able to get is this:
perl -MData::Dumper -e '@t= split(/(?=<)|(?=#[^#]+#)/,"Foo bar <cfoutput>#var#as</cfoutput>"); print Dumper(\@t);'

output:
$VAR1 = [ 'Foo bar ', '<cfoutput>', '#var#as', '</cfoutput>' ];

which is slightly different from what I wanted to achieve. For example, I don't want to have '#var#as' as a single element, but '#var#' and 'as' as separate elements instead.

Any idea how I could do this?

"There is no system but GNU, and Linux is one of its kernels." -- Confession of Faith

Re: help: 'Complex' split
by merlyn (Sage) on Dec 26, 2001 at 22:03 UTC
    When it's easier to talk about what you want to keep than what you want to throw away, then it's time to use m//g instead of split. I'd start with something like:
    my @pieces = m/\G(<[^>]*>|#[^#]*#|[^#<]+)/g;
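To see what that pattern yields on the string from the question, here is a minimal sketch. The sub name tokenize and the sample input are only for illustration (they are not part of the original reply); the regex is the one above, applied to a named variable instead of $_.

```perl
use strict;
use warnings;

# Hypothetical helper: break a string into <tags>, #vars#, and plain text.
sub tokenize {
    my ($text) = @_;
    # \G forces each match to start where the previous one ended, so no
    # characters are silently skipped; matching just stops at the first
    # position where none of the three alternatives fits.
    return $text =~ m/\G(<[^>]*>|#[^#]*#|[^#<]+)/g;
}

my @pieces = tokenize('Foo bar <tag>#var#as</tag> rules.');
print "[$_]\n" for @pieces;
```

This prints six pieces: [Foo bar ], [<tag>], [#var#], [as], [</tag>] and [ rules.]. The plain-text pieces keep their surrounding whitespace, so trim them afterwards if that matters.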
    No error checking here. If you want error checking, inch along with /gc until pos() reaches the end of the string, and treat it as an error if the match fails before then. Something like:
    pos = 0;
    while (m/\G(<[^>]*>|#[^#]*#|[^#<]+)/gc) {
        push @pieces, $1;
    }
    die unless pos >= length;
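The same loop can be wrapped up as a self-contained script, which makes the failure case easier to see. This is only a sketch: the sub name tokenize_strict, the error message, and the test strings are assumptions made for illustration, and it works on a lexical variable rather than $_.

```perl
use strict;
use warnings;

# Hypothetical helper: tokenize, or die reporting where matching stopped.
sub tokenize_strict {
    my ($text) = @_;
    my @pieces;
    pos($text) = 0;
    # With /c, a failed match leaves pos() at the end of the last
    # successful match instead of resetting it.
    while ($text =~ m/\G(<[^>]*>|#[^#]*#|[^#<]+)/gc) {
        push @pieces, $1;
    }
    my $stopped = pos($text) // 0;
    die "unparsable input at offset $stopped\n" if $stopped < length $text;
    return @pieces;
}

print join('|', tokenize_strict('a<b>#c#d')), "\n";

# An unterminated #var is rejected rather than silently dropped.
my @bad = eval { tokenize_strict('Foo #var') };
print "error: $@" if $@;
```

The first call prints a|<b>|#c#|d. The second dies at offset 4, because the lone '#' matches none of the three alternatives.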
    It would be nice if list-context m//gc left pos at the last match, but last I checked it was still broken.
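For comparison with the split() attempt in the question: split() can also produce these pieces if the delimiters are captured rather than matched with lookaheads. This is an alternative sketch, not the approach recommended above, and the sub name split_pieces is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: capturing parentheses make split() return the
# delimiters as well; grep drops the empty fields that appear between
# two adjacent delimiters (such as <tag> directly followed by #var#).
sub split_pieces {
    my ($text) = @_;
    return grep { length } split /(<[^>]*>|#[^#]*#)/, $text;
}

print "[$_]\n" for split_pieces('Foo bar <tag>#var#as</tag> rules.');
```

Unlike the \G loop, this version cannot detect malformed input: an unterminated '#' simply stays inside a plain-text piece.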

    -- Randal L. Schwartz, Perl hacker

      Excellent!! It works for me. ;-) Thank you so very much.

      you notice vladb departing to his cubicle overjoyed...
