in reply to RegEx Doubt

TIMTOWTDI - why not use split ...
$ perl -e '@a = split /<\/?xyz>/, q(<xyz>a</xyz><xyz>a3</xyz><xyz>a2</xyz><xyz>a1</xyz>); print qq/@a\n/'
a a3 a2 a1
Update:

Ahhh, maybe I see one reason, using Data::Dumper to print the output gives:

$ perl -MData::Dumper -e '@a = split /\<\/?xyz\>/, q(<xyz>a</xyz><xyz>a3</xyz><xyz>a2</xyz><xyz>a1</xyz>); print Dumper \@a'
$VAR1 = [
          '',
          'a',
          '',
          'a3',
          '',
          'a2',
          '',
          'a1'
        ];

The question, for me at least, is: why doesn't split swallow the substrings on which the string is split? I'm obviously missing something but can't see it. Any enlightenment appreciated.

TIA

A user level that continues to overstate my experience :-))

Replies are listed 'Best First'.
Re^2: RegEx Doubt
by jakobi (Pilgrim) on Sep 30, 2009 at 12:22 UTC

    Hm, it's non-zero-width, so it's still nice and easy: you have multiple 'split-points' in sequence in the source. Either run grep /./ on the split results, or use split /(?:...)+/ to 'combine' them into one 'split-point'.
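A minimal sketch of both suggestions, using the sample string from the original post. The variable names ($str, @filtered, @combined) are made up for illustration; only split and grep are core Perl here.

```perl
use strict;
use warnings;

my $str = q(<xyz>a</xyz><xyz>a3</xyz><xyz>a2</xyz><xyz>a1</xyz>);

# Option 1: split on every tag, then throw away the empty fields.
my @filtered = grep { /./ } split /<\/?xyz>/, $str;

# Option 2: treat a run of adjacent tags as a single split point.
# The empty field in front of the first <xyz> is still produced.
my @combined = split /(?:<\/?xyz>)+/, $str;

print join('|', @filtered), "\n";   # a|a3|a2|a1
print join('|', @combined), "\n";   # |a|a3|a2|a1
```

Option 2 alone still leaves the leading empty element, so a grep may be wanted either way.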

    Aren't they cute, those little regexes? Remembering apocalypse5 fondly :).

      "...use split /(?:...)+/ to 'combine' them into one 'split-point'" ... and then grep for empty elements anyway, i.e. grep /./, ..., since the first element is still empty. So you might as well use grep /./, ... on the lot to start with.

      I tried the non-capturing group approach, but a) transposed the '?' and the ':' and b) didn't use '+' ... doh !!!

      Update:

      To reduce any confusion, the transposition to which I referred in the above was entirely down to the paucity of my typing i.e. I typed ':?' instead of '?:' and didn't notice .oO(Maybe I ought to use a larger font...) ;-)

Re^2: RegEx Doubt
by grizzley (Chaplain) on Oct 01, 2009 at 07:39 UTC
    It puzzled me for a minute, but I found the explanation. You have two delimiter substrings in a row every time, separated by nothing: </xyz>_nothing_<xyz>. That nothing is what you find in your output.
Re^2: RegEx Doubt
by dsheroh (Monsignor) on Oct 01, 2009 at 07:41 UTC
    It does swallow the substrings on which the string is split. I don't see any <\/?xyz>s in the Dumper output.

    The blank entries being returned are the zero-length substrings in the middle of </xyz><xyz> - that combination is two matches of the split pattern with nothing in between.
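To see the same effect in isolation, here is a small sketch with made-up inputs (the 'a,,b' string and the variable names are illustrative, not from the thread). Two delimiter matches with nothing between them yield an empty field. Trailing empty fields are dropped unless a negative limit is passed to split.

```perl
use strict;
use warnings;

# Two commas back to back: the field between them is the empty string.
my @fields = split /,/, 'a,,b';
print scalar(@fields), "\n";    # 3  ('a', '', 'b')

# Same thing with adjacent tags. The leading empty field is kept,
# the trailing one (after the last </xyz>) is dropped by default.
my @tags = split /<\/?xyz>/, '<xyz>a</xyz><xyz>b</xyz>';
print scalar(@tags), "\n";      # 4  ('', 'a', '', 'b')

# A limit of -1 keeps the trailing empty field as well.
my @all = split /<\/?xyz>/, '<xyz>a</xyz><xyz>b</xyz>', -1;
print scalar(@all), "\n";       # 5  ('', 'a', '', 'b', '')
```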