Re: RegEx Doubt

TIMTOWTDI - why not use split ...

$ perl -e '@a = split /<\/?xyz>/, q(<xyz>a</xyz><xyz>a3</xyz><xyz>a2</xyz><xyz>a1</xyz>); print qq/@a\n/'
 a  a3  a2  a1

Update:

Ahhh, maybe I see one reason: using Data::Dumper to print the output gives:

$ perl -MData::Dumper -e '@a = split /\<\/?xyz\>/, q(<xyz>a</xyz><xyz>a3</xyz><xyz>a2</xyz><xyz>a1</xyz>); print Dumper \@a'
$VAR1 = [
          '',
          'a',
          '',
          'a3',
          '',
          'a2',
          '',
          'a1'
        ];

The question, for me at least, is: why doesn't split swallow the substrings on which the string is split? I'm obviously missing something but can't see it; any enlightenment appreciated.

TIA

A user level that continues to overstate my experience :-))

Re^2: RegEx Doubt by jakobi (Pilgrim) on Sep 30, 2009 at 12:22 UTC
Hm, it's non-zero-width, so it's still nice and easy: you have multiple 'split-points' in sequence in the source. Try either grep /./ on the split results or use split /(?:...)+/ to 'combine' them into one 'split-point'. Aren't they cute, those little regexes? Remembering Apocalypse 5 fondly :).
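A minimal sketch of both suggestions as a script rather than a one-liner. It assumes the `...` in `/(?:...)+/` stands for the separator pattern `<\/?xyz>` from the original post; the variable names are just for illustration:

```perl
use strict;
use warnings;

# Test string from the original post.
my $s = q(<xyz>a</xyz><xyz>a3</xyz><xyz>a2</xyz><xyz>a1</xyz>);

# Plain split: each "</xyz><xyz>" pair is two adjacent separator matches,
# so an empty field appears between them.
my @raw = split /<\/?xyz>/, $s;

# Suggestion 1: keep only fields that contain at least one character.
my @kept = grep /./, @raw;

# Suggestion 2: let a run of separators count as a single split point.
# (The string still starts with a separator, so one leading empty field remains.)
my @combined = split /(?:<\/?xyz>)+/, $s;

print join('|', @raw), "\n";
print join('|', @kept), "\n";
print join('|', @combined), "\n";
```

If a field could legitimately be a lone newline, `grep { length } @raw` is the safer filter, since `.` does not match `\n` without the `/s` modifier.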
Re^3: RegEx Doubt by Bloodnok (Vicar) on Sep 30, 2009 at 12:28 UTC
...use split /(?:...)+/ to 'combine' them into one 'split-point'.

...and you'd then still need to drop the empty element, i.e. `grep /./, ...`, since the first element is still empty, so you might as well use `grep /./, ...` on the lot to start with. I tried the non-capturing (?:...) approach, but a) transposed the '?' and the ':' and b) didn't use '+' ... doh !!!

Update: To reduce any confusion, the transposition I referred to above was entirely down to the paucity of my typing, i.e. I typed ':?' instead of '?:' and didn't notice .oO(Maybe I ought to use a larger font...) ;-)
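A small sketch to check the point about the leading empty field, using the same test string and separator as the original post (variable names are illustrative):

```perl
use strict;
use warnings;

my $s = q(<xyz>a</xyz><xyz>a3</xyz><xyz>a2</xyz><xyz>a1</xyz>);

# Combining runs of separators still leaves one empty leading field,
# because the string itself begins with "<xyz>".
my @combined = split /(?:<\/?xyz>)+/, $s;
my $leading_empty = $combined[0] eq '' ? 1 : 0;

# So a grep is needed either way, and on its own it already does the job.
my @via_combined  = grep /./, @combined;
my @via_grep_only = grep /./, split /<\/?xyz>/, $s;

print "leading empty field: $leading_empty\n";
print "@via_combined\n";
print "@via_grep_only\n";
```

Both filtered lists come out as `a a3 a2 a1`, which supports grepping the plain split result directly.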
Re^2: RegEx Doubt by grizzley (Chaplain) on Oct 01, 2009 at 07:39 UTC
It puzzled me for a minute, but I found the explanation. Each time, you have two separator substrings with nothing between them: `</xyz>_nothing_<xyz>`. That nothing is what you see in your output.
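One way to make that "nothing" visible, as a sketch: when the split pattern contains capturing parentheses, Perl's split also returns each matched separator as a field of its own, so the empty string shows up sandwiched between `</xyz>` and `<xyz>`. The shortened test string is an assumption to keep the output readable:

```perl
use strict;
use warnings;

my $s = q(<xyz>a</xyz><xyz>a3</xyz>);

# The capturing group makes split return the separators as well as the
# fields between them.
my @fields = split /(<\/?xyz>)/, $s;

# Bracket each element so empty strings are easy to spot.
print join(' ', map { "[$_]" } @fields), "\n";
```

This should print `[] [<xyz>] [a] [</xyz>] [] [<xyz>] [a3] [</xyz>]`: the second `[]` is the empty field between the two adjacent separators.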
Re^2: RegEx Doubt by dsheroh (Monsignor) on Oct 01, 2009 at 07:41 UTC
It does swallow the substrings on which the string is split; I don't see any `<\/?xyz>`s in the Dumper output. The blank entries being returned are the zero-length substrings in the middle of `</xyz><xyz>` - that combination is two matches of the split pattern with nothing in between. (The very first blank entry is the zero-length substring before the `<xyz>` at the start of the string.)
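To see the "two matches with nothing in between" directly, here is a sketch that walks the separator matches with `m//g` and records each match's start and end offsets from `@-` and `@+`. The shortened test string and the variable names are assumptions for illustration:

```perl
use strict;
use warnings;

my $s = q(<xyz>a</xyz><xyz>a3</xyz>);

# Record the [start, end) offsets of every separator match.
# After a successful match, $-[0] and $+[0] hold those offsets.
my @spans;
while ($s =~ /(<\/?xyz>)/g) {
    push @spans, [ $-[0], $+[0] ];
    printf "%-6s matched at %2d..%2d\n", $1, $-[0], $+[0];
}

# A match that starts exactly where the previous one ended (or at offset 0)
# leaves a zero-length field for split to return.
my $prev_end = 0;
my @empty_at;
for my $span (@spans) {
    push @empty_at, $span->[0] if $span->[0] == $prev_end;
    $prev_end = $span->[1];
}
print "zero-length fields at offsets: @empty_at\n";
```

On that string it reports zero-length fields at offsets 0 and 12: the start of the string, and the point where `</xyz>` ends and `<xyz>` begins. The final `</xyz>` also leaves an empty field at the very end, but split drops trailing empty fields by default, which is why none appears at the end of the Dumper output.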