I think that your explanation is closer than mine, but you're not all the way there yet.
    print "'$_'" for split '([&/+-])|\s+', '129-129A & B-131 NORTH AV';

    '129'              ## Matches the first '-', produces '129'
    '-'                ## and the captured delimiter
    '129A'             ## Match the first space, return '129A'
    Use of uninit...   ## and an undef for the empty capture
    ''                 ## and a null string?
    ''                 ## Match the '&', produces another null string?
    '&'                ## and the captured delimiter
    ''
    Use of uninit...   ## Match the second space, produce an undef
    ''                 ## and a null string?
    'B'                ## Match the second '-', produce the 'B'
    '-'                ## and the captured delimiter
    '131'              ## Match the 3rd space, produce '131'
    Use of uninit...   ## and undef for the empty capture
    ''                 ## and a null string?
    'NORTH'            ## Match the fourth space, produce 'NORTH'
    Use of uninit...   ## and undef for the empty capture
    ''                 ## and a null string for luck?
    'AV'               ## And the tail of the string.
So try throwing away any whitespace around a captured match and it gets better, but still not all the way:
    print "'$_'" for split '\s*([&/+-])\s*|\s+', '129-129A & B-131 NORTH AV';

    '129'              ## Match the first '-', produce '129'
    '-'                ## and the captured delimiter
    '129A'             ## Match ' & ', produce '129A'
    '&'                ## and the captured delimiter
    'B'                ## Match the second '-', produce 'B'
    '-'                ## and the captured delimiter
    '131'              ## Match the first space, produce '131'
    Use of uninit...   ## and undef for the empty delimiter
    ''                 ## and a null string for luck?
    'NORTH'            ## Match the second space, produce 'NORTH'
    Use of uninit...   ## and undef for the empty capture
    ''                 ## and a null string for luck?
    'AV'               ## And the tail of the string.
Which leads me to conclude that split is roughly equivalent to
@bits = ( $string =~ m[(.*?)(?:PATTERN)]g, $' );
Viz:

    print "'$_'" for '129-129A & B-131 NORTH AV' =~ m[(.*?)(?:\s*([&/+-])\s*|\s+)]g, $';

    '129'
    '-'
    '129A'
    '&'
    'B'
    '-'
    '131'
    Use of uninitialized value in ...
    ''
    'NORTH'
    Use of uninitialized value in ...
    ''
    'AV'
Which matches the output from split above exactly.
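To compare the two without eyeballing printed output, something like the following could be used. It is only a sketch: the variable names and the show() helper are mine, and it checks this one string and pattern, not the equivalence in general.

```perl
use strict;
use warnings;

my $string = '129-129A & B-131 NORTH AV';

# Fields as returned by split with one capturing group in the separator.
my @from_split = split /\s*([&\/+-])\s*|\s+/, $string;

# The rough emulation: each match yields ($1, $2); $' supplies the tail.
my @from_match = ( $string =~ m[(.*?)(?:\s*([&/+-])\s*|\s+)]g, $' );

# Hypothetical helper: render undef as a bare word so it is visible.
sub show { join ' ', map { defined $_ ? "'$_'" : 'undef' } @_ }

print show(@from_split), "\n";
print show(@from_match), "\n";
print show(@from_split) eq show(@from_match) ? "same\n" : "different\n";
```

Rendering undef explicitly also avoids the "uninitialized" warnings, so the two lists can be compared on STDOUT alone.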
But even that does not explain where the null strings are coming from, or why.
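One way to probe this is to label each element instead of interpolating it, so that an undef and a genuine empty string cannot both show up as ''. A minimal sketch, using the first pattern from above (the labels and variable names are mine):

```perl
use strict;
use warnings;

my @bits = split /([&\/+-])|\s+/, '129-129A & B-131 NORTH AV';

# Label each element: undef, empty string, or the quoted value.
my @labelled = map {
    defined($_) ? ( $_ eq '' ? 'empty' : "'$_'" ) : 'undef'
} @bits;

my $undefs  = grep { !defined $_ } @bits;
my $empties = grep { defined $_ && $_ eq '' } @bits;

print scalar(@bits), " elements ($undefs undef, $empties empty):\n";
print "@labelled\n";
```

If the counts come out as I expect (one undef per whitespace match, plus one empty field on either side of the '&'), then each '' printed right after a warning would be the undef itself being interpolated, not an additional element.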
I think that there are at least two bugs here. The split docs could definitely be bolstered for the captured delimiters case, but also, the mysterious null string captures displayed by the regex above ought to be fixed. Once that is fixed (if it can be), I think the capturing delimiters case would be easier to explain.
In reply to Re^5: split and capture some of the separators
by BrowserUk
in thread split and capture some of the separators
by shemp