I think that your explanation is closer than mine, but you're not all the way there yet.

    print "'$_'" for split '([&/+-])|\s+', '129-129A & B-131 NORTH AV';
    '129'             ## Match the first '-', produce '129'
    '-'               ## and the captured delimiter
    '129A'            ## Match the first space, produce '129A'
    Use of uninit...  ## and an undef for the empty capture
    ''                ## and a null string?
    ''                ## Match the '&', produce another null string?
    '&'               ## and the captured delimiter
    ''
    Use of uninit...  ## Match the second space, produce an undef
    ''                ## and a null string?
    'B'               ## Match the second '-', produce 'B'
    '-'               ## and the captured delimiter
    '131'             ## Match the third space, produce '131'
    Use of uninit...  ## and an undef for the empty capture
    ''                ## and a null string?
    'NORTH'           ## Match the fourth space, produce 'NORTH'
    Use of uninit...  ## and an undef for the empty capture
    ''                ## and a null string for luck?
    'AV'              ## and the tail of the string.

So try throwing away any whitespace around a captured match and it gets better, but still not all the way:

    print "'$_'" for split '\s*([&/+-])\s*|\s+', '129-129A & B-131 NORTH AV';
    '129'             ## Match the first '-', produce '129'
    '-'               ## and the captured delimiter
    '129A'            ## Match ' & ', produce '129A'
    '&'               ## and the captured delimiter
    'B'               ## Match the second '-', produce 'B'
    '-'               ## and the captured delimiter
    '131'             ## Match the first space, produce '131'
    Use of uninit...  ## and an undef for the empty delimiter
    ''                ## and a null string for luck?
    'NORTH'           ## Match the second space, produce 'NORTH'
    Use of uninit...  ## and an undef for the empty capture
    ''                ## and a null string for luck?
    'AV'              ## and the tail of the string.

Which leads me to conclude that split is roughly equivalent to

@bits = ( $string =~ m[(.*?)(?:PATTERN)]g, $' );

Viz.

    print "'$_'" for '129-129A & B-131 NORTH AV' =~ m[(.*?)(?:\s*([&/+-])\s*|\s+)]g, $';
    '129'
    '-'
    '129A'
    '&'
    'B'
    '-'
    '131'
    Use of uninitialized value in ...
    ''
    'NORTH'
    Use of uninitialized value in ...
    ''
    'AV'

Which matches the output from split above exactly.
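To compare the two without eyeballing printed output, here is a minimal sketch that runs both on the same string and compares the resulting lists. The sub names `show` and `split_emulated` are made up for illustration, and the emulation is only claimed for this input: it makes no attempt to drop trailing empty fields the way split does, and `$'` is only meaningful if the pattern matched at least once.

```perl
use strict;
use warnings;

# Render a list with undef made visible, so two lists can be compared as strings.
sub show {
    return join ' ', map { defined $_ ? "'$_'" : 'undef' } @_;
}

# The m//g emulation, wrapped in a sub (illustrative name, not a real API).
# Assumes the pattern matches at least once, otherwise $' is stale.
sub split_emulated {
    my ($string) = @_;
    my @bits = $string =~ m[(.*?)(?:\s*([&/+-])\s*|\s+)]g;
    return ( @bits, $' );    # $' is whatever followed the last successful match
}

my $string     = '129-129A & B-131 NORTH AV';
my @from_split = split m{\s*([&/+-])\s*|\s+}, $string;
my @from_match = split_emulated($string);

print show(@from_split), "\n";
print show(@from_match), "\n";
print show(@from_split) eq show(@from_match) ? "same\n" : "different\n";
```

For the address string used here, both lines should come out identical.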

But even that does not explain where the null strings are coming from, or why.
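One way to probe this is to stop interpolating the elements into a string and instead label each one by definedness and length. That way a warning and the '' printed next to it can be checked for being one list element or two. A minimal sketch; `label` and `label_fields` are names invented here:

```perl
use strict;
use warnings;

# Describe one element without interpolating it, so an undef cannot
# hide behind a warning plus an empty-looking ''.
sub label {
    my ($field) = @_;
    return 'undef'             unless defined $field;
    return "'' (empty string)" unless length $field;
    return "'$field'";
}

# Split with the given pattern (passed as a string) and label every element.
sub label_fields {
    my ($pattern, $string) = @_;
    return map { label($_) } split /$pattern/, $string;
}

my $address = '129-129A & B-131 NORTH AV';
for my $pattern ( '([&/+-])|\s+', '\s*([&/+-])\s*|\s+' ) {
    my @labels = label_fields( $pattern, $address );
    print "$pattern gives ", scalar @labels, " elements\n";
    print "  $_\n" for @labels;
}
```

If the element count equals the number of quoted items printed earlier, then each warning and the '' beside it are the same element (an undef interpolated into the string), and only the entries labelled as empty strings are genuine empty fields.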

I think that there are at least two bugs here. The split docs could definitely be bolstered for the captured-delimiters case, but also, the mysterious null-string captures displayed by the regex above ought to be fixed. Once that is fixed (if it can be), the capturing-delimiters case would be easier to explain, I think.
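In the meantime, if the undefs are simply unwanted, one possible workaround is to filter them out of the list that split returns. This is a sketch under the assumption that only the fields and the separators that were actually captured are of interest; `split_keep_captured` is a hypothetical name, not something from the split docs.

```perl
use strict;
use warnings;

# Hypothetical helper: split, then drop the undef that split returns
# whenever the capturing group took no part in the match (the \s+ branch).
# Genuine empty strings are defined, so they are kept.
sub split_keep_captured {
    my ($string) = @_;
    return grep { defined } split m{\s*([&/+-])\s*|\s+}, $string;
}

print "'$_'\n" for split_keep_captured('129-129A & B-131 NORTH AV');
```

Filtering on definedness rather than on truth or length matters here, since a field such as '0' or a real empty field would otherwise be thrown away as well.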


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

In reply to Re^5: split and capture some of the separators by BrowserUk
in thread split and capture some of the separators by shemp
