in reply to •Re: Regex - unordered lookaround syntax
in thread Regex - unordered lookaround syntax

"Split is wrong when it's easier to talk about what you want to keep rather than what you want to throw away." --merlyn

I had always understood when I needed split and when I needed match, but for a while my brain kept the two concepts completely separate. Then I had one of those eureka moments when something explained the magical syntax split //, which splits a string into single characters. What a weird special case, I had thought before. Now it seems so logical.

It makes sense: if s//-/g inserts a dash at every position in the string, and m//g happily returns a list of nothings, one for each of those positions, then split // should return the characters that sit between all those nothings.
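
Here is a minimal sketch of the three operations side by side. The sample string 'abc' and the variable names are mine, purely for illustration. I wrote the empty pattern as (?:) in the substitution and the match, because a bare // there can reuse the last successfully matched pattern; split // does not have that quirk.

```perl
use strict;
use warnings;

my $word = 'abc';

# (?:) is an explicitly empty pattern. A bare // inside s/// or m// can
# instead reuse the last successfully matched pattern, so this sketch avoids it.
(my $dashed = $word) =~ s/(?:)/-/g;

# In list context, a /g match with no captures returns every matched string;
# here that is one empty string per position in $word.
my @nothings = $word =~ /(?:)/g;

# split special-cases the empty pattern and returns the single characters.
my @chars = split //, $word;

print "$dashed\n";                              # -a-b-c-
print scalar(@nothings), " empty matches\n";    # 4 empty matches
print join(',', @chars), "\n";                  # a,b,c
```

Note that the substitution and the match also see the positions before the first character and after the last one, which is why there are four nothings for three characters.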

--
[ e d @ h a l l e y . c c ]

Re: Re: •Re: Regex - unordered lookaround syntax
by Anonymous Monk on Apr 29, 2003 at 04:56 UTC
    But have you ever stopped to wonder why you don't get an infinite loop at the first empty match? (This is explained in perlre, but virtually nobody understands the explanation.)

      When you have a /g (or the /g-like looping implied by split()), the matcher tries again and again to find more matches. To avoid an infinite loop, the engine will not accept a second zero-length match at the position where the previous one ended, so in effect the next attempt starts one character further along whenever the pattern itself consumed nothing. This goes for empty patterns like // as well as for combinations of zero-width assertions like /(?=f\w\w)(?!foo)/. The /gc variety, which keeps pos() where it was when a match fails, gives some control over this.
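
A small sketch of that behaviour, using the lookaround pattern from above. The sample text and the @starts array are made up for the demonstration. Every match is zero-length, yet the loop still terminates and pos() keeps moving forward.

```perl
use strict;
use warnings;

my $text = 'foo fab fee';
my @starts;

# The pattern consists only of zero-width assertions: "an f and two word
# characters follow here, and they are not foo". It consumes nothing.
while ($text =~ /(?=f\w\w)(?!foo)/g) {
    push @starts, pos($text);    # offset where the empty match was found
}

print "@starts\n";    # 4 8
```

The match at offset 0 is rejected by (?!foo). The matches at offsets 4 and 8 are each reported once, because the engine will not match with zero length again at the spot where it just did.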

      The "Mastering Regular Expressions" book likes to call this the 'bump along' effect: the 'transmission' bumps along the text until the pattern either succeeds or is proven to fail.

      --
      [ e d @ h a l l e y . c c ]