in reply to Re: Truncate Data from MySQL
in thread Truncate Data from MySQL

/\s/ should be /\s+/ unless the empty string between two spaces counts as a word.
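
A minimal sketch of the difference, assuming a made-up sample string with a doubled space after the first sentence (the text and variable names are illustrative, not taken from the OP's data):

```perl
use strict;
use warnings;

# Hypothetical sample: note the TWO spaces after "field."
my $text = "A text field.  This one has a doubled space.";

my @single = split /\s/,  $text;   # every whitespace char is its own separator
my @greedy = split /\s+/, $text;   # a run of whitespace counts as one separator

printf "/\\s/  gives %d fields\n", scalar @single;
printf "/\\s+/ gives %d fields\n", scalar @greedy;

# Show the fields bracketed so the empty one is visible.
print join(' ', map { "[$_]" } @single), "\n";
```

With /\s/ the doubled space yields one extra, empty field between "field." and "This", so a word count or word-based truncation built on it would be off by one for each doubled space.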

Re^3: Truncate Data from MySQL
by ww (Archbishop) on Jul 07, 2009 at 22:19 UTC
    That certainly is the right way to go... and cheap at the price. ++!

    Some minor quibbles though:

    1. OP offers no indication of actually having double spaces between sentences, but that is a not uncommon occurrence, which is why your observation is so valuable: put two spaces rather than one in "...field. This..." in my __DATA__ and my split pattern does NOT DWIM, whereas yours does.
    2. The sample I used, from the OP, has no doubled spaces.
    3. Whether or not the db's text field has doubled spaces depends on how it was created. If it was simply scraped from a webpage, odds are that it has none, since browsers (and, I believe, browser-substitutes) render only one space for any run of literal whitespace characters (character entities are, of course, a different matter).

    For some reason, your "...unless the empty string between two spaces counts as a word." does not parse to anything plausible (possible blind spot?) for me. FMI, is there a way to persuade split to treat the empty string between two spaces as a word boundary (\b) or a not_word boundary (\B)?

    Update: Oversight addendum: "the empty string between two spaces" is a position (but cf. perldoc -f split at "As a special case for "split", using the empty pattern "//"....")
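
For reference, a small sketch of that special case, assuming a made-up string with a doubled space: the empty pattern matches at every position between characters, so split breaks the string into single characters.

```perl
use strict;
use warnings;

# Hypothetical sample with two spaces between the letters.
my @chars = split //, "a  b";

# Each position between characters acts as a separator,
# so the two spaces come back as two one-character fields.
print scalar(@chars), " fields: ", join(',', map { "[$_]" } @chars), "\n";
```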

      "The empty string between two spaces" is a funny wording. All I mean is that between any two neighbouring chars, you can say there is any number of zero-length strings ($a = '1'; $b = '2'; $empty = ''; $c = "$a$empty$b"; then $c eq "$a$b" and $c eq "$a$empty$b" and $c eq "$a$empty$empty$b" ...). Note that when Perl extracts a zero-length character sequence with split or a regular expression capture, what comes back is the empty string (defined, length 0), not undef; undef only turns up for a capture group that did not take part in the match.
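
A short sketch of both points, using illustrative variable names and strings that are not from the thread. As far as I can tell, a zero-length field or capture comes back as a defined empty string, while undef appears only for a group that never matched:

```perl
use strict;
use warnings;

# Any number of empty strings between two chars changes nothing.
my ($x, $y, $empty) = ('1', '2', '');
my $joined = "$x$empty$empty$y";
print "concatenation unchanged\n" if $joined eq "$x$y";

# The field between two adjacent spaces is '' (defined, length 0).
my @fields = split /\s/, "two  spaces";
my $field_defined = defined($fields[1]) ? 1 : 0;
print "field defined: $field_defined, length: ", length($fields[1]), "\n";

# A capture group that matched nothing-but-a-position is also ''.
my ($hit)  = "ab" =~ /a()b/;
# A capture group that did not take part in the match is undef.
my ($miss) = "ab" =~ /a(x)?b/;
print "hit is ",  (defined $hit  ? "defined" : "undef"), "\n";
print "miss is ", (defined $miss ? "defined" : "undef"), "\n";
```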