Re: Truncate Data from MySQL

TIMTOWTDI (clumsier, but only slightly different):

#!/usr/bin/perl
use strict;
use warnings;
# 777834

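# Split the first line of DATA on single whitespace characters. With a
# limit of 16, at most 15 fields are split off and the rest stays in the last one.
my @copy = split (/\s/, <DATA>, 16);

# Print the first 15 fields, each followed by a space.
for (0..14) {
    print $copy[$_] . " ";
}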

__DATA__
Pull out only the first 15 words from the pubText field. This is where I need suggestions, the code below does not work.

Re^2: Truncate Data from MySQL by mzedeler (Pilgrim) on Jul 07, 2009 at 18:51 UTC
`/\s/` should be `/\s+/` unless the empty string between two spaces counts as a word.
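
A minimal sketch of the difference, assuming a made-up sample string with a doubled space after "field." (the string and the variable names are illustrative, not taken from the OP's data):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: two spaces after "field." (not the OP's actual data).
my $text = "from the pubText field.  This is where";

my @single = split /\s/,  $text;   # one field boundary per whitespace character
my @plus   = split /\s+/, $text;   # one field boundary per run of whitespace

print scalar(@single), " fields with /\\s/\n";
print scalar(@plus),   " fields with /\\s+/\n";
print "empty field at index 4 with /\\s/\n" if $single[4] eq '';
```

With `/\s/` the doubled space produces an extra, empty field that then counts toward the 15 "words". If the text might also start with whitespace, `split ' ', $text` is worth a look, since it skips leading whitespace as well.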
Re^3: Truncate Data from MySQL by ww (Archbishop) on Jul 07, 2009 at 22:19 UTC
That certainly is the right way to go... and cheap at the price. ++!

Some minor quibbles though: the OP offers no indication of actually having double spaces between sentences, but that is a not uncommon occurrence, which is why your observation is so valuable. Put two spaces rather than one in "...field. This..." in my `__DATA__` and my `split` pattern does NOT DWIM, whereas yours does.

The sample I used, from the OP, has no doubled spaces. Whether or not the db's text field has doubled spaces depends on how it was created. If it was simply scraped from a webpage, odds are that it has none, since browsers (and, I believe, browser-substitutes) render only one space for any run of literal whitespace characters (character entities are, of course, a different matter).

For some reason, your "`...unless the empty string between two spaces counts as a word.`" does not parse to anything plausible (possible blind spot?) for me. FMI, is there a way to persuade split to treat the empty string between two spaces as a word boundary (`\b`) or a not_word boundary (`\B`)?

Update: Oversight addendum: "the empty string between two spaces" is a position (but cf. `perldoc -f split` at "As a special case for "split", using the empty pattern "//"....")
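
One way to poke at the `\b` / `\B` question is sketched below. The sample string and variable names are made up for illustration. The position between two spaces has a non-word character on each side, so `\B` can match there and `\b` cannot, and a zero-width pattern lets `split` cut at exactly that position:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample with a doubled space (not the OP's data).
my $text = "field.  This";

# Does \b or \B match at the position between the two spaces?
my $has_b = $text =~ /\s\b\s/ ? 1 : 0;
my $has_B = $text =~ /\s\B\s/ ? 1 : 0;

# A zero-width pattern that matches only between two whitespace characters.
my @parts = split /(?<=\s)(?=\s)/, $text;

print "\\b between the spaces: $has_b\n";
print "\\B between the spaces: $has_B\n";
print "[$_]\n" for @parts;
```

Because the pattern consumes nothing, both spaces survive, one at the end of the first piece and one at the start of the second.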
Re^4: Truncate Data from MySQL by mzedeler (Pilgrim) on Jul 09, 2009 at 19:59 UTC
"The empty string between two spaces" is a funny wording. All I mean is that between any two neighbouring chars, you can say there are any number of zero-length strings (`$a = '1'; $b = '2'; $empty = ''; $c = "$a$empty$b";` then `$c eq "$a$b" and $c eq "$a$empty$b" and $c eq "$a$empty$empty$b"` ...). I am aware that when perl extracts such zero-length character sequences with split or a regular expression, it returns the empty string; undef only shows up for a capture group that took no part in the match.