keiusui has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I run a guestbook online, and some guests enter extraordinarily long words such as URLs. This is fine, except when the guestbook entry is displayed in a browser, it runs off the page and disfigures the page design.

Given a paragraph $p, how would I insert a space into any word that was over 40 characters long?

I tried the following code:

$p =~ s/(\S{40})/$1\ /g;

Would this work in all cases?

Replies are listed 'Best First'.
Re: how to find and split long words
by GrandFather (Saint) on Jul 26, 2009 at 00:23 UTC

    Although your regex will work as intended, a URL is not a word and as such offers 'natural' places to insert a break (after a / for example). Depending on what you are actually expecting to be entered, there may be other rules that work better for other types of information. Inserting inappropriate spaces is rather akin to inserting inappropriate hyphens: you can end up with something that conveys a very different meaning from the original text. The following may do a somewhat better job:

    s!(\S{10,39} (?: [a-z](?=[A-Z]) | [\\/:,.;\!?\)\]] | (?= [\(\[{]) | (?<=\S{40}) ))!$1 !gx;

    It breaks for a "CamelCase" change ("Camel Case"), after sentence and URL punctuation, before brackets, or after 40 characters if all else fails.
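
To make the alternatives easier to follow, here is the same pattern spread over several lines with comments. This is only a sketch: the helper names `break_word` and `break_long_words` are made up for illustration, and restricting the substitution to runs longer than 40 characters (so that ordinary words and short URLs are left alone) is an assumption, not part of the original one-liner.

```perl
use strict;
use warnings;

# The pattern from the reply above, laid out under /x.
my $break_point = qr!
    ( \S{10,39}                   # at least 10 characters before any break
      (?: [a-z](?=[A-Z])          # lower case followed by upper case (CamelCase)
        | [\\/:,.;\!?\)\]]        # after sentence or URL punctuation
        | (?= [\(\[{] )           # before an opening bracket
        | (?<= \S{40} )           # fallback: 40 characters without a break
      )
    )
!x;

# Hypothetical helper: add break spaces inside one long token.
sub break_word {
    my ($w) = @_;
    $w =~ s/$break_point/$1 /g;
    $w =~ s/ \z//;                # drop a space added at the very end
    return $w;
}

# Hypothetical helper: only rework runs of more than 40 non-whitespace
# characters; everything else in the paragraph is left untouched.
sub break_long_words {
    my ($p) = @_;
    $p =~ s/(\S{41,})/break_word($1)/ge;
    return $p;
}

print break_long_words(
    'See http://www.example.com/guestbook/entries/2009/07/26/aVeryLongEntryName for details.'
), "\n";
```

Removing the inserted spaces gives back the original text, so nothing is lost, but a reader who copies a broken-up URL still has to delete them by hand.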


    True laziness is hard work
Re: how to find and split long words
by graff (Chancellor) on Jul 26, 2009 at 03:25 UTC
    And the point of showing a whole big long url is...?

    If the display involves providing some long url as a link (i.e. <a href=long.url>long.url</a>), you could just keep the full-length string as the href target, and replace as much of the middle as you want with "..." for the display string -- and if you decide to get fancy, use a mouse-over event to display the full-length string as a "tool-tip" style pop-up.
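
A rough sketch of that idea follows. The helper names, the 25/12 character split, and the use of a `title` attribute as a simple stand-in for the mouse-over pop-up are all assumptions made for illustration; the escaping routine is deliberately minimal.

```perl
use strict;
use warnings;

# Minimal HTML escaping for text and double-quoted attribute values.
sub escape_html {
    my ($s) = @_;
    my %ent = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $s =~ s/([&<>"])/$ent{$1}/g;
    return $s;
}

# Hypothetical helper: URLs longer than 40 characters are displayed as
# the first 25 characters, "...", and the last 12 characters, while the
# href and the title keep the complete URL.
sub link_with_short_text {
    my ($url) = @_;
    my $text = length($url) > 40
        ? substr($url, 0, 25) . '...' . substr($url, -12)
        : $url;
    my $href = escape_html($url);
    return sprintf '<a href="%s" title="%s">%s</a>',
        $href, $href, escape_html($text);
}

print link_with_short_text('http://www.example.com/' . ('x' x 60) . '/index.html'), "\n";
```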

    (The point is, people seldom try to actually read a long url -- they usually just glance and click. The careful ones will wait for a pop-up of the full string if they know it will show up, and they can decide to "view source" if it doesn't.)

    Of course, if you are not providing an actual <a href=...> link for these critters, well, displaying the whole big string is useful in some sense, though whatever you do to break it up into a bunch of short lines will tend to interfere with any sort of "copy/paste" usage that a reader may want to try. Any attempt to select the whole multi-line string will end up with some sort of white-space wherever you inserted a line-break, and this will need to be edited out when someone wants to use it as an actual url.

    And if people are sending you really long strings that are not urls, maybe there's some practical need for that, which your current web-page design is not supporting very well, and maybe you should rethink the page layout.

    As for the code you tried, I gather it worked (in terms of splitting long words) for the cases you happen to know about. It looks to me like it would work equally well in all cases, in the sense that if the paragraph string contains any run of 40 or more non-whitespace characters, a space will be added after every 40th character in that run. (A run of 80 or more non-whitespace characters will have two spaces added, and so on.) After applying that operation, no non-whitespace string in your paragraph will be longer than 40 characters. Note that a word of exactly 40 characters also matches, so it gains a trailing space even though it is not "over 40".
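
A small script along these lines can confirm that behaviour. The sub name `split_every_40` and the sample strings are made up for the check; the substitution itself is the one from the question.

```perl
use strict;
use warnings;

# Apply the substitution from the question to a copy of the string.
sub split_every_40 {
    my ($p) = @_;
    $p =~ s/(\S{40})/$1 /g;
    return $p;
}

my $word85 = 'x' x 85;            # one unbroken 85-character "word"
my $out    = split_every_40("see $word85 here");

# Lengths of the non-whitespace runs after the substitution.
my @lengths = map { length } split ' ', $out;
print "@lengths\n";               # 3 40 40 5 4

# A word of exactly 40 characters also matches and gains a trailing space.
my $exact = split_every_40('y' x 40);
print length($exact), "\n";       # 41
```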

    The other replies above might lead you to a solution that produces more coherent/readable results in a variety of cases, by breaking at strategic points.

Re: how to find and split long words
by ww (Archbishop) on Jul 26, 2009 at 03:13 UTC
    ...or (but you may believe this "disfigures the page design"):

    You can identify elements of excessive length (for whatever value of 'excessive' works for your app) using length (see perldoc -f length), and then handle such cases with CSS2's "overflow" (CSS3 also offers "overflow-x").

    For CSS2, see http://www.w3schools.com/Css/pr_pos_overflow.asp which offers an example similar to this:

    div {
        width: 240px;
        max-width: 240px;  /* Belt and suspenders. Rarely needed */
        height: 200px;
        overflow: scroll;
    }

    Divs with this styling will add vertical and horizontal scrollbars, perhaps calling visitors' attention to the run-on lines more effectively than the horizontal scrollbar at the base of the browser window.

    As a very rough rule of thumb, for ordinary fonts, your 40 chars spec translates to a width of about 240px (but you may need to reduce that, depending on your design, to allow for the vertical scrollbar width).
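
On the Perl side, the length check might look something like the sketch below. The sub names and the class names "entry" and "entry-scroll" are invented for illustration; the stylesheet would need a matching rule carrying the overflow declaration.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Length of the longest non-whitespace run in a paragraph (0 if none).
sub longest_word_length {
    my ($p) = @_;
    return max(0, map { length } split ' ', $p);
}

# Hypothetical helper: choose the CSS class for a guestbook entry, adding
# the scrolling class only when some word exceeds the limit.
sub entry_class {
    my ($text, $limit) = @_;
    return longest_word_length($text) > $limit
        ? 'entry entry-scroll'
        : 'entry';
}

my $text = 'Nice site! http://www.example.com/' . ('x' x 60);
printf qq{<div class="%s">...</div>\n}, entry_class($text, 40);
```

Only entries that actually contain an over-long word get the scrollbars, so the rest of the page keeps its normal layout.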

Re: how to find and split long words
by bichonfrise74 (Vicar) on Jul 26, 2009 at 01:20 UTC
Re: how to find and split long words
by moritz (Cardinal) on Jul 26, 2009 at 16:56 UTC
    I have to agree with graff that displaying only parts of the URL is a nicer solution; but of course the same problem still persists for non-URL long words.

    I've had good experiences with inserting a U+200B ZERO WIDTH SPACE character instead of a normal blank; browsers use it to word-wrap, but show no space if wrapping is not necessary. So you can have the best of both worlds ;-)
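
A minimal sketch of that substitution, assuming the page is served as UTF-8 (the sub name is made up, and the lookahead is an addition so that nothing is appended when a word ends exactly at the 40th character):

```perl
use strict;
use warnings;

# Insert U+200B after every 40 non-whitespace characters, but only where
# more non-whitespace follows.
sub add_zero_width_breaks {
    my ($p) = @_;
    $p =~ s/(\S{40})(?=\S)/$1\x{200B}/g;
    return $p;
}

binmode STDOUT, ':encoding(UTF-8)';   # the character must go out as UTF-8
print add_zero_width_breaks('x' x 100), "\n";
```

If the output encoding is not under your control, the numeric character reference `&#8203;` in the replacement should have the same effect in HTML, provided it is inserted after the text has been HTML-escaped. Note that U+200B is not whitespace as far as `\S` is concerned, so the substitution is best applied only once to any given string.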