This is a module I wrote for an online forum where a few users had started inputting long lines of "unbroken text", which resulted in page-widening.

Irritating.

My solution is the module HTML::BreakupText. I didn't really intend to release it, since it is kinda trivial, but I'm sharing it here in case there's a use for it, or any interesting feedback.

=head1 NAME

HTML::BreakupText - Perl extension for adding whitespace to HTML text.

=head1 SYNOPSIS

=for example begin

    #!/usr/bin/perl -w
    use HTML::BreakupText;
    use strict;

    #
    my $html = q[<a href="http://foooooooooooooooo.com/">http://foooooooooooooooooooooooooooooo.com</a>];
    my $formatter = HTML::BreakupText->new( width => 10 );
    my $output    = $formatter->BreakupText( $html );

=for example end

=head1 DESCRIPTION

  If you wish to display user supplied HTML text you may well find
  yourself a victim of people submitting long, unbroken, strings of input.

  This results in so-called "page widening".

  This module is designed to prevent this from occurring by breaking up
  supplied content into space delimited output.

  The module is clever enough to not modify HTML attribute values - only
  their text components.

=cut

package HTML::BreakupText;

use vars qw($VERSION $DEFAULT_WIDTH @ISA @EXPORT @EXPORT_OK);

require Exporter;
require AutoLoader;

@ISA    = qw(Exporter AutoLoader);
@EXPORT = qw( BreakupText );

($VERSION) = '$Revision: 1.2 $' =~ m/Revision:\s*(\S+)/;

($DEFAULT_WIDTH) = 60;

use HTML::TokeParser;

=head2 new

  Create a new instance of this object.

=cut

sub new
{
    my ( $self, %supplied ) = (@_);
    my $class=ref($self) || $self;

    # the options hash
    my $options = {};
    $self->{options} = $options;

    # Set default width
    $options{width} = $DEFAULT_WIDTH;

    #
    #  Allow user supplied values to override our defaults
    #
    foreach my $key ( keys %supplied )
    {
        $options{ lc $key } = $supplied{ $key };
    }

    return bless {}, $class;
}

=head2 BreakupText

  Process the given text and optional hash of options.

  Return the modified text;

=cut

sub BreakupText
{
    my ($class, $str ) = ( @_ );

    #
    # Get the user supplied split-width.
    #
    my $options = $self->{options};
    my $width   = $options{width};

    my $tp = HTML::TokeParser->new(\$str) or die "Couldn't parse $str: $!";
    $tp->unbroken_text(1);

    my ($html, $start);
    while (my $tag = $tp->get_token)
    {
        if ($tag->[0] eq 'T')
        {
            #
            # Here is where we breakup
            #
            my $t = $tag->[1];
            $t =~ s/(\S{$width})/$1 /g;
            $html .= $t;
        }
        else
        {
            $html .= $tag->[4] if $tag->[0] eq 'S';
            $html .= $tag->[1] if $tag->[0] eq 'C';
            $html .= $tag->[2] if $tag->[0] eq 'E';
        }
    }
    return( $html );
}

1;

=head1 AUTHOR

Steve Kemp

http://www.steve.org.uk/

=head1 LICENSE

Copyright (c) 2005 by Steve Kemp.  All rights reserved.

This module is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

The LICENSE file contains the full text of the license.

=cut
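The substitution at the heart of BreakupText can be tried in isolation, without HTML::TokeParser. The snippet below is only a small sketch of that one step; the sample text and the variable names $text and $broken are made up for illustration, and the width of 10 is borrowed from the SYNOPSIS.

```perl
use strict;
use warnings;

my $width = 10;          # same width as in the SYNOPSIS
my $text  = 'a' x 25;    # 25 non-space characters with no breaks

# Insert a space after every run of $width non-space characters.
( my $broken = $text ) =~ s/(\S{$width})/$1 /g;

print "$broken\n";       # prints: aaaaaaaaaa aaaaaaaaaa aaaaa
```

In the module this substitution is only applied to text tokens, which is why attribute values such as href are left alone.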
Steve
--

Re: Breakup user supplied text, whilst correctly ignoring HTML tags.
by merlyn (Sage) on Aug 24, 2005 at 15:31 UTC

      Wow, that's quite a knee-jerk reaction there.

      Much more important, IMO, are all of the manipulations of $self followed by return bless {}, $class; (throwing the modified $self away), the fact that $obj->new() modifies $obj, and the fact that 'use strict' appears only in the POD, when its use would catch some serious errors.
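      For illustration, a constructor that avoids those problems might look roughly like the sketch below. This is not the author's corrected code: the package name My::BreakupText and the width accessor are hypothetical, and only the default of 60 and the lower-casing of option keys are taken from the original post.

```perl
use strict;
use warnings;

package My::BreakupText;    # hypothetical name, for illustration only

our $DEFAULT_WIDTH = 60;    # same default as the posted module

sub new {
    my ( $proto, %supplied ) = @_;
    my $class = ref($proto) || $proto;

    # Start from the default, then let caller-supplied keys override it.
    my %options = ( width => $DEFAULT_WIDTH );
    $options{ lc $_ } = $supplied{$_} for keys %supplied;

    # Bless the hash that actually holds the options and return it;
    # $proto itself is never modified.
    return bless { options => \%options }, $class;
}

# Hypothetical accessor, only here so the result can be inspected.
sub width { return $_[0]{options}{width} }

package main;

print My::BreakupText->new->width, "\n";                   # prints 60
print My::BreakupText->new( Width => 10 )->width, "\n";    # prints 10
```

      With 'use strict' in force, an accidental $options{width} (a lookup in an undeclared hash %options) would be rejected at compile time instead of silently yielding undef.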

      Steve, I'm guessing that the code you were using was a bit different than this (because I don't believe this code will set $width to anything but undef and so won't have much effect). Perhaps you wanted to 'dress it up' more like a complete module before posting it?
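      The reason $width ends up undef is that $options{width} and $options->{width} look at two different variables. A small sketch of the difference follows; the names $from_ref and $from_hash are invented for the demonstration, and strict has to be relaxed locally because otherwise the line would not even compile.

```perl
use strict;
use warnings;

my $options = { width => 10 };       # a hash *reference*, as in the module

my $from_ref = $options->{width};    # dereferences $options

my $from_hash;
{
    no strict 'vars';                # strict would reject the next line
    no warnings;
    $from_hash = $options{width};    # element of a separate hash, %options
}

print "via reference: $from_ref\n";
print "via hash: ", ( defined $from_hash ? $from_hash : 'undef' ), "\n";
```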

      The heart of the code looks good and useful. Thanks for posting it. And don't feel too bad about a few mistakes; we all make lots of mistakes.

      If you have any questions about the problems I outlined above, please reply and ask them.

      - tye        

        Thanks for the comment - which I only just noticed since it wasn't an immediate reply.

        You are correct: the setup of the default width was broken. I didn't realise because I've always called it with an explicit width in my code; now I have a small collection of test cases which caught it.

        I'm not sure what more I can do to "dress it up" as you suggest: the core of the module is very simple, and I think it does all it needs to now. Still, I'd certainly be interested in any concrete suggestions. As things stand it is working nicely on my website and processing all the comment texts.

        Steve
        --