Break up user-supplied text, whilst correctly ignoring HTML tags.

This is a module I wrote for an online forum after a few users started inputting long lines of "unbroken text", which resulted in page-widening.

Irritating.

My solution is the module HTML::BreakupText. I didn't really intend to release it, since it is kinda trivial, but I'm sharing it here in case there's a use for it, or any interesting feedback.

=head1 NAME

HTML::BreakupText - Perl extension for adding whitespace to HTML text.

=head1 SYNOPSIS

=for example begin

    #!/usr/bin/perl -w
    use HTML::BreakupText;
    use strict;
                                                                      
    my $html = q[<a href="http://foooooooooooooooo.com/">http://foooooooooooooooooooooooooooooo.com</a>];

    my $formatter = HTML::BreakupText->new( width => 10 );
    my $output = $formatter->BreakupText( $html );

=for example end


=head1 DESCRIPTION

If you wish to display user-supplied HTML text you may well find yourself
a victim of people submitting long, unbroken strings of input.

This results in so-called "page widening".

This module is designed to prevent this from occurring by breaking up
supplied content into space-delimited output.  The module is clever
enough to leave HTML tags and their attribute values alone - only the
text components are modified.


=cut 


package HTML::BreakupText;

use strict;
use warnings;

use vars qw($VERSION $DEFAULT_WIDTH);

($VERSION)       = '$Revision: 1.2 $' =~ m/Revision:\s*(\S+)/;
($DEFAULT_WIDTH) = 60;

use HTML::TokeParser;


=head2 new

  Create a new instance of this object.

  Accepts an optional hash of options; "width" sets the longest run of
  non-whitespace characters allowed before a space is inserted.

=cut

sub new
{
    my ( $self, %supplied ) = (@_);

    my $class=ref($self) || $self;

    # the options hash, starting with the default width
    my %options = ( width => $DEFAULT_WIDTH );

    #
    #  Allow user supplied values to override our defaults
    #
    foreach my $key ( keys %supplied )
    {
        $options{ lc $key } = $supplied{ $key };
    }

    # Store the options in the new object, not in the invocant.
    return bless { options => \%options }, $class;
}


=head2 BreakupText

  Process the given text, using the width supplied to the constructor.

  Returns the modified text.

=cut

sub BreakupText
{
    my ( $self, $str ) = ( @_ );

    #
    # Get the user supplied split-width.
    #
    my $width = $self->{options}{width};

    my $tp = HTML::TokeParser->new(\$str)
      or die "Couldn't parse $str: $!";

    $tp->unbroken_text(1);

    my $html = '';

    while (my $tag = $tp->get_token)
    {
        if ($tag->[0] eq 'T')
        {
            #
            #  Here is where we breakup: add a space after every
            #  $width consecutive non-whitespace characters.
            #
            my $t = $tag->[1];
            $t =~ s/(\S{$width})/$1 /g;
            $html .= $t;
        }
        else
        {
            # Start tags, comments and end tags are copied through
            # unmodified, using the original text of each token.
            $html .= $tag->[4] if $tag->[0] eq 'S';
            $html .= $tag->[1] if $tag->[0] eq 'C';
            $html .= $tag->[2] if $tag->[0] eq 'E';
        }
    }

    return( $html );
}


1;


=head1 AUTHOR

Steve Kemp

http://www.steve.org.uk/



=head1 LICENSE

Copyright (c) 2005 by Steve Kemp.  All rights reserved.

This module is free software;
you can redistribute it and/or modify it under
the same terms as Perl itself.
The LICENSE file contains the full text of the license.

=cut
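As a rough illustration of the core step only, the sketch below applies the same substitution the module uses on text tokens to a plain string. The helper name `break_long_runs` is hypothetical and not part of HTML::BreakupText; it also skips the HTML::TokeParser pass, so it would happily split tags and attribute values if you fed it markup.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::BreakupText): insert a space
# after every $width consecutive non-whitespace characters.
sub break_long_runs {
    my ( $text, $width ) = @_;
    $text =~ s/(\S{$width})/$1 /g;
    return $text;
}

# A 25-character run is split into chunks of at most 10 characters.
print break_long_runs( 'a' x 25, 10 ), "\n";

# Words shorter than the width are left untouched.
print break_long_runs( 'short words stay intact', 10 ), "\n";
```

Note that a run of exactly `$width` characters also gets a trailing space, since the pattern does not look at what follows the match.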

Steve
--


Replies are listed 'Best First'.
Re: Breakup user supplied text, whilst correctly ignoring HTML tags. by merlyn (Sage) on Aug 24, 2005 at 15:31 UTC
`sub new { my ( $self, %supplied ) = (@_); my $class=ref($self) || $self; ...`

ref($proto) - just say no!, says the bad-meme-killer.

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.
Re^2: Breakup user supplied text, whilst correctly ignoring HTML tags. (mini meme) by tye (Sage) on Aug 25, 2005 at 05:59 UTC
Wow, that's quite a knee-jerk reaction there.

Much more important, IMO, is all of the manipulations of $self followed by `return {}, $class;` (throwing the modified $self away), or that $obj->new() modifies $obj, or that '`use strict`' only appears in the POD and some serious errors would be caught by its use.

Steve, I'm guessing that the code you were using was a bit different than this (because I don't believe this code will set $width to anything but undef and so won't have much effect). Perhaps you wanted to 'dress it up' more like a complete module before posting it?

The heart of the code looks good and useful. Thanks for posting it. And don't feel too bad about a few mistakes; we all make lots of mistakes. If you have any questions about the problems I outlined above, please reply and ask them.

- tye
Re^3: Breakup user supplied text, whilst correctly ignoring HTML tags. (mini meme) by skx (Parson) on Sep 18, 2005 at 14:45 UTC
Thanks for the comment, which I only just noticed since it wasn't an immediate reply.

You are correct: the setup of the default width was broken. I didn't realise because I've always called it with an explicit width in my code; now I have a small collection of test cases which caught it.

I'm not sure what more I can do to "dress it up" as you suggest. The core of the module is very simple, and I think it does all it needs to now. Still, I'd certainly be interested in any concrete suggestions.

As things stand it is working nicely on my website and processing all the comment texts.

Steve
--
Re^3: Breakup user supplied text, whilst correctly ignoring HTML tags. (mini meme) by merlyn (Sage) on Aug 25, 2005 at 07:00 UTC
"Wow, that's quite a knee-jerk reaction there."

Just doing my job. Until the meme is dead, it's going to keep getting pointed out. If people would use Super Search here, I'd be quiet. But apparently, they don't.

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.