
I'm playing around with different methods of subclassing to get a feel for user interface issues and I have a question of "style".

Right now, I'm subclassing HTML::TokeParser as HTML::TokeParser::Easy and I thought it would be interesting to be able to do the following:

my $parser = HTML::TokeParser::Easy->new( $some_html );

while ( my $token = $parser->get_token ) {
    # This prints all text in an HTML doc (i.e., it strips the HTML)
    next if ! $token->is_text;
    print $token->return_text;
}

Unfortunately, the only way I can think of to do that is to bless each $token, turning it into an object. Doing that for every token seems like it would add a lot of overhead.

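sub get_token {
    my $self  = shift;
    my $class = ref $self;                  # bless tokens into the parser's own class
    my $token = $self->SUPER::get_token;    # plain array ref from HTML::TokeParser
    return undef if ! defined $token;      # no more tokens in the document
    bless $token, $class;                   # returns the now-blessed token
}
# create appropriate methods...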
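For comparison, here is a minimal sketch of a variation on the blessing approach. It blesses tokens into a separate token class, so a token doesn't inherit the parser's methods. The class name HTML::TokeParser::Easy::Token is hypothetical. The tokens are built by hand in HTML::TokeParser's array-ref layout, so the snippet runs without the module installed. Note that `bless` attaches a class to the array the reference already points to; it does not copy the token.

```perl
use strict;
use warnings;

# Hypothetical token class; the name is an assumption, not part of HTML::TokeParser.
package HTML::TokeParser::Easy::Token;

# Index of the source text inside each token type (same layout as %token_spec).
my %TEXT_INDEX = ( S => 4, E => 2, T => 1, C => 1, D => 1 );

sub is_text      { return $_[0][0] eq 'T' }
sub is_start_tag { return $_[0][0] eq 'S' }
sub return_text {
    my $self = shift;
    return $self->[ $TEXT_INDEX{ $self->[0] } ];
}

package main;

# Hand-built tokens in HTML::TokeParser's array-ref layout, so the sketch runs standalone.
my @raw = (
    [ 'S', 'p', {}, [], '<p>' ],
    [ 'T', 'Hello', 0 ],
    [ 'E', 'p', '</p>' ],
);

my $out = '';
for my $token (@raw) {
    # bless marks the existing array with a class; the array itself is not copied
    bless $token, 'HTML::TokeParser::Easy::Token';
    next if ! $token->is_text;
    $out .= $token->return_text;
}
print "$out\n";    # Hello
```

The loop body reads the same as the `while` loop at the top of the post. The only per-token work added is the `bless` call.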

The other strategy I thought of was to allow the user to do the following:

while ( my $token = $parser->get_token ) {
    # This prints all text in an HTML doc (i.e., it strips the HTML)
    next if ! $parser->is_text( $token );
    print $parser->return_text( $token );
}

I used the following to do this (using AUTOLOAD to simplify things):

##################
package HTML::TokeParser::Easy;
##################
use strict;
use HTML::TokeParser;
use vars qw/ @ISA $VERSION $AUTOLOAD /;
$VERSION = '1.0';
@ISA = qw/ HTML::TokeParser /;

use constant START_TAG   => 'S';
use constant END_TAG     => 'E';
use constant TEXT        => 'T';
use constant COMMENT     => 'C';
use constant DECLARATION => 'D';

my %token_spec = (
    S => {
        _name   => 'START_TAG',
        tag     => 1,
        attr    => 2,
        attrseq => 3,
        text    => 4
    },
    E => {
        _name => 'END_TAG',
        tag   => 1,
        text  => 2
    },
    T => {
        _name => 'TEXT',
        text  => 1
    },
    C => {
        _name => 'COMMENT',
        text  => 1
    },
    D => {
        _name => 'DECLARATION',
        text  => 1
    } );

sub AUTOLOAD {
    no strict 'refs';
    my ($self, $token) = @_;

    # Don't let the destructor fall through to the die() below
    return if $AUTOLOAD =~ /::DESTROY$/;

    # was it an is_... method?
    if ( $AUTOLOAD =~ /.*::is_(\w+)/ ) {
        my $token_type = uc $1;
        my $tag = &$token_type;    # e.g. is_text -> TEXT -> 'T'
        *{ $AUTOLOAD } = sub { return $_[ 1 ]->[ 0 ] eq $tag ? 1 : 0 };
        return &$AUTOLOAD;
    } elsif ( $AUTOLOAD =~ /.*::return_(\w+)/ ) {
        # was it a return_... method?
        my $token_attr = $1;
        *{ $AUTOLOAD } =
            sub {
                my $type = $_[ 1 ]->[ 0 ];    # 'S', 'E', 'T', 'C' or 'D'
                if ( exists $token_spec{ $type }{ $token_attr } ) {
                    return $_[ 1 ]->[ $token_spec{ $type }{ $token_attr } ];
                } else {
                    warn "No such attribute: '$token_attr' for $token_spec{ $type }{ _name }";
                    return;
                }
            };
        return &$AUTOLOAD;
    } else {
        # Yo!  You can't do that!
        die "No such method: $AUTOLOAD";
    }
}
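The AUTOLOAD version builds each accessor the first time it is called. A possible alternative, shown here only as a sketch, is to generate every is_* and return_* method up front from the same %token_spec table by assigning closures to typeglobs. That avoids AUTOLOAD entirely, including the DESTROY trap. The method names come out the same as the ones AUTOLOAD accepts (is_start_tag, is_text, return_tag, return_attr and so on). The parser here is a stand-in blessed hash that does not inherit from HTML::TokeParser, so the snippet runs by itself.

```perl
use strict;
use warnings;

package HTML::TokeParser::Easy;   # sketch only; the real class would also inherit from HTML::TokeParser

# Same layout as the %token_spec table in the post.
my %token_spec = (
    S => { _name => 'START_TAG',   tag => 1, attr => 2, attrseq => 3, text => 4 },
    E => { _name => 'END_TAG',     tag => 1, text => 2 },
    T => { _name => 'TEXT',        text => 1 },
    C => { _name => 'COMMENT',     text => 1 },
    D => { _name => 'DECLARATION', text => 1 },
);

# Generate is_* methods once, at load time: start_tag => 'S', text => 'T', ...
for my $type ( keys %token_spec ) {
    my $name = lc $token_spec{$type}{_name};
    no strict 'refs';
    *{ __PACKAGE__ . "::is_$name" } = sub { return $_[1][0] eq $type ? 1 : 0 };
}

# Generate return_* methods for every attribute name that appears in the table.
my %attrs = map { $_ => 1 } grep { $_ ne '_name' } map { keys %$_ } values %token_spec;
for my $attr ( keys %attrs ) {
    no strict 'refs';
    *{ __PACKAGE__ . "::return_$attr" } = sub {
        my ( $self, $token ) = @_;
        my $spec = $token_spec{ $token->[0] };
        return $token->[ $spec->{$attr} ] if exists $spec->{$attr};
        warn "No such attribute: '$attr' for $spec->{_name}\n";
        return;
    };
}

package main;

my $parser = bless {}, 'HTML::TokeParser::Easy';   # stand-in object; no real parsing here
my $token  = [ 'E', 'p', '</p>' ];
print $parser->is_end_tag($token), "\n";    # 1
print $parser->return_tag($token), "\n";    # p
```

One trade-off is that a misspelled method name now fails with Perl's usual "Can't locate object method" error instead of the custom `die`.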

Blessing the tokens makes the interface much more intuitive, but creating an object for every token seems like it would be wasteful and slow. The second method works fine, but the interface is a bit cumbersome. Is there any way I can get the syntax of the first method without the overhead?
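One way to check whether the per-token `bless` is actually expensive is to measure it with Perl's core Benchmark module. The sketch below makes several assumptions. It uses two minimal, hypothetical classes (Token and Parser) that stand in for the two interfaces. It uses fake tokens instead of HTML::TokeParser. Both styles build a fresh array per token, as `get_token` would, so the comparison isolates the bless-plus-method-call cost. Timings depend on the machine and perl version, so no numbers are shown here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-ins for the two interfaces: methods on the token vs. on the parser.
package Token;  sub is_text { return $_[0][0] eq 'T' }
package Parser; sub is_text { return $_[1][0] eq 'T' }

package main;

my $parser = bless {}, 'Parser';
# 1,000 fake tokens: odd numbers become text tokens, even ones start tags
my @tokens = map { [ $_ % 2 ? 'T' : 'S', "tok$_" ] } 1 .. 1_000;

my %styles = (
    # style 1: each token gets a fresh array and is blessed before use
    blessed => sub {
        my $n = 0;
        for my $t (@tokens) {
            my $tok = bless [@$t], 'Token';
            $n++ if $tok->is_text;
        }
        return $n;
    },
    # style 2: same fresh array, but the parser method receives it as an argument
    parser_method => sub {
        my $n = 0;
        for my $t (@tokens) {
            my $tok = [@$t];
            $n++ if $parser->is_text($tok);
        }
        return $n;
    },
);

cmpthese( -1, \%styles );    # run each style for at least 1 CPU second, then print a comparison table
```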

Cheers,
Ovid


In reply to Subclassing strategies by Ovid
