in reply to Re: Is there a faster / more efficient / quicker or easier way to do this ?
in thread Stripping a-href tags from an HTML document

I like anyone who likes that module ;)

I'd use next if $token->is_tag('a'); instead, but you really wanna combine your snippets, something like

    use strict;
    use HTML::TokeParser::Simple;

    for (@ARGV) {
        my $p = HTML::TokeParser::Simple->new($_);
        my $hrefCount = 0;
        print "STRIPPING A-HREF TAGS in '$_'. PLEASE WAIT..\n";
        open(TEMPO, ">$_.tempo") or die "couldn't create $_.tempo ($!)";
        while (defined(my $t = $p->get_token())) {
            if ($t->is_start_tag('a')) {
                # report the link target, then drop the start tag
                my $attr = $t->return_attr;
                if (exists $attr->{href}) {
                    $hrefCount++;
                    print "\nHREF TAG-->[$hrefCount]-->", $attr->{href}, "\n\n";
                }
                next;
            }
            elsif ($t->is_end_tag('a')) {
                next;    # drop the closing </a> as well
            }
            else {
                print TEMPO $t->as_is;    # everything else passes through unchanged
            }
        }
        close(TEMPO);
        rename "$_.tempo", $_ or warn "couldn't rename '$_.tempo' to '$_'";
    }


MJD says you can't just make shit up and expect the computer to know what you mean, retardo!
** The Third rule of perl club is a statement of fact: pod is sexy.

Re^3: Is there a faster / more efficient / quicker or easier way to do this?
by Aristotle (Chancellor) on Jan 12, 2003 at 00:12 UTC
    In trying to rewrite it to satisfy my sense of Fewer Indentation Levels Are Better, I rephrased the loop like this:
        while (defined(my $t = $p->get_token())) {
            print(TEMPO $t->as_is), next unless $t->is_tag('a');
            my $attr = $t->return_attr;
            print(
                "\nHREF TAG-->[", ++$hrefCount, "]-->", $attr->{href}, "\n\n"
            ) if exists $attr->{href};
        }

    Doing so, it occurred to me that it will discard <a name> anchors too, and fixing that is not entirely trivial: you need to keep track of whether the start tag was dropped or kept when you come across a closing </a>.

    Update: this should work. Untested, but you get the idea.

        my @stack;    # one flag per open <a>: true if its start tag was dropped
        while (defined(my $t = $p->get_token())) {
            if ($t->is_start_tag('a')) {
                my $attr = $t->return_attr;
                push @stack, exists $attr->{href};
                print(
                    "\nHREF TAG-->[", ++$hrefCount, "]-->", $attr->{href}, "\n\n"
                ), next if $stack[-1];
            }
            # pop only on a closing </a>; skip it if its start tag was dropped
            next if $t->is_end_tag('a') and pop @stack;
            print TEMPO $t->as_is;
        }
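
To see the stack bookkeeping on its own, here is a minimal sketch that needs no parser at all. It assumes hand-built tokens of the form [ kind, tag, attrs, raw ] as a stand-in for what the parser would hand back; that layout and the name strip_href_anchors are made up for illustration and are not part of HTML::TokeParser::Simple.

```perl
use strict;
use warnings;

# Hypothetical token layout: [ kind, tag, attrs, raw ].
# These are hand-built stand-ins, not real parser token objects.
sub strip_href_anchors {
    my @tokens = @_;
    my @stack;       # one flag per open <a>: true if its start tag was dropped
    my $out = '';
    for my $t (@tokens) {
        my ($kind, $tag, $attr, $raw) = @$t;
        if ($kind eq 'start' && $tag eq 'a') {
            my $dropped = exists $attr->{href} ? 1 : 0;
            push @stack, $dropped;
            next if $dropped;          # drop <a href=...>
        }
        elsif ($kind eq 'end' && $tag eq 'a') {
            next if pop @stack;        # drop </a> only if its start tag was dropped
        }
        $out .= $raw;                  # keep everything else verbatim
    }
    return $out;
}

my @tokens = (
    [ start => 'a', { name => 'top' },    '<a name="top">'    ],
    [ text  => '',  {},                   'Top'               ],
    [ end   => 'a', {},                   '</a>'              ],
    [ text  => '',  {},                   ' '                 ],
    [ start => 'a', { href => 'x.html' }, '<a href="x.html">' ],
    [ text  => '',  {},                   'link'              ],
    [ end   => 'a', {},                   '</a>'              ],
);

my $result = strip_href_anchors(@tokens);
print "$result\n";    # prints: <a name="top">Top</a> link
```

The named anchor survives with its closing tag, while the href anchor loses both tags and keeps only its text. A stray closing tag with nothing on the stack pops undef and is passed through untouched.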

    Makeshifts last the longest.