PerlMonks  

Re: HTML Tag Remover

by lolindrath (Scribe)
on Aug 06, 2000 at 20:12 UTC


in reply to HTML Tag Remover

This is how I did it without a module; I think it will work for what you need to do.
#!/usr/bin/perl -w
# "or" (not "||") so that die actually fires when open fails
open FILE, "c:\\html\\vb\\index.html" or die "can't open file: $!";
@text = <FILE>;
$text = join( "", @text );   # slurp the whole file into one string
close FILE;
#print $text;
$text =~ s/<.*?>//sg;        # remove everything between < and >
print $text;

I tried this on several of my HTML files. You need the s option at the end of the substitution so that . also matches newlines; without it, multi-line tags, like the comments in JavaScript blocks, are not removed.
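To see what the s option changes, here is a minimal sketch that runs the same substitution with and without it. The helper name strip_tags and the sample string are made up for illustration and are not from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): strip anything between
# '<' and '>', with or without the /s modifier.
sub strip_tags {
    my ($html, $use_s) = @_;
    if ($use_s) {
        $html =~ s/<.*?>//sg;   # /s: '.' also matches newlines
    } else {
        $html =~ s/<.*?>//g;    # no /s: '.' stops at a newline
    }
    return $html;
}

# A comment that spans two lines, as often seen around script blocks.
my $sample = "before<!-- a comment\nspanning two lines -->after\n";

print "without /s: ", strip_tags($sample, 0);   # comment is left in place
print "with /s:    ", strip_tags($sample, 1);   # prints: beforeafter
```

Without /s the pattern cannot cross the newline inside the comment, so nothing matches and the text comes back unchanged.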

--=Lolindrath=--

Replies are listed 'Best First'.
RE: Re: HTML Tag Remover
by nardo (Friar) on Aug 06, 2000 at 22:50 UTC
    That wouldn't work for html such as
    <img src="whatever.gif" alt=">>>Click Here<<<">
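    A short sketch of why that tag trips the regex: the non-greedy match stops at the first > it sees, which here is inside the alt attribute. The helper name naive_strip is hypothetical; the substitution is the one from the parent post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution from the parent post.
sub naive_strip {
    my ($html) = @_;
    $html =~ s/<.*?>//sg;   # stops at the FIRST '>' after each '<'
    return $html;
}

my $tag = '<img src="whatever.gif" alt=">>>Click Here<<<">';

# First match:  <img src="whatever.gif" alt=">
# Second match: <<<">
print naive_strip($tag), "\n";   # prints: >>Click Here
```

    The leftover text is part of the attribute value, so the tag is only partly removed.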
      OK, I added this line before the other regex and it seemed to work, though it is a little specific to that problem. It simply removes any run of two or more closing pointy brackets. If you want to keep them, you can replace them with some placeholder character first and put the pointy brackets back after the HTML tag stripping is done. This is the revised code:
      #!/usr/bin/perl -w
      # "or" (not "||") so that die actually fires when open fails
      open FILE, "c:\\html\\test.html" or die "can't open file: $!";
      @text = <FILE>;
      $text = join( "", @text );
      close FILE;
      #print $text;
      $text =~ s/>>+//g;      # <-- Added this line (">>+", not the character class "[>+]")
      $text =~ s/<.*?>//sg;
      print $text;
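      The placeholder idea mentioned above could look roughly like this. It is only a sketch: the helper name strip_keeping_runs is made up, and it assumes the byte \x01 never appears in the input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: hide runs of '>' behind a placeholder, strip the tags,
# then restore the brackets. Assumes \x01 does not occur in the input.
sub strip_keeping_runs {
    my ($html) = @_;
    $html =~ s/(>>+)/"\x01" x length($1)/ge;  # hide runs of two or more '>'
    $html =~ s/<.*?>//sg;                      # strip tags as before
    $html =~ tr/\x01/>/;                       # put the brackets back
    return $html;
}

print strip_keeping_runs('<a href="x.html" title=">>next">Home >> Next</a>'), "\n";
# prints: Home >> Next
```

      A single > inside an attribute value would still end the match early, so this stays as specific to the problem as the one-line fix.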


      --=Lolindrath=--
