in reply to Re: Removing underline tags with regexp
in thread Removing underline tags with regexp
You're better off using one of the HTML parsing modules.
No you're not.
<rant>
This mantra is getting very old. In this particular instance the OP is asking about removing <u> and </u>, not about HTML parsing in the general case. Let's look at the issues in turn:
Wanting to strip out underline tags is therefore about as trivial as it gets. Plus, in the absence of evidence to the contrary, the person is in control of the HTML and has a pretty good idea of what's in there. In this case, there are no attribute values to worry about and the element name is only one character long. You probably don't even have to worry about the tag wrapping from one line to the next. It really doesn't get any easier than this.
An entire directory of files can be done with the following one-liner:
perl -i.bak -pe 's/<\/?u>//gi' *.html
Saying that one needs a parser to do this is just spreading FUD and making things seem more complicated than they need to be. Do simple jobs simply, and keep it simple, stupid. Save the parser approach for something hard, like converting a font-marked-up page into CSS.
</rant>