in reply to How can I delete characters between < and > in Perl?

If you want to delete matches that span multiple lines, process the input line by line and keep a flag. After the ordinary replacements, if an unmatched < sign is left on the line, delete the rest of the line with an s/// substitution and check its return value to see whether it matched. If it did, set the flag and keep throwing lines away until you find one containing a > sign. On that line, delete everything up to and including the >, clear the flag, and continue applying the ordinary replacements to what remains.
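A minimal sketch of that flag-based loop, assuming the input arrives as a list of lines with their newlines intact. The sub name strip_tags_multiline is hypothetical, and dropping the trailing newline along with an unclosed tag (so the text after the > joins the text before the <) is one choice among several.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: strip <...> spans that may cross line boundaries.
# Takes a list of lines (newlines included) and returns the stripped text.
sub strip_tags_multiline {
    my @lines  = @_;    # copy, so the caller's data is not modified
    my $in_tag = 0;     # true while inside a '<' whose '>' has not been seen yet
    my $out    = '';
    for my $line (@lines) {
        if ($in_tag) {
            # Throw the line away unless it closes the pending tag;
            # if it does, delete everything up to and including the '>'.
            next unless $line =~ s/^[^>]*>//;
            $in_tag = 0;
        }
        # Ordinary replacement: remove complete <...> pairs on this line.
        $line =~ s/<[^>]*>//g;
        # Any '<' still present has no '>' after it on this line:
        # drop the rest of the line (including its newline) and set the flag.
        $in_tag = 1 if $line =~ s/<.*//s;
        $out .= $line;
    }
    return $out;
}
```

To filter files named on the command line (or STDIN), something like `print strip_tags_multiline(<>);` would do, since <> in list context returns all lines.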

Note, however, that if you are trying to strip tags from an HTML or XML file, you'd better use a proper module instead of hand-written regexes. A parser copes better with unusual HTML constructs, and also with malformed but common HTML, such as pages containing unescaped angle brackets. For example, try something like

perl -we 'use 5.010; use XML::Twig; binmode STDOUT, "encoding(iso8859-2)"; $twig = XML::Twig->new->parsefile_html($ARGV[0]); say $twig->root->text;' somefile.html
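To see why hand-written patterns struggle with the unescaped angle brackets mentioned above, here is a small sketch using the kind of s/<[^>]*>//g substitution one might write by hand. The sub name naive_strip and the sample string are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical naive stripper: remove everything from each '<' to the next '>'.
sub naive_strip {
    (my $s = shift) =~ s/<[^>]*>//g;
    return $s;
}

# The literal "< b and c >" is text, not a tag, but the regex eats it anyway.
print naive_strip("<p>if a < b and c > d then</p>\n");    # prints "if a  d then"
```

The words between the two brackets are lost because the pattern has no notion of what a tag looks like; a parser can tell "< b" apart from a real start tag.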

Re^2: How can I delete characters between < and > in Perl?
by Anonymous Monk on Apr 19, 2009 at 02:13 UTC