Hi predrag,
It sounds like the trickiest part of your current solution is probably figuring out whether you're in some part of the HTML code or whether you're in the text, since obviously tags shouldn't be converted to Cyrillic. Unfortunately, parsing HTML is a pretty difficult task (a humorous post about the topic). So I'd like to encourage you to look at one of the parser modules again.
Two classic modules are HTML::Parser and HTML::TreeBuilder, but there are several others, such as Mojo::DOM. If the input is always XHTML, there are also XML::Twig and many other XML-based modules. These modules generally break the HTML down into its structure: elements (<tags>) with their attributes, comments, and text. Some of them then represent the HTML as a Document Object Model (DOM), which is also worth reading a little about. It sounds like you only want to operate on text, and maybe on some elements' attributes (such as title="..." attributes).
Operating only on text is relatively easy: for example, in an HTML::Parser solution, you could register a handler for the text event that does the appropriate conversions, and a default handler that just outputs everything else unchanged:
```perl
use warnings;
use strict;
use HTML::Parser;

my $p = HTML::Parser->new( api_version => 3, unbroken_text => 1 );
$p->handler(text => sub {
        my ($text) = @_;
        # ### Your filter here ###
        $text =~ s/foo/bar/g;
        print $text;
    }, 'text');
$p->handler(default => sub { print shift; }, 'text');

my $infile  = '/tmp/in.html';
my $outfile = '/tmp/out.html';
open my $out, '>', $outfile or die "open $outfile: $!";
# "select" redirects the "print"s
my $previous = select $out;
$p->parse_file($infile);
close $out;
select $previous;
print "$infile -> $outfile\n";
```
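If the goal is Serbian Latin to Cyrillic, the "Your filter here" spot could call a helper along these lines. This is only a sketch under my own assumptions: the name to_cyrillic, the letter table, and the entity-skipping regex are hypothetical, not part of HTML::Parser. Two details matter. The digraphs lj, nj and dž have to be tried before single letters, and because the 'text' argspec passes the raw source text (not the decoded 'dtext'), character references such as &amp; must be left alone, otherwise "amp" would get transliterated too.

```perl
use strict;
use warnings;
use utf8;    # the Cyrillic literals in this file are UTF-8

# Hypothetical table: lowercase Serbian Latin -> Serbian Cyrillic.
my %lower = qw(
    lj љ  nj њ  dž џ
    a а  b б  c ц  č ч  ć ћ  d д  đ ђ  e е  f ф  g г
    h х  i и  j ј  k к  l л  m м  n н  o о  p п  r р
    s с  š ш  t т  u у  v в  z з  ž ж
);

# Add upper- and title-case keys: "LJ" and "Lj" -> "Љ", "A" -> "А".
my %lat2cyr = %lower;
while ( my ( $lat, $cyr ) = each %lower ) {
    $lat2cyr{ uc $lat }      = uc $cyr;
    $lat2cyr{ ucfirst $lat } = uc $cyr;
}

# Longest keys first, so "lj" is matched before "l" followed by "j".
my $letters = join '|', map { quotemeta } sort { length $b <=> length $a } keys %lat2cyr;
my $lat_re  = qr/$letters/;

sub to_cyrillic {
    my ($text) = @_;
    # Character references (&amp; &#8212; ...) match first and are kept as-is;
    # anything not in the table (digits, x, y, ...) is left unchanged.
    $text =~ s/(&\#?\w+;)|($lat_re)/defined $1 ? $1 : $lat2cyr{$2}/ge;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print to_cyrillic("Njegoš &amp; ljubav"), "\n";
```

In the text handler you would then write $text = to_cyrillic($text); instead of the s/foo/bar/ line. A purely letter-by-letter mapping will get a few words wrong where d+ž or n+j is not actually a digraph (compounds like nadživeti), so some exceptions may still be needed.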
Operating on attributes will require you to handle opening elements (tags) as well. Note also that the same basic principle I described above applies to the other modules: they all break the HTML down into its components, so that you can operate on only the textual parts, leaving the others unchanged.
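With HTML::Parser, one way to do that is a start handler that receives the tag name, the attribute hash and the attribute order, and rebuilds only the tags it actually needs to change. A minimal sketch, assuming title and alt are the attributes to convert; my_filter is a hypothetical stand-in for your conversion, and the HTML is parsed from a string into $out so the result is easy to inspect. HTML::Parser decodes entities in attribute values by default, so the rebuilt values are re-escaped with encode_entities from HTML::Entities.

```perl
use strict;
use warnings;
use HTML::Parser;
use HTML::Entities qw(encode_entities);

# Hypothetical placeholder for the real conversion.
sub my_filter { my ($s) = @_; $s =~ s/foo/bar/g; return $s }

my %convert = map { $_ => 1 } qw(title alt);    # attributes to filter
my $html = '<p title="foo &amp; more">foo text</p><img alt="foo" src="foo.png">';
my $out  = '';

my $p = HTML::Parser->new( api_version => 3, unbroken_text => 1 );
$p->handler( text    => sub { $out .= my_filter(shift) }, 'text' );
$p->handler( default => sub { $out .= shift },             'text' );
$p->handler( start => sub {
    my ( $tagname, $attr, $attrseq, $text ) = @_;
    unless ( grep { $convert{$_} } keys %$attr ) {
        $out .= $text;    # nothing to change: copy the tag verbatim
        return;
    }
    my $tag = "<$tagname";
    for my $name (@$attrseq) {    # keep the original attribute order
        my $value = $attr->{$name};    # already entity-decoded
        $value = my_filter($value) if $convert{$name};
        $tag .= sprintf ' %s="%s"', $name, encode_entities( $value, '<>&"' );
    }
    $out .= "$tag>";
}, 'tagname, attr, attrseq, text' );

$p->parse($html);
$p->eof;    # flush any buffered text
print "$out\n";
```

Rebuilding a tag normalizes it (lowercased names, double quotes, no trailing "/" handling), which is why tags without the interesting attributes are passed through unchanged via the text argument.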
BTW, have you seen Lingua::Translit?
Hope this helps,
-- Hauke D