Despite the title of this reply i'm still interested in any more guidance on how to do this properly. Since jeffa asked, this problem involves giving framechat the ability to translate the chatterbox in real time using babelfish.

I couldn't figure out how to implement an HTML::TreeBuilder solution so i went with DamnDirtyApe's code because it works and resembles the way i had tried and failed to solve the problem. His example works as posted but had problems when tested in the wild. So it evolved into the following working example. Two problems: i had to use two global variables, and the links break sometimes when the translation mixes up the order of the html placeholders. Testing with the live chatterbox today revealed that the second problem is relatively rare, but i'm still thinking about how to prevent it.

This test script translates a small sentence of English into a random European language while preserving the HTML:

#!/usr/bin/perl -w use strict; use WWW::Babelfish; use CGI 'header'; use Data::Dumper; my %html = (); my $C = 1; my @langs = qw(German French Spanish Italian Portuguese); shuffle(\@langs); my $str = qq~This <b>contains</b> both text and <a href="http://www.w3c.org"> +html</a>.~; $str = translate($str); print "$langs[0]: ", $str, '<pre>', Dumper(\%html); sub translate { my ($txt) = @_; $txt =~ s|<([^>]+)>|savetags($1)|eg; # weak parsing :-/ my $obj = new WWW::Babelfish( 'agent' => 'DeBabelizer' ); return $_[0] unless defined($obj); my $ttxt = $obj->translate( 'source' => 'English', 'destination' => $langs[0], 'text' => $txt ); return $_[0] unless defined($ttxt); $ttxt = encode($ttxt); # replace placeholders with corresponding html # need ;? cause the fish sometimes screws up that colon $ttxt =~ s|\&lt\;(\d+)\&gt\;?|<$html{$1}>|g; return $ttxt } sub savetags { # replace html tag content with placeholders my $htm = pop; $html{$C} = $htm; # ack, global $_ = '&lt;'.$C.'&gt;'; $C++; # global return $_ } ## and two third-party subs that add to the fun sub shuffle { # Perl Cookbook recipe 4.17 my $array = shift; for(my$i = @$array; --$i;){ my$j = int rand ($i+1); next if $i == $j; @$array[$i,$j] = @$array[$j,$i] } } sub encode { # UTF-8 to latin1 regex from XML::TiePYX (thanks to mirod) my($text) = @_; $text =~ s{([\xc0-\xc3])(.)}{ my $hi = ord($1); my $lo = ord($2); chr((($hi & 0x03) <<6) | ($lo & 0x3F)) }ge; return $text; }
Thanks to everyone who replied to my original query!

--
Check out my Perlmonks Related Scripts like framechat, reputer, and xNN.


In reply to Temporarily strip HTML (solved!) by epoptai
in thread Temporarily strip HTML by epoptai

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.