in reply to Re^2: transforming html
in thread transforming html

I'm not sure why you'd want this, as in the end it renders pretty much the same, but I suppose you have your reasons.

After digging through the documentation a bit, I finally found that as_HTML() is defined on HTML::Element and those docs don't really hint at a way to prevent that encoding from happening.

But we don't let them discourage us that easily, do we? So after diving into the source of HTML::Element and having a look at the code of the as_HTML subroutine, I learned that the entity encoding is handled by HTML::Entities.

    sub as_HTML {
        # Bla bla bla
        # Your typical subroutine initial stuff we don't care much about...

        if ( ... ) {
            # Some condition I don't really understand since I didn't bother to
            # understand the initial stuff above. But it didn't seem too relevant.

            # A whole lot of stuff happens here, seemingly all dealing with tags,
            # not with text.
        }
        else { # it's a text segment
            # Hey! Cool.
            # One more line of bla bla bla, before...:
            HTML::Entities::encode_entities( $node, $entities );
            # Yeah, this sounds about right. Let's look at that.
        }
        # More stuff I didn't bother to look at...
    }

Ok, so HTML::Entities is our target now. There's no apparent way to disable entity encoding so we'll have to use the source as our documentation again. *Shrug*, whatever, it's way past bedtime anyway now so I might as well see what I can do.

    # HTML::Entities

    # First there's a whole lot of POD here, but since I already saw the HTML
    # version of that (which wasn't very helpful) I don't really care.

    # Hey, cool. The actual module begins here.
    use strict;
    use vars qw(@ISA @EXPORT @EXPORT_OK $VERSION);
    use vars qw(%entity2char %char2entity);
    # Bla bla bla. Oh, wait, that last line looks promising.

    # Some more stuff for Exporter happens next. I don't care.

    %entity2char = (
        # What follows is a long, long, long mapping of character names
        # to actual characters.
        # This list goes on and on and on... Never knew there were so many!
    );

    # Then, suddenly:

    # Make the opposite mapping
    while (my($entity, $char) = each(%entity2char)) {
        $entity =~ s/;\z//;
        $char2entity{$char} = "&$entity;";
    }
    delete $char2entity{"'"}; # only one-way decoding
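To see what that inversion loop produces, here is a minimal standalone sketch. The five-entry %entity2char below is a made-up stand-in for the real (much longer) table, so treat it as an illustration of the technique rather than the module's actual data.

```perl
use strict;
use warnings;

# Tiny stand-in for the real %entity2char table (hypothetical subset).
my %entity2char = (
    amp  => '&',
    lt   => '<',
    gt   => '>',
    quot => '"',
    apos => "'",
);

# Build the reverse map the same way the module source above does.
my %char2entity;
while ( my ( $entity, $char ) = each %entity2char ) {
    $entity =~ s/;\z//;                 # strip a trailing semicolon, if any
    $char2entity{$char} = "&$entity;";
}
delete $char2entity{"'"};               # apostrophe is decoded but never encoded

# Show the resulting character-to-entity map.
for my $char ( sort keys %char2entity ) {
    print "$char => $char2entity{$char}\n";
}
```

The point is that %char2entity is an ordinary package hash filled in at load time, which is why it can be reached and changed from outside the module.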

He, he, he. I think we win. Just one line should, theoretically, keep this whole mean machine from replacing your characters with HTML entities. It's a bit of a shame, since the original authors of this module went to such pains to first set up one mapping (which is really a handful of pages long) and then to invert that mapping, but well, they should've made entity encoding optional in the first place. Just one line, I think (although it's untested).

    %HTML::Entities::char2entity = (); # Bye bye.

Addendum: for completeness' sake, you'd put this line somewhere before you begin printing. Something like this should do the trick.

    %HTML::Entities::char2entity = (); # Bye bye.

    open my $fh, ">", "out.html" or die $!;
    print $fh "<html><body>"
        . join("\n", map { $_->as_HTML } ($tit, $sub, $aut, $art))
        . "</body></html>";
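Since none of this has been tested, a quick check along these lines should show what encode_entities really does once the map is emptied. It is only a sketch: it assumes HTML::Entities is installed, the sample string is made up, and it uses local so the map is restored as soon as the block ends.

```perl
use strict;
use warnings;
use HTML::Entities ();

# Made-up sample text containing characters that normally get encoded.
my $text = 'fish & chips <cheap>';

# Default behaviour, for comparison.
my $before = HTML::Entities::encode_entities($text);

my $after;
{
    # Empty the map only inside this block; local restores it afterwards.
    local %HTML::Entities::char2entity = ();
    $after = HTML::Entities::encode_entities($text);
}

print "original: $text\n";
print "default:  $before\n";
print "emptied:  $after\n";
print $after eq $text ? "text left untouched\n" : "text was still changed\n";
```

If the last line reports that the text was still changed, then the map is not the whole story and encode_entities itself deserves a closer look before you rely on the one-liner.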

Re^4: transforming html
by morgon (Priest) on Sep 29, 2010 at 21:07 UTC
    Again, many thanks.

    And I really think the way you answered my questions is exceptional - not just providing a final answer but illustrating what you did to attack the problem (making me feel a little bit guilty I did not put that much effort in myself).

    Really useful.

      You're most welcome. The pleasure was entirely mine, so don't feel guilty. I had lots of fun trying to tackle your problem. I'm glad I was able to help.