in reply to Why doesn't non-greediness work?

I love questions like this (in a sadistic sort of way)! They let me investigate more alternatives to using regexes to parse *ML, like my new favorite, XML::Twig. You really need to invest quite a bit of time in these kinds of solutions, but the time is well spent because it improves your overall programming skills. Here is my take on the problem:

    use strict;
    use warnings;
    use XML::Twig;

    my $twig = XML::Twig->new(
        twig_handlers => {
            'img[@alt="Smiley"]' => sub {
                XML::Twig::Elt->new('#PCDATA', ':)')->replace($_);
            },
            'img[@alt="Wink"]' => sub {
                XML::Twig::Elt->new('#PCDATA', ';)')->replace($_);
            },
        },
        pretty_print => 'indented',
    );

    $twig->parse(\*DATA);
    $twig->flush;

    __DATA__
    <body>
      <a href="wink.html">
        <img border="0" src="/images/wink.gif" alt="Wink"/>
      </a>
      <a href="smile.html">
        <img border="0" src="/images/smiley.gif" alt="Smiley"/>
      </a>
    </body>

It works, but I had to 'XML-ize' the img tags first (that is, close each one with />). I wrapped the img tags inside a tags simply to show that other tags are output as-is. Also, a big ++ to broquaint for helping get this right. I was trying to create a new XML::Twig::Elt object with 'CDATA' as the first arg, which created a <CDATA> tag pair. broquaint changed that to '#CDATA', which led me to the correct argument: '#PCDATA'. Confusing? Start studying! ;)
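
To make the 'CDATA' / '#CDATA' / '#PCDATA' distinction concrete, here is a minimal sketch that builds one element with each first argument and serializes it with sprint. The variable names are only for illustration, and it assumes XML::Twig is installed.

```perl
use strict;
use warnings;
use XML::Twig;

# Same content string, three different first arguments to new().
my $tag   = XML::Twig::Elt->new('CDATA',   ':)')->sprint;  # ordinary element named CDATA
my $cdata = XML::Twig::Elt->new('#CDATA',  ':)')->sprint;  # CDATA section
my $text  = XML::Twig::Elt->new('#PCDATA', ':)')->sprint;  # plain text node

print "$_\n" for $tag, $cdata, $text;
```

The first line printed should be the <CDATA> tag pair described above, the second a CDATA section, and the third just the bare smiley text, which is what the handlers need in order to swap an img element for plain text.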

jeffa

L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)

Replies are listed 'Best First'.
Re: (jeffa) Re: Why doesn't non-greediness work?
by Tomte (Priest) on May 10, 2003 at 20:49 UTC

    ++jeffa for that one, and not only for requiring correct XML syntax ;-),
    although IMNSHO every new bit of HTML put on the net should be XHTML 1.\(0|1\)

    regards,
    tomte


    Hlade's Law:

    If you have a difficult task, give it to a lazy person --
    they will find an easier way to do it.

Re: (jeffa) Re: Why doesn't non-greediness work?
by mirod (Canon) on May 11, 2003 at 05:51 UTC

    And now of course I have to add my own take to it!

    All you want to do is change some img tags while leaving the rest of the file unchanged. This looks like a good opportunity to use twig_roots, which builds the twig only for the elements that have handlers, together with the awfully named twig_print_outside_roots, which prints everything else in the document:

        #!/usr/bin/perl -w
        use strict;
        use warnings;
        use XML::Twig;

        my $twig = XML::Twig->new(
            twig_print_outside_roots => 1,
            twig_roots => {
                'img[@alt="Smiley"]' => sub { print q{:)} },
                'img[@alt="Wink"]'   => sub { print q{;)} },
            },
        );

        $twig->parse(\*DATA);

        __DATA__
        <body>
        <a href="wink.html"><img border="0" src="/images/wink.gif" alt="Wink"/></a>
        <a href="wink.html"><img border="0" src="/images/wink.gif" alt="NotWink"/></a>
        <a href="smile.html"><img border="0" src="/images/smiley.gif" alt="Smiley"/></a>
        </body>

Re^2: Why doesn't non-greediness work? (HTML::TokeParser::Simple)
by Aristotle (Chancellor) on May 11, 2003 at 11:54 UTC
    For the parser types among us, there are likely more suitable options to be had though.

        use strict;
        use warnings;
        use HTML::TokeParser::Simple;

        my %xlat = (
            Smiley => ':)',
            Wink   => ';)',
        );

        my $p = HTML::TokeParser::Simple->new( \*DATA );

        while ( my $t = $p->get_token ) {
            if ( $t->is_start_tag('img')
                and my $r = $xlat{ $t->return_attr->{alt} } ) {
                print $r;
            }
            else {
                print $t->as_is;
            }
        }

        __END__
        <body>
          <a href="wink.html">
            <img border="0" src="/images/wink.gif" alt="Wink">
          </a>
          <a href="smile.html">
            <img border="0" src="/images/smiley.gif" alt="Smiley">
          </a>
        </body>

    Note this doesn't require XHTML.

    Makeshifts last the longest.