nysus has asked for the wisdom of the Perl Monks concerning the following question:

OK, I've been at this problem for at least an hour and I'm starting to get very frustrated. I have an arbitrary HTML table in a string, and I want to use the s/// operator to change the table's width to 300 pixels. It has to work even when the attribute value isn't quoted, and the original width may be given either in pixels or as a percentage. Here's the closest I've come so far:
$rep_contact_info =~ s/(<\s*table.*?width\s*=\s*"*).*?("*.*?>)/{$1}300{$2}/si;
But it results in the following bad output:
{<table width="}300{620" border="0">}
Obviously, I'm having trouble replacing the original dimension. I'm also puzzled by the braces showing up in the output, since I thought they were metacharacters. Another pair of more experienced eyeballs on this problem would be much appreciated. Thanks.

$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop";
$nysus = $PM . $MCF;
Click here if you love Perl Monks

Re: Reg Ex Problem: Changing width dimension in HTML table
by arthas (Hermit) on May 20, 2003 at 15:57 UTC
    You should be able to do this kind of work easily with a module that parses HTML, such as HTML::TagReader. Here's a link to some examples of using it (the author of the article makes some changes to font tags).

    Michele.
Re: Reg Ex Problem: Changing width dimension in HTML table
by Mr. Muskrat (Canon) on May 20, 2003 at 15:53 UTC
    The curly braces belong around the numbers only (i.e. ${1}300${2}), but that is not your only problem, as you will see once you make that change (hint: you don't want to capture the old table width).
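    A minimal sketch of the difference, using a made-up table tag: the replacement half of s/// is interpolated like a double-quoted string, so braces placed before the sigil are plain text, while ${1} is simply $1 with its name delimited so that the following 300 is not read as part of the variable name ($1300).

```perl
use strict;
use warnings;

# Hypothetical sample tag, for illustration only.
my $html = '<table width="620" border="0">';

# Braces written *before* the sigil are literal text in the replacement,
# so they end up in the output around each capture.
( my $literal = $html ) =~ s/(width=")\d+(")/{$1}300{$2}/;

# ${1} is the same variable as $1; the braces only mark where the name
# ends, so "300" is not taken as part of it.
( my $delimited = $html ) =~ s/(width=")\d+(")/${1}300${2}/;

print "$literal\n";
print "$delimited\n";
```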
      OK, after a bit more tinkering, this seems to work:
      $rep_contact_info =~ s/(<\s*table.*?width\s*=\s*"*)\d*%*("*.*?)/${1}500${2}/si;
      I'm sure it could be a lot prettier. Thanks for the bracket info.
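      One way to sanity-check the substitution is to run it over each form the question asks for. The sample tags below are assumptions made up for illustration. The fourth one shows a pitfall: under /s, .*? does not stop at >, so for a table with no width attribute the pattern runs ahead and rewrites the width of a later tag instead.

```perl
use strict;
use warnings;

# Hypothetical inputs: quoted pixels, quoted percentage, unquoted value,
# and a table with no width of its own followed by a <td> that has one.
my @inputs = (
    '<table width="620" border="0">',
    '<table width="80%" border="0">',
    '<table width=620 border=0>',
    '<table border="0"><tr><td width="50">',
);

my @outputs;
for my $in (@inputs) {
    # The substitution from the reply above, unchanged.
    ( my $rep_contact_info = $in ) =~
        s/(<\s*table.*?width\s*=\s*"*)\d*%*("*.*?)/${1}500${2}/si;
    push @outputs, $rep_contact_info;
}
print "$_\n" for @outputs;
```

      Replacing .*? with [^>]*? in the first group keeps the match inside the <table ...> tag, so the fourth input would be left unchanged rather than having its <td> rewritten.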

Re: Reg Ex Problem: Changing width dimension in HTML table
by Ovid (Cardinal) on May 20, 2003 at 17:33 UTC

    Sorry to be blunt, but if you can find the bug in the following regex, you can go ahead and parse HTML with regexes :)

    $data =~ s/(<a\s(?:[^>](?!href))*href\s*)(&#61;\s*(&[^;]+;)?(?:.(?!\3))+(?:\3)?)([^>]+>)/$1.decode_entities($2).$4/gsei;

    Otherwise, use a parser. It's more work to set up, but it's more likely to be correct.

use strict;
use warnings;
use HTML::TokeParser::Simple;

my $html = <<'END_HTML';
<table width="200">
  <tr>
    <td>foo</td>
  </tr>
</table>
END_HTML

my $parser   = HTML::TokeParser::Simple->new(\$html);
my $new_html = '';
while ( my $token = $parser->get_token ) {
    # Rebuild <table> start tags; pass every other token through untouched.
    $new_html .= $token->is_start_tag('table')
        ? new_width_attribute($token)
        : $token->as_is;
}
print $new_html;

sub new_width_attribute {
    my $token = shift;
    my ( $attr, $attrseq ) = ( $token->return_attr, $token->return_attrseq );
    # Keep the original attribute order, appending width if the tag had none.
    my @seq = @$attrseq;
    push @seq, 'width' unless exists $attr->{width};
    $attr->{width} = '300';
    # Separate the rebuilt attributes with spaces.
    my $tag = join ' ', map { qq{$_="$attr->{$_}"} } @seq;
    return "<table $tag>";
}

    (This is such a common request that I need to push this into the module).

    Cheers,
    Ovid

    New address of my CGI Course.
    Silence is Evil (feel free to copy and distribute widely - note copyright text)

      Hey, thanks. Your answer points to a stubborn problem among stubborn new programmers like me: we tend to do things quick and dirty because we haven't been burned enough yet to take the time to do them right. I'll look into HTML::TokeParser::Simple.

        You don't give yourself enough credit. It's not just new programmers who are stubborn. In "Stubborn as a Saint", I pointed out how I tend to take the first answer and run with it. I've seen plenty of other programmers do the same thing. Stubbornness, I think, is not a function of ability. Rather, it's a function of false pride. When someone is willing to let go of an unworkable position, that person becomes a better programmer (and possibly a better person, too; I'm still working on that aspect).

        Cheers,
        Ovid

Re: Reg Ex Problem: Changing width dimension in HTML table
by CombatSquirrel (Hermit) on May 20, 2003 at 16:22 UTC
    To the first problem: capture the quotation mark, if there is one, with an explicit repetition count, then refer back to it with a backreference. My solution is as follows:
    $rep_contact_info = '<table width ="12">';
    $rep_contact_info =~ s/(<\s*table.*?width\s*=\s*("{0,1})).*?(\2.*?>)/${1}300${3}/si;
    print $rep_contact_info;
    The quotation mark (if any) is captured in $2 and referred back to by \2 in the regex. The quantifier {0,1} ensures that the quotation mark is matched at most once.
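    The backreference approach works for quoted values, but the question also asks for unquoted ones. A quick check with hypothetical sample tags suggests that when there is no quote, \2 matches the empty string, the lazy .*? gives up immediately, and the old width ends up in $3. The variant below is an editor's sketch, not part of the reply: it consumes the old value with [^"\s>]* and uses [^>] so the match cannot run past the end of the table tag.

```perl
use strict;
use warnings;

# Hypothetical sample tags: one quoted value, one unquoted.
my @inputs = ( '<table width ="12">', '<table width=12 border=0>' );

my ( @posted, @variant );
for my $in (@inputs) {
    # The substitution as posted in the reply.
    ( my $p = $in ) =~
        s/(<\s*table.*?width\s*=\s*("{0,1})).*?(\2.*?>)/${1}300${3}/si;

    # Editor's variant: [^"\s>]* consumes the old value, and [^>] keeps
    # the match from running past the end of the <table ...> tag.
    ( my $v = $in ) =~
        s/(<\s*table[^>]*?width\s*=\s*("{0,1}))[^"\s>]*(\2[^>]*>)/${1}300${3}/si;

    push @posted,  $p;
    push @variant, $v;
}
print "posted:  $_\n" for @posted;
print "variant: $_\n" for @variant;
```

    With the posted pattern the unquoted tag comes out as width=30012, because the old value lands in $3; the variant gives width=300 for both tags.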
    To your second problem: put the braces after the dollar sign instead of before it, i.e. ${1} rather than {$1}.