jaspreet_sethi5875 has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

The Perl module HTML::Strip is returning blank values after parsing where valid values are expected. We are using Perl 5.8.8 on CentOS release 5.7 (Final). Our script fetches around 25k records from the DATABASE in a single go and parses each one with HTML::Strip. In the process, around 4-6 records become blank after parsing, although in the DATABASE they have a defined value (containing English phrases along with HTML tags).

Below is the part of the code:
script starts
... ... ...
While(25k to 50k records from DB) {
    my $field = '';
    $field = $h->{'HTML_FIELD'} if(defined($h->{'HTML_FIELD'}));
    $field = &html_parsing($field) if($field);
}

sub html_parsing {
    my $raw_html = shift;
    my $hs = HTML::Strip->new();
    my $string = $hs->parse( $raw_html );
    ## Problem: here the $string is becoming blank after parsing.
    $hs->eof;
    return $string;
}
... ... ...
script ends

When we check only those 4-6 blank records again through a sample script, their values after HTML parsing are as expected from the DATABASE (not blank). Any help would be much appreciated.

Thanks
Jaspreet Sethi

Replies are listed 'Best First'.
Re: HTML::Strip returning blank value after parsing
by Anonymous Monk on Jan 16, 2013 at 10:50 UTC

    Below is the part of the code:

    That won't do :) that code doesn't compile

    $html_field = &html_parsing($html_field) if($html_field');
    ^Z
    String found where operator expected at - line 1, at end of line
            (Missing semicolon on previous line?)
    Can't find string terminator "'" anywhere before EOF at - line 1.
    $

    You'll need to provide runnable code with dummy html data that shows/demonstrates all the values being stripped instead of some being preserved (How do I post a question effectively?).

      Hi,

      Apologies for the formatting when I posted the problem. I have updated it now, and the code compiles.

      Below is the part of the code:
      script starts
      ... ... ...
      While(25k to 50k records from DB) {
          my $field = '';
          $field = $h->{'HTML_FIELD'} if(defined($h->{'HTML_FIELD'}));
          $field = &html_parsing($field) if($field);
      }

      sub html_parsing {
          my $raw_html = shift;
          my $hs = HTML::Strip->new();
          my $string = $hs->parse( $raw_html );
          ## Problem: here the $string is becoming blank after parsing
          $hs->eof;
          return $string;
      }
      ... ... ...
      script ends
      Also, below is one of the sample HTML values (from the DATABASE) that became blank after HTML parsing while the DATABASE records were being processed:
      $field = qq(<p>With a solid oak top and white painted legs, the New England Side Table compliments any interior design scheme. This stylish side table comprises a magazine shelf and fluted detailing to the legs.</p> <p>The perfect partner to our New England Coffee Table.</p> <p><br />H43cm L51cm W41cm</p>);


      whereas if I check only this record through a sample script, its value after HTML parsing is as expected from the DATABASE (not blank).

      Thanks
      Jaspreet Sethi

        As you can see here, $field is reduced in length (the tags are eliminated), so $string clearly isn't blank. Original length 294, new length 267.

        #!/usr/bin/perl --
        use strict;
        use warnings;
        use Data::Dump qw' dd ';
        use HTML::Strip;

        my $field = qq(<p>With a solid oak top and white painted legs, the New England Side Table compliments any interior design scheme. This stylish side table comprises a magazine shelf and fluted detailing to the legs.</p> <p>The perfect partner to our New England Coffee Table.</p> <p><br />H43cm L51cm W41cm</p>);

        dd length $field;
        $field = html_parsing( $field );
        dd length $field;

        sub html_parsing {
            my $raw_html = shift;
            my $hs = HTML::Strip->new();
            my $string = $hs->parse( $raw_html );
            ## Problem: here the $string is becoming blank after parsing
            $hs->eof;
            return $string;
        }
        __END__
        294
        267
        Where do you update the database with the new value?
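
        Since the blank results only show up in the full run and not when a record is tested alone, it may help to capture the exact input at the moment the output comes back empty. The following is only a sketch under assumptions: find_blanked is a hypothetical helper (not part of HTML::Strip), and it takes the stripping routine as a code reference so it can wrap the html_parsing logic from the post or anything else.

```perl
use strict;
use warnings;

# Hypothetical helper: run $strip on each record and collect the ones
# whose input is non-empty but whose output has no non-whitespace text.
sub find_blanked {
    my ($strip, @records) = @_;
    my @suspects;
    for my $i (0 .. $#records) {
        my $in = $records[$i];
        next unless defined $in && length $in;   # skip NULL/empty fields
        my $out = $strip->($in);
        push @suspects, { index => $i, input => $in }
            unless defined $out && $out =~ /\S/;
    }
    return @suspects;
}

# Possible usage with HTML::Strip, one fresh parser per record as in the
# original post (@html_fields is a placeholder for the values from the DB):
#
#   my @suspects = find_blanked(sub {
#       my $hs  = HTML::Strip->new();
#       my $txt = $hs->parse($_[0]);
#       $hs->eof;
#       return $txt;
#   }, @html_fields);
#   print "record $_->{index} came back blank\n" for @suspects;
```

        Dumping each suspect's input with Data::Dumper and $Data::Dumper::Useqq = 1 would also show any non-printing bytes that differ from the copy pasted into the sample script.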