Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Kind Monks, I'm trying to pass a string variable to a subroutine in my library for some gentle reformatting because the nasty characters my customers insist on typing into their descriptions occasionally make the HTML page break.

For example, I want to exchange a tick mark (') with (& rsquo ;) (minus those parentheses of course and no spaces in the rsquo piece). Below is the main program, and below that is the library containing the subroutine. I've snipped out anything I thought irrelevant.

My print statements show the variable $attr_descr going in, getting some loving massage... but when I view the source on the rendered page, the magical substitutions have disappeared except within the subroutine itself.

I might have gone along with the idea that the "local" would confine my results to the sub itself, but I've used a very similar construct to format some numbers based on customer preferences (see the sub "commify"). That returns the $_ variable nicely but the "cleanup" does not return my variable.

Here is the top part of the View Source results:

attr_descr before Nonwovens testing a tick ' and an at @ and a less than < and a double "<br><br> attr_descr entering cleanup <br><br> attr_descr exiting cleanup Nonwovens testing a tick &rsquo; and an at @ and a less than < and a double "<br><br> attr_descr after Nonwovens testing a tick ' and an at @ and a less than < and a double "
I believe I am having a dumb blonde moment... anyone care to help me see the (most likely obvious) thing that I'm missing? Thanks!
#!/usr/local/bin/perl5_8
# Cash Balance program; processes data from main menu program.
use strict;
use ncw_com_library; # contains common subs (commify, timeout, etc.)
use HTML::Template;
use Time::Local;
use DBI;
use CGI ':standard';
my $CGI = CGI->new;

# Clear buffers and set up web page (required)
$|=1;

# [some attributes code snipped out that I believe is irrelevant]
my ($attr_descr, $attr_exists, $sth_attr);
$dbh=DBI->connect("dbi:Oracle:".$databs,$userid,$passwd) || die "conn attr_sql";
$sth_attr = $dbh->prepare($attr_sel) || die "prep attr_sql";
$sth_attr ->execute || die "exec attr_sql";
while ( $attr_exists = $sth_attr->fetch ) {
    $project = $attr_exists->[0];
    $attr_descr = $attr_exists->[1];
    print "attr_descr before $attr_descr<br>";
    # Put in HTML-friendly characters for any odd characters
    &ncw_com_library::cleanup($attr_descr);
    print "attr_descr after $attr_descr<br>";
    # [some more fetched fields snipped out]
}
# [snipped out other financial data loops for HTML:TEMPLATE;
# no issues there]
#################### Begin Section ##############################
# Pass parameters from @loop arrays to template; print report
# [snipped out irrelevant params below]
$template->param(
    passdata => \@loop_data,
    attr_descr => $attr_descr,
    project => $project
);
print $template->output();
#++++++++++++++++++++ End Section ++++++++++++++++++++++++++++++

******************************************************************
# This is my library
******************************************************************
package ncw_com_library;
# Contains various common subroutines used by the WRS Reports
1;
use strict;
use DBI;
use Exporter ();
our @ISA = 'Exporter';
our @EXPORT;
use vars @EXPORT=qw/ $asofdt $auth_fail $message $projects $proj_descr $rpt_dates $rpt_id $rpt_unavail $status /;

# [snipped out unrelated subroutines]

sub cleanup
# Replaces various characters with HTML-friendly characters
{
    print "attr_descr entering cleanup $_<br>";
    local $_ = shift @_;
    1 while $_ =~ s/^(.*)(')(.*)/$1&rsquo;$3/gm;
    print "attr_descr exiting cleanup $_<br>";
    return $_;
}

# This sub works with the financial data so I used it as the basis for
# the cleanup sub above.
sub commify
# Formats numbers to two decimal places, put in commas, make negs red)
{
    local $_ = sprintf "%.2f", shift @_;
    1 while $_ =~ s/^(-?\d+)(\d\d\d)/$1,$2/;
    $_ =~ s/^(-)(.*)/\($2\)/;
    {
        if ( $_ =~ m/^\(.*/ ) {
            $_ = "style=\"color:#B22222;\">" . $_
        }
        else {
            $_ = "style=\"color:black;\">" . $_
        }
    }
    return $_;
}

Replies are listed 'Best First'.
Re: Trouble Getting Local Variable $_ to return reformatted string
by jbert (Priest) on Jun 18, 2007 at 13:52 UTC
    The short answer: you're modifying a copy of $_, so the caller doesn't see the modification. You should probably return the value, something like this:
    # Note we assign the return value of the sub to $txt
    $txt = escape_html_characters($txt);

    sub escape_html_characters {
        my $string = shift;
        $string =~ s/...some chars here.../...some escape.../g;
        # Note we return the modified string
        return $string;
    }
    The key points here are that the modified string is returned from the subroutine and you assign the return value of the subroutine back into the $txt variable.

    The long answer:

    • You probably want "use warnings" turned on.
    • You don't really want to be using 'local' and 'local $_' in modern code. So-called 'lexical' variables declared with 'my' are much safer and less likely to cause problems like this.
    • Escaping characters like this is likely to be a solved problem, done by people who are likely to have exhaustively gone through all needed characters, so I'd first look for a module to do this on CPAN. In fact, within the CGI module you're already using, there is the 'escapeHTML' function which looks like what you need.
    Have fun.
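
To make the copy-versus-return point concrete, here is a minimal sketch using core Perl only. The name `cleanup_copy` is made up for illustration, and the single `s/'/&rsquo;/g` is a simplification of the original `1 while ...` loop, not the poster's exact code.

```perl
use strict;
use warnings;

# Works on a lexical copy of the argument and returns the result,
# much like the original cleanup() does with "local $_ = shift".
sub cleanup_copy {
    my $string = shift;          # copy of the caller's value
    $string =~ s/'/&rsquo;/g;    # only the copy is modified
    return $string;              # hand the result back to the caller
}

my $attr_descr = "testing a tick '";

cleanup_copy($attr_descr);       # return value thrown away
my $ignored = $attr_descr;       # still holds the plain tick
print "ignored:  $ignored\n";

$attr_descr = cleanup_copy($attr_descr);   # assign the result back
print "assigned: $attr_descr\n";           # now holds &rsquo;
```

The first call matches what the main program does today: the substitution happens, but only inside the sub. The second call is the one-line fix.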
Re: Trouble Getting Local Variable $_ to return reformatted string
by grep (Monsignor) on Jun 18, 2007 at 13:52 UTC
    I would avoid a 'roll-your-own' solution and use HTML::Entities.
    Quicker, easier, and more complete.

    grep
    1)Gain XP 2)??? 3)Profit

Re: Trouble Getting Local Variable $_ to return reformatted string
by shmem (Chancellor) on Jun 18, 2007 at 14:19 UTC
    Here's the gotcha:
    # Put in HTML-friendly characters for any odd characters
    &ncw_com_library::cleanup($attr_descr);

    You expect your cleanup() subroutine to do inplace edits of its argument, instead it takes a copy and returns it modified. Your $attr_descr doesn't get altered.

    Don't use the & prefix for sub calls, unless you know what it does and you need that because you do know ;-)

    # Put in HTML-friendly characters for any odd characters
    $attr_descr = ncw_com_library::cleanup($attr_descr);
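
For anyone wondering what the & prefix actually changes, the sketch below shows the one case where it bites: called without parentheses, `&name;` hands the callee the caller's current `@_`. The sub names are hypothetical. With an explicit argument list, as in `&ncw_com_library::cleanup($attr_descr)`, the main difference is that prototypes are bypassed, so the prefix is not what loses the substitution here.

```perl
use strict;
use warnings;

# Reports how many arguments it received.
sub show_args { return scalar @_ }

sub caller_sub {
    my $plain     = show_args();   # explicit empty list: receives nothing
    my $ampersand = &show_args;    # no parens: reuses caller_sub's own @_
    return ($plain, $ampersand);
}

my ($plain, $ampersand) = caller_sub('a', 'b', 'c');
print "show_args() saw $plain argument(s)\n";
print "&show_args  saw $ampersand argument(s)\n";
```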

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
      Thanks to jbert, grep and shmem for the excellent pointers and explanations.

      I had looked at the various ways to call the sub (do I put in & or leave it out), but didn't think in this case it mattered.

      The simplest solution for me (based on the existing code and needing to add cleanup code elsewhere) was to let the CGI and HTML::Template modules handle it.

      Yep, shoulda thunk more 'bout not re-creating the wheel. :-)

      THANKS!
        I had looked at the various ways to call the sub (do I put in & or leave it out), but didn't think in this case it mattered.
        It didn't matter. I'm not sure why that was pointed out to you. The basic thing is that, to modify a passed parameter, you need to do it via @_, like
        # surround all uppercase characters in the passed parameter with *'s
        # and all lowercase characters with _'s
        # (modifies parameter in-place)
        sub transmogrify {
            my $string = $_[0];
            $string =~ s/([A-Z])/*$1*/g;
            $string =~ s/([a-z])/_$1_/g;
            $_[0] = $string;
            return;
        }

        $text = "I met this guy, and he looked like he might have been a hat check clerk at an ice rink, which in fact, he turned out to be. Let X = X.";
        transmogrify($text);
        print $text;
        Or you need to not try to modify the parameter, but instead return it, and have the caller call the sub appropriately:
        # surround all uppercase characters in the passed parameter with *'s
        # and all lowercase characters with _'s
        sub transmogrify {
            my $string = $_[0];
            $string =~ s/([A-Z])/*$1*/g;
            $string =~ s/([a-z])/_$1_/g;
            return $string;
        }

        $text = "I met this guy, and he looked like he might have been a hat check clerk at an ice rink, which in fact, he turned out to be. Let X = X.";
        $text = transmogrify($text);
        print $text;
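
Applied to the original question, the in-place variant can be written even more directly, because the elements of `@_` are aliases for the caller's variables. This is only a sketch: `cleanup_in_place` is a made-up name, and it handles just the tick mark.

```perl
use strict;
use warnings;

# Hypothetical helper: edits its first argument through the @_ alias.
sub cleanup_in_place {
    $_[0] =~ s/'/&rsquo;/g;   # $_[0] is the caller's variable, not a copy
    return;                   # nothing useful to return
}

my $attr_descr = "Bob's party";
cleanup_in_place($attr_descr);
print "$attr_descr\n";
```

One trade-off of this style: the argument has to be a modifiable variable. Passing a string literal that contains a tick would make the substitution die with a "Modification of a read-only value" error, which is one reason the return-and-assign form is usually easier to live with.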
Re: Trouble Getting Local Variable $_ to return reformatted string
by ww (Archbishop) on Jun 18, 2007 at 15:33 UTC

    minor nits, OT of your now-solved chief question:
    First..

    ...because the nasty characters my customers insist on typing into their descriptions occasionally make the HTML page break.
    For example, I want to exchange a tick mark (') with (& rsquo ;) ...

    You're on the money in saying "customers" sometimes insert "nasty characters." They certainly can and do and sometimes that will bork stuff... but I don't know any way a simple tick could, in your word "break" the HTML page; an ASCII "'" does NOT (in my experience) cause any issue with a browser's rendering. Perhaps this is merely incautious phrasing, but if not, I'd be interested in an example to clarify my understanding.

    and, second... When posting two distinct bits of code, it would be well to enclose them in individual sets of <c>...</c> tags,

    • for clarity ...and because,
    • that way, the Monastery provides two separate "download" links.

    Welcome to the Monastery! Perhaps the promptitude of the excellent replies above will prompt you to get a login.

      They certainly can and do and sometimes that will bork stuff... but I don't know any way a simple tick could, in your word "break" the HTML page; an ASCII "'" does NOT (in my experience) cause any issue with a browser's rendering. Perhaps this is merely incautious phrasing, but if not, I'd be interested in an example to clarify my understanding.

      Attribute values inside an HTML element, when you're using the same type of quotes around the element in question:

      <img alt='Bob's Birthday Party' src='birthday.jpg' height='480' width='640'>

      You might have the same issue with double quotes, if you're double quoting the values of the attributes.
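
A rough sketch of how escaping fixes the attribute case above. The `escape_attr` helper and its entity table are assumptions written for this example only; in real code a tested module such as HTML::Entities or CGI's escapeHTML is the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper for this sketch: covers only the five characters
# that matter inside quoted attribute values and element text.
my %ENTITY = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;',
);

sub escape_attr {
    my ($value) = @_;
    $value =~ s/([&<>"'])/$ENTITY{$1}/g;   # replace each special character
    return $value;
}

my $alt = escape_attr("Bob's Birthday Party");
print "<img alt='$alt' src='birthday.jpg' height='480' width='640'>\n";
```

With the tick written as `&#39;`, the alt value no longer ends early at the apostrophe, whichever quote style surrounds it.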

        You're right, of course and thank you. The apostrophe (as a straight tick) inside your single quoted alt would definitely be a problem.

        However, my "incautious phrasing" perhaps led you to this.

        I see no indication that the OP was allowing customers to insert anything inside HTML elements, tags or attribute values. (And, as an aside, he's fetching the data from a database, which might suggest the cleanup should occur before the customers are allowed to insert it there.)

        So, rephrasing:

        ...a simple tick (in the kind of data context presented by the OP) could, in....

        And, again, thank you. That kind of good-catch may save much grief for some future reader presented with my ill-considered words
            ...and I surely hope this is more precise

Re: Trouble Getting Local Variable $_ to return reformatted string
by andreas1234567 (Vicar) on Jun 18, 2007 at 18:03 UTC
    You may also want to look into how to restrict user input using Embperl::Form::Validate or equivalent. Remember there are surprisingly many good tools out there.
    [
        [
            -key => 'lang',
            -name => 'Language',
            required => 1,
            length_max => 5,
        ],
        [
            -key => 'from',
            -type => 'EMail',
            emptyok => 1,
        ],
        -key => ['foo', 'bar'],
        required => 1,
    ]
    --
    print map{chr}unpack(q{A3}x24,q{074117115116032097110111116104101114032080101114108032104097099107101114})