Re: Re: Substituting Newline Characters

Compare that to the code from CGI.pm v3.01:

sub escapeHTML {
         # hack to work around  earlier hacks
         push @_,$_[0] if @_==1 && $_[0] eq 'CGI';
         my ($self,$toencode,$newlinestoo) = CGI::self_or_default(@_);
         return undef unless defined($toencode);
         return $toencode if ref($self) && !$self->{'escape'};
         $toencode =~ s{&}{&amp;}gso;
         $toencode =~ s{<}{&lt;}gso;
         $toencode =~ s{>}{&gt;}gso;
         $toencode =~ s{"}{&quot;}gso;
         my $latin = uc $self->{'.charset'} eq 'ISO-8859-1' ||
                     uc $self->{'.charset'} eq 'WINDOWS-1252';
         if ($latin) {  # bug in some browsers
                $toencode =~ s{'}{&#39;}gso;
                $toencode =~ s{\x8b}{&#8249;}gso;
                $toencode =~ s{\x9b}{&#8250;}gso;
                if (defined $newlinestoo && $newlinestoo) {
                     $toencode =~ s{\012}{&#10;}gso;
                     $toencode =~ s{\015}{&#13;}gso;
                }
         }
         return $toencode;
}

Cheers Sören


Re: Re: Re: Substituting Newline Characters by tachyon (Chancellor) on Mar 16, 2004 at 00:03 UTC
As it happens I am remarkably familiar with the guts of CGI.pm. I do hope you are not proposing to use 5000 lines of CGI.pm for this task? If you are, I take it you are aware that outside of a CGI context it will default to a charset of ISO-8859-1, aka Latin-1. You could also note that the /s and /o modifiers on the REs are pointless in context, that it does not correctly escape whitespace, and that it does not deal with `\n -> <br>`, which was at the heart of the original thread. And your point was?

cheers

tachyon
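For reference, a minimal sketch of the kind of compact routine this objection points towards. It is not the function posted earlier in the thread: the name `escape_html_br` and the choice to emit `<br>` followed by a newline are assumptions made for illustration. It leaves out /s (which only changes what `.` matches) and /o (which only matters when a pattern interpolates a variable), since neither applies to these fixed patterns.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: escape the four HTML
# metacharacters, then turn line endings into <br> tags.
sub escape_html_br {
    my ($text) = @_;
    return unless defined $text;
    $text =~ s/&/&amp;/g;     # must run first so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/\015\012|\012|\015/<br>\n/g;    # CRLF, LF, or bare CR
    return $text;
}

print escape_html_br(qq{a < b & "c"\nnext line}), "\n";
```

Doing the `&` substitution first is the one ordering constraint; the others can run in any order.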
4Re: Substituting Newline Characters by jeffa (Bishop) on Mar 16, 2004 at 02:35 UTC
I too am confused about what the point might be, especially since your function does more than the one in CGI.pm ... but do correct me if I am wrong here -- I was under the assumption that memory and speed were so cheap these days that using a mere 5000 lines is not really that bad after all. Back when I was a Comp Sci undergrad, a peer who majored in Industrial Engineering explained to me that there was no need to worry about optimization since hardware was advancing by leaps and bounds. I, of course, scoffed at that, and I still believe that someone had damn well better keep the optimization torch burning because hardware does have a limit ... but the truth is that we are only talking about a few extra seconds at most by using CGI.pm. I just ran Devel::Profile on two scripts, one that imported CGI.pm's escapeHTML and one that used yours. Here are the results:

Mileage will vary, but unless I am missing something, that's a whopping extra 0.05 of a second. But at this point, I would use your subroutine because ... well, there it is now, isn't it. What was that saying ... yes, don't go looking a gift horse in the mouth. That or don't complain when a saint posts code that works, is fast, and is free. ;)

jeffa

L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
Re: 4Re: Substituting Newline Characters by tachyon (Chancellor) on Mar 16, 2004 at 03:09 UTC
We do stuff dealing with billions of records, so those little bits add up. Anyway, it benches 4 times faster, as you might expect. It is a tribute to p5p that it is only 4 times slower. For example, I have actually defactored some code recently. We have a merge and an unmerge function, very similar code, so I added a flag and a couple of if clauses so that the unmerge was just another call to merge with the flag set. You know, the usual stuff. But those two extra ifs in every loop added 30% to the runtime - for both functions. So I had less code, although it was more complex, but it killed the runtime.

In my version of the real world my fixed costs are servers and bandwidth. The more efficient I can make my code in terms of memory use and throughput, the more clients we can shoehorn onto a single server, which directly hits the bottom line. Compact functions are also easier to unit test, which helps stability. As always, YMMV. Whatever works for you is all you need.

cheers

tachyon
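The flag-in-the-loop trade-off described above can be measured with the core Benchmark module. The sketch below is illustrative only: `sum_plain` and `sum_flagged` are hypothetical stand-ins, not the real merge and unmerge functions, and the numbers it prints depend on the machine and the perl build, so it should not be expected to reproduce the 30% figure.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @data = (1 .. 1000);

# Dedicated loop: no branch inside the loop body.
sub sum_plain {
    my $total = 0;
    $total += $_ for @data;
    return $total;
}

# Shared loop: one sub serves both directions, at the cost of
# testing the flag on every iteration.
sub sum_flagged {
    my ($negate) = @_;
    my $total = 0;
    for (@data) {
        if ($negate) { $total -= $_ }
        else         { $total += $_ }
    }
    return $total;
}

# Run each variant for about one CPU second and print a comparison table.
cmpthese(-1, {
    plain   => sub { sum_plain() },
    flagged => sub { sum_flagged(0) },
});
```

Whether the extra branch matters depends on how much other work each iteration does, so it is worth timing the real loop rather than a toy like this one.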
Re^6: Substituting Newline Characters by Aristotle (Chancellor) on Mar 16, 2004 at 14:20 UTC
Re: Re^6: Substituting Newline Characters by tachyon (Chancellor) on Mar 16, 2004 at 15:46 UTC