http://qs1969.pair.com?node_id=150092


in reply to Find illegal ASCII characters

#!/usr/bin/perl
undef $/;
while (<>) {
  print "File $ARGV has ", length(), " total length\n";
  while (/([^\n\r\x20-\x7f])/g) {
    print "File $ARGV has character ", ord($1), " at byte ", pos()-1, "\n";
  }
  print "File $ARGV has ", tr/\n//, " total lines\n";
}

-- Randal L. Schwartz, Perl hacker

Re: •Re: Find illegal ASCII characters
by Anonymous Monk on Mar 07, 2002 at 22:40 UTC
     while (/([^\n\r\x20-\x7f])/g) {

    Why not compile the expression only once, to make the script faster?

      while (/([^\n\r\x20-\x7f])/go) {

    Saluti.

        I was going to answer something like that, but forgot all about it. In the meantime, I've read (once again) some parts of CGI.pm, and found these interesting regexes:

        $toencode =~ s{&}{&amp;}gso;
        $toencode =~ s{<}{&lt;}gso;
        $toencode =~ s{>}{&gt;}gso;
        $toencode =~ s{"}{&quot;}gso;
        ...
        $toencode =~ s{'}{&#39;}gso;
        $toencode =~ s{\x8b}{&#139;}gso;
        $toencode =~ s{\x9b}{&#155;}gso;
        ...
        $toencode =~ s{\012}{&#10;}gso;
        $toencode =~ s{\015}{&#13;}gso;

        That's at least 9 uses of /o with no variable interpolation in the pattern.

        But CGI also doesn't use strict, has no spaces after commas, uses C-style for-loops where lists would be more efficient and more readable, assumes that the IP address "0" equals "127.0.0.1", has ugly #-laden comments, etc... So is the use of /o really more efficient when no variable interpolation is used, or is this yet another strange thing in CGI.pm? (Excuse me for being so negative about CGI.pm, but having read it, I prefer my own set of functions even more.)
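
        For what it's worth, here is a minimal sketch of where /o does make a visible difference: a pattern that interpolates a variable. With /o the variable is interpolated the first time the match is executed and never again, so later values are ignored. A pattern with no variables in it is compiled when the program is compiled, so as far as I know /o has nothing left to save there. The sub names and the test string "apple" are made up for this illustration; they are not from CGI.pm or from the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers for illustration only.
# With /o, $pat is interpolated on the first call and then frozen.
sub match_with_o    { my ($pat, $str) = @_; return $str =~ /$pat/o ? 1 : 0 }

# Without /o, $pat is interpolated again on every call.
sub match_without_o { my ($pat, $str) = @_; return $str =~ /$pat/ ? 1 : 0 }

# "apple" contains "a" but not "z".
my @with_o    = map { match_with_o($_, "apple") }    qw(a z);
my @without_o = map { match_without_o($_, "apple") } qw(a z);

print "with /o:    @with_o\n";     # prints "with /o:    1 1" (still matching "a")
print "without /o: @without_o\n";  # prints "without /o: 1 0"
```

        The second call with /o reports a match for "z" only because the pattern is still the "a" from the first call. That is the one thing /o changes, and it cannot happen in a pattern like s{&}{&amp;}gso that interpolates nothing.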

        44696420796F7520732F2F2F65206F
        7220756E7061636B3F202F6D736720
        6D6521203A29202D2D204A75657264