dirtdog has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I'm having trouble finding a one-liner, or anything else, that will identify the line containing this character (0x13). I have grep commands and Perl one-liners that identify non-ASCII characters, which I thought this was, but none of them catch it.

CONSENT GRANTED ^S to accept the Plan and grant proxy to Lucid AMERICA FRANCE MEXICO CANADA RUSSIA INDIA

And when I do an od -bc on the file it's 023.

Does anyone know of a one liner to isolate this character from a file?

Your help is greatly appreciated!

Re: How to find Unicode: 0x13 in File
by choroba (Cardinal) on Nov 18, 2016 at 15:03 UTC
    You haven't shown the "grep commands and perl one liners", so it's hard to tell what's wrong with them. The following finds the character \x13 in a file in bash:
    grep $'\023' file

    Same in Perl:

    perl -ne 'print if /\023/' file
    perl -ne 'print if /\x13/' file

    Finding "Unicode" in a file is not possible if you don't know the encoding of the file. In the examples above, it works for UTF-8 (and probably other ones, too).

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      > Finding "Unicode" in a file is not possible if you don't know the encoding of the file.

      Indeed! Just to add something: take a look at tchrist's answer about Perl and Unicode, "No magic bullet" (on Stack Overflow).

      L*

      There are no rules, there are no thumbs..
      Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

      I may be wrong here, but if I am, I will learn something new :)

      Could you not read in a few MB of the file (if it is big enough), unpack it, and then test whether a character matches 0x13?

      Something like:
      open(my $fh, '<', 'file') or die "$!\n";
      binmode($fh);
      # Read one byte at a time and test its hex representation.
      while (read $fh, my $char, 0x01) {
          my $buf = unpack('H*', $char);
          if ($buf =~ /13/) {
              print "found 0x13\n";
          }
      }
      Contents of 'file': '.Eg5™eEfx`.' #'.' = 0x13;
      I'm not up to par on Unicode, so I could be way off.
        > 0x01

        Why do you specify the length in hex?

        Also note that if you use a length greater than 1 (which you would want, to speed things up), you can get false positives: read $fh, my $char, 2 reports 0x13 as present in the following file:

        a1

        because

        $ perl -wE 'say unpack "H*", "a1"'
        6131
         ~~


      I was using the following which did not work :

      perl -ne 'print "$ARGV:$.\n" if /[^[:ascii:]]/;' $filename
      grep -e "[\x{00FF}-\x{FFFF}]" $filename

      The command you sent worked perfectly.

      Thanks!

        > did not work

        And here's why:

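        • [:ascii:] matches characters in the range 0-127. That range includes 0x13 (19), so the negated class [^[:ascii:]] can never match it.
        • 19 (0x13) is not between 255 (0xFF) and 65535 (0xFFFF), so the grep range cannot match it either.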
