EvanCarroll has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, teach me something. I'm having this issue a lot lately. I either scrape a website or pull in data from a CSV, and I get stupid garbled nonsense like this:

SERIAL# �-----------------. WD2PD644366123

I know what to do insomuch as this involves using `od`, but when it comes to ridding the text of these characters in Perl, I always get lost and end up figuring it out with an almost brute-force approach.

So `od -c` will output

0104200 D 5 3 0 0 6 \n S T O C K N O
0104220 375 - - - - - - - - . . . . . . .

so the character I want to remove has an od code of 0375 -- obviously that isn't right..

`od -xc` will return this

2dfd 375

not sure what this means

`hexdump -c` returns

0008880 D 5
0008890 375 - -

so 375 is obviously significant in both `od -c` (octal offsets) and `hexdump -c` (hex offsets) ... Could someone please explain, or point me to where I can find out how to use these utilities effectively, and how to strip such characters from text? I thought a simple tr/\{something here}//d would work...
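
For readers decoding the dumps above: `od -c` and `hexdump -c` both print a non-printable byte as three octal digits, so 375 is octal for decimal 253, or hex FD. The `2dfd` from `od -xc` is most likely that same byte paired with the following `-` (hex 2D) and displayed as one 16-bit word; that reading assumes a little-endian machine. A minimal Perl sketch of the arithmetic, with illustrative variable names:

```perl
use strict;
use warnings;

# od -c shows the stray byte as the octal digits 375.
my $byte = oct('375');    # 253 in decimal
printf "decimal %d, hex %02x, octal %03o\n", $byte, $byte, $byte;

# od -x groups two bytes into one 16-bit word. Assuming little-endian
# byte order, the bytes FD 2D ("\375" followed by "-") read as 2dfd.
my $word = unpack 'v', "\375-";    # 'v' = unsigned 16-bit, little-endian
printf "%04x\n", $word;
```

So the three tools agree; they only differ in how they group and label the bytes.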

Problem found

See Re^2: hexdump/od/perl question, which is on this thread. Note to readers: you don't always write an octal escape with a leading 0. The byte that od shows as 375 is written \375 in Perl, not \0375.



Evan Carroll
www.EvanCarroll.com

Re: hexdump/od/perl question
by oxone (Friar) on Aug 10, 2007 at 18:05 UTC
    Further to your discussion with ikegami, maybe you want to substitute \375 rather than \0375. I think an octal char escape of this type takes at most three digits, so \0375 would be read as \037 followed by a literal 5.

    Does that do the trick?

      You're right
      >perl -e"print ord qq{\375}"
      253
      >perl -e"print ord qq{\0375}"
      31
      >perl -e"print ord qq{\037}"
      31
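
The `ord` one-liners above only report the first character of each string. A small sketch that also prints the length may make the parse clearer: it assumes nothing beyond core Perl, and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $wrong = "\0375";    # parsed as "\037" followed by the literal character "5"
my $right = "\375";     # a single byte, decimal 253

printf "wrong: length %d, ords %s\n",
    length($wrong), join(' ', map { ord($_) } split(//, $wrong));
printf "right: length %d, ord %d\n", length($right), ord($right);
```

On Perl 5.14 or later, the braced form `\o{375}` is an unambiguous way to spell the same octal escape.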
        You're awesome, thanks a ton! $line =~ tr/\375//d; works fine.
        So an octal escape is at most three digits, and a leading 0 counts as one of them. Grr.. perl not doing what I want again!
        Could you please now show how it would be done if I wanted to address this in hex notation from the start, i.e. the tool to use, and how to target the character within Perl, just so I don't have to revisit this topic later? Thanks again.


        Evan Carroll
        www.EvanCarroll.com
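
On the hex question, one possible approach: `od -t x1` or `hexdump -C` print each byte as two hex digits, so the stray byte would show up as `fd`, and Perl's `\xFD` escape names it directly. The sketch below assumes the data is handled as raw bytes (no decoding layer on the filehandle), and the sample line is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample line containing the stray byte 0xFD (octal 375).
my $line = "SERIAL# \xFD-----------------. WD2PD644366123";

# \x followed by two hex digits names one byte; tr///d deletes it.
(my $clean = $line) =~ tr/\xFD//d;

# The hex and octal spellings refer to the same byte.
print "same byte\n" if "\xFD" eq "\375";
print "$clean\n";
```

Sticking to hex on both sides (tool and escape) avoids the digit-count trap of octal escapes.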
Re: hexdump/od/perl question
by ikegami (Patriarch) on Aug 10, 2007 at 17:22 UTC
    tr/\0375//d;
    or
    s/\0375//g;

    Update: As shown by oxone, \0375 is wrong. It should be \375. While I tested the above, my test was incomplete/flawed.

      This is what I had originally thought too, but
      my %db;
      foreach my $file ( @ARGV ) {
          print "Opening $file" if DEBUG;
          open ( my $fh, '<', $file ) or die "can not open $file";
          while ( my $line = <$fh> ) {
              chomp $line;
              $line =~ tr/\0375//d;
              next unless $line =~ /(.*?)[.-]*\t(.*)/;
              my ( $k, $v ) = ( $1, $2 );
              $db{$k} = undef;
          }
      }
      print for keys %db;
      still has the funky characters on output.


      Evan Carroll
      www.EvanCarroll.com
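
A side effect worth noting about `tr/\0375//d` as used in the loop above: since `\0375` is two characters (`\037` and `5`), it leaves the `\375` byte in place and silently deletes every digit 5 instead. A minimal sketch on an invented sample line:

```perl
use strict;
use warnings;

# Hypothetical input: the stray byte plus some data containing a "5".
my $line = "SERIAL# \375-----. WD2PD644366123 D53006";

(my $buggy = $line) =~ tr/\0375//d;    # deletes chr(31) and "5", not \375
(my $fixed = $line) =~ tr/\375//d;     # deletes only the stray byte

print "buggy still has the byte\n" if index($buggy, "\375") >= 0;
print "buggy lost a digit\n"       if $buggy =~ /D3006/;
print "fixed is clean\n"           if index($fixed, "\375") < 0;
```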

        I tested my code, so something doesn't jibe. Try adding the following to your code:

        use Data::Dumper qw( Dumper );
        local $Data::Dumper::Useqq = 1;
        print(Dumper($line));

        Put the print before the tr///d.
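
As a rough illustration of why this helps: with `Useqq` set, Data::Dumper writes the string in double-quoted form with non-printable bytes escaped, so the offending byte should become visible as an escape such as `\375` rather than a garbled glyph. The sample line here is invented.

```perl
use strict;
use warnings;
use Data::Dumper qw( Dumper );

local $Data::Dumper::Useqq = 1;    # dump strings double-quoted, with escapes

# Hypothetical line containing the stray byte.
my $line = "SERIAL# \375-----.";

my $dump = Dumper($line);
print $dump;
```

The escape printed in the dump can then be pasted into `tr///d` as is.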