multibyte match works with tr but not in perl??

the_sheriff has asked for the wisdom of the Perl Monks concerning the following question:

I have a bunch of HTML files with a specific non-printing character I want to get rid of. For example, in the string "Children's", the apostrophe appears to be a multibyte character, as shown below:

% grep "Children" index.html | od -c 
0003060  C   h   i   l   d   r   e   n 342 200 231   s

For that file I can run the following command to fix it:

% tr -s '\342\200\231' \' < file.html

but since I have a ton of files scattered everywhere, I'd like to be able to do something like the following:

% perl -i -pe "s/\342\200\231/'/g" `find /home -name "*.html"`

...but no luck. I've read the part of perlfaq6 that addresses this, and it seems to say you can match a multibyte character by searching for its individual bytes, which is what I have above, but there's a good chance I'm misunderstanding. Has anybody done something similar, or does anyone have suggestions?

Replies are listed 'Best First'.
Re: multibyte match works with tr but not in perl?? by belg4mit (Prior) on Jan 24, 2003 at 07:00 UTC
That kinda looks like a Unicode character. If you're using perl 5.6 or greater, you must stop thinking of files as sequences of bytes and think of them as sequences of characters instead. See perlunicode for more. `-- I'm not belgian but I play one on TV.`
Re: multibyte match works with tr but not in perl?? by Enlil (Parson) on Jan 24, 2003 at 05:36 UTC
The main problem, I think, is that `tr///` and `s///` do different things; they are two separate operators within Perl. Update: You can find more info in perlop. -enlil
Re: multibyte match works with tr but not in perl?? by skx (Parson) on Jan 24, 2003 at 14:40 UTC
Could you not use the POSIX printable class?

% perl -pi.bak -e "s/[[:^print:]]//g" `find /home -name "*.html"`

Steve --- steve.org.uk