Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm having problems stripping the ’ character from a filename on Linux, although I can successfully strip this character on Windows. When I paste a ’ into my code on Linux, it comes out as a full stop. It also doesn't seem to have an ASCII equivalent, so how do I strip it out? I've already tried s/'//; but what I need is s/’// for Linux. Any ideas, guys?

Re: filenames with ’
by zengargoyle (Deacon) on Feb 16, 2003 at 13:35 UTC

    Is that a backtick (`) or a plain single quote (')?

    Try something like this:

    my $test = q("`'); # double-quote, left-single-quote (backtick), plain-single-quote
                       # I'm not even sure what the left/right double quote or
                       # right single quote would be on my keyboard =(
    my $inhex = unpack 'H*', $test;
    my @hexcodes = $inhex =~ m/(..)/g;
    print "@hexcodes", $/;    # prints 22 60 27

    # now I can s/// with the hex representation of the character
    $test =~ s/\x60//g;       # remove any ` characters (hex 60)
    print $test, $/;          # prints "'
Re: filenames with ’
by fruiture (Curate) on Feb 16, 2003 at 13:49 UTC

    Yes, there are three of "these" characters: the plain single quote ('); the backtick (`), which is also quite common and is defined in ASCII and therefore in ISO-8859-1 (and others); and a third one that only exists in the Windows character sets. I can't type that one here, because it does not exist in the ISO-8859 character sets. This should clear up the mess:

    # try:
    print join ' | ', map ord, split //, $the_string;
    # or
    print unpack 'H*', $the_string;
    # to find out what this character looks like
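    To see what those two diagnostics report, here is a sketch run on two made-up sample strings: "it’s" with the ’ stored as the single Windows-1252 byte 0x92, and the same word stored as UTF-8 bytes (an assumption about how the Linux filenames are encoded, not something the original post confirms):

```perl
use strict;
use warnings;

# Hypothetical sample names: "it’s" stored two different ways.
my %samples = (
    'cp1252 byte' => "it\x92s",           # Windows-1252: ’ is the single byte 0x92
    'UTF-8 bytes' => "it\xE2\x80\x99s",   # UTF-8: ’ (U+2019) is three bytes
);

for my $label (sort keys %samples) {
    my $the_string = $samples{$label};
    # decimal code of each byte, then the whole string as hex
    print "$label: ", join(' | ', map { ord($_) } split //, $the_string), "\n";
    print "$label: ", unpack('H*', $the_string), "\n";
}
```

    In the UTF-8 version the one ’ shows up as three values (226 | 128 | 153, hex e28099), so a pattern written for the single Windows byte 0x92 has nothing to match.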
    --
    http://fruiture.de

      Actually, true ISO-8859-1 defines no 8-bit curly (typographic) quotes. Only the variants of it (e.g., the MS and Mac character sets) do that. The ISO-8859-1 standard leaves the first 32 characters of the 8-bit range (hex 80 to 9F) undefined.

      There's a better illustration here showing the characters MS added to the standard. MS changing a standard ... imagine that!
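      The difference is easy to see with the core Encode module: decoding the same byte, 0x92, as Windows-1252 and as ISO-8859-1 gives two unrelated characters. A minimal sketch:

```perl
use strict;
use warnings;
use Encode qw(decode);

# The same byte 0x92 read under two different character sets.
my $byte = "\x92";
my $as_cp1252 = decode('cp1252', $byte);      # Windows-1252: RIGHT SINGLE QUOTATION MARK
my $as_latin1 = decode('iso-8859-1', $byte);  # ISO-8859-1: C1 control code U+0092, not a quote

printf "cp1252:     U+%04X\n", ord $as_cp1252;
printf "iso-8859-1: U+%04X\n", ord $as_latin1;
```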

Re: filenames with ’
by steves (Curate) on Feb 16, 2003 at 13:43 UTC

    It may not be the "regular" ASCII quote that's in the file name, but an 8-bit value that is a quote in the character set on your computer. Quotes are one of those things you find different versions of in the 8-bit range, depending on your character set. For example, look at the MS Windows character set here (partway down the page). MS has taken standard ISO-8859-1 (also known as ISO Latin-1) and added their own values in what's normally the control character range (the first 32 characters in the 8-bit range, hex 80 to 9F). Hex 91 and 92 are the left and right single quotes: "different" quotes from the regular old ASCII quote and backtick at hex 27 and hex 60.

    If this is the case, you need to find out what value is in the name and use that as your match, e.g.:

    $file_name =~ s/\x92//g;
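    On the Linux side, one possibility worth checking (an assumption, since the original post doesn't show the bytes): if the file names are UTF-8, ’ (U+2019) is stored as the three bytes E2 80 99 rather than the single byte 0x92, so \x92 never matches. A sketch covering both cases, with made-up file names:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical file names, as raw bytes.
my $windows_name = "it\x92s.txt";          # ’ as the Windows-1252 byte 0x92
my $linux_name   = "it\xE2\x80\x99s.txt";  # ’ as UTF-8 bytes (assumes a UTF-8 locale)

(my $w = $windows_name) =~ s/\x92//g;          # strip the single cp1252 byte
(my $l = $linux_name)   =~ s/\xE2\x80\x99//g;  # strip the three-byte UTF-8 sequence

# Alternatively, decode the UTF-8 bytes to characters first and remove U+2019.
my $chars = decode('UTF-8', $linux_name);
(my $c = $chars) =~ s/\x{2019}//g;

print "$w\n$l\n$c\n";
```

    If you take the decode route and then rename files, it is safer to encode the cleaned name back to bytes (e.g. with Encode's encode('UTF-8', $c)) before passing it to rename.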
Re: filenames with ’
by Cody Pendant (Prior) on Feb 16, 2003 at 21:03 UTC
    I just tried to do this using ord().

    I found the character code using

    $char = '‘';    # a Mac smart-quote character
    print ord($char);
    and got 213.

    But then I tried to replace it using

    $string =~ s/\213//;
    and nothing happened. I guess I'm missing something obvious?
    --
    “Every bit of code is either naturally related to the problem at hand, or else it's an accidental side effect of the fact that you happened to solve the problem using a digital computer.”
    M-J D

      print ord($char); shows you the decimal value of the character.

      $string =~ s/\213//; attempts to replace the character with octal value 0213, which is decimal 139.

      Try instead with $string =~ s/\325//; and you have a better chance it'll do what you want.
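      The octal/decimal mix-up can be checked from Perl itself. A small sketch (the sample string is made up) that converts 213 with sprintf and tries the equivalent escapes \325, \xD5 and \x{D5}:

```perl
use strict;
use warnings;

my $code = 213;    # the decimal value ord() reported for the Mac quote

# Show the same number in the bases used by Perl's escapes.
printf "decimal %d = octal %o = hex %X\n", $code, $code, $code;

# These three patterns all match the single character with code 213.
my $string = "it" . chr($code) . "s";
for my $re (qr/\325/, qr/\xD5/, qr/\x{D5}/) {
    (my $copy = $string) =~ s/$re//;
    print "$re -> $copy\n";
}

# \213 is octal, i.e. chr(139), so it does not match chr(213).
print "\\213 is chr(", ord("\213"), ")\n";
```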

      You don't say why you are doing this, but if the purpose is to clean up a filename to contain only valid characters (however you define "valid" locally), it would be better to invert the sense to remove everything but the valid characters. Then you might end up with something like:

      # allow only letters, digits, underscore and dot
      (my $cleanfile = $file) =~ s/[^\w.]//g;
      Hugo
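    If the goal is to clean up the names on disk, the same whitelist can be applied to a whole directory. A sketch with a hypothetical helper, clean_dir (the name is made up for illustration), which adds the /a modifier (Perl 5.14+, keeps \w to ASCII) and refuses to overwrite an existing file:

```perl
use strict;
use warnings;

# Hypothetical helper: strip everything except letters, digits, underscore
# and dot from each file name in $dir, renaming files in place.
sub clean_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    for my $file (@names) {
        # /a keeps \w to ASCII, so every byte of a UTF-8 ’ is removed too
        (my $cleanfile = $file) =~ s/[^\w.]//ga;
        next if $cleanfile eq $file;                     # already clean
        if ($cleanfile eq '' || -e "$dir/$cleanfile") {  # never clobber a file
            warn "Skipping '$file'\n";
            next;
        }
        rename("$dir/$file", "$dir/$cleanfile")
            or warn "Cannot rename '$file': $!\n";
    }
}
```

    Call it as clean_dir($some_dir); names that would clean down to nothing, or onto a name that already exists, are skipped with a warning rather than renamed.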