isync has asked for the wisdom of the Perl Monks concerning the following question:

I'm running an old Perl version with no option to upgrade.
Is there a quick regex to simply delete high-bit characters (from the Unicode range), so that a mangled string comes out a bit more right?

Re: Regex to delete high-bit characters
by grinder (Bishop) on Oct 05, 2007 at 13:17 UTC

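    You don't need a regexp for that:

    # clears bit 7 of each character code; note this remaps high-bit
    # characters to 7-bit ones rather than removing them
    $str = join '', map {chr(ord($_) & 0x7f)} split //, $str;

    Hmmm. On second thoughts...

    $str =~ s/(.)/chr(ord($1) & 0x7f)/eg;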
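To make the effect of the mask concrete, here is a minimal sketch on a made-up Latin-1 sample (the string and variable names are assumptions, not from the thread). It shows that clearing the high bit turns the character into a different 7-bit character instead of dropping it:

```perl
use strict;

# Hypothetical sample: "caf\xe9" is "cafe" with an e-acute in Latin-1;
# 0xe9 has the high bit set.
my $str = "caf\xe9";

# Work on the character code (ord), mask it, and turn it back into a
# character (chr). 0xe9 & 0x7f is 0x69, which is "i".
(my $masked = $str) =~ s/(.)/chr(ord($1) & 0x7f)/eg;

print "$masked\n";    # prints "cafi"
```

So the string keeps its length, but the accented character comes back as an unrelated ASCII letter.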

    • another intruder with the mooring in the heart of the Perl

      Thanks for the solution!
Re: Regex to delete high-bit characters
by Skeeve (Parson) on Oct 05, 2007 at 14:24 UTC

    even easier:

    tr/\x80-\xff/\x00-\x7f/;
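The tr above folds high-bit characters down into the 7-bit range. If the goal is to delete them, as the question asks, a variant with the /d modifier might look like the sketch below (the sample string and variable names are made up for illustration):

```perl
use strict;

# Hypothetical Latin-1 sample with two high-bit characters (0xef and 0xe9).
my $str = "na\xefve caf\xe9";

# With /d and an empty replacement list, tr deletes every character in the
# search list and returns how many characters it matched.
my $clean   = $str;
my $removed = ($clean =~ tr/\x80-\xff//d);

print "$clean ($removed removed)\n";    # prints "nave caf (2 removed)"
```

This range only covers single-byte values up to 0xff, which should be all an old, byte-oriented perl ever sees in a string.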

    s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
    +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
Re: Regex to delete high-bit characters
by jwkrahn (Abbot) on Oct 05, 2007 at 14:40 UTC
      That would be a good solution for any reasonably current version of Perl. I'm not exactly sure when the POSIX [:blah:] notations were incorporated into perl's regex engine, but I know some people are running perl environments where it is not available:
      $ perl -v
      This is perl, version 5.005_03 built for sun4-solaris
      Copyright 1987-1999, Larry Wall
      ...
      $ perl -le '$_="abc"; print "ok" if (/^[[:ascii:]]+$/)'
      # (no output -- that char.class doesn't work)
      $ perl -le '$_="abc"; print "ok" if (/^[\x00-\x7f]+$/)'
      ok

      The OP didn't actually say how far back he needs to go with his perl version(s), but I wouldn't be surprised if 5.005 is among the versions he has to support.

      (Update: forgot to mention: that same sun4-solaris machine also has perl 5.8.0 installed as a non-default version, and the first one-liner works with that as expected:

      $ perl5.8.0 -le '$_="abc"; print "ok" if (/^[[:ascii:]]+$/)'
      ok
      )
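Putting the two one-liners together, a version-independent approach could be sketched as follows. The helper name strip_non_ascii and the probe variable are hypothetical. The probe relies only on the behaviour shown in the session above: on a perl without POSIX classes the [[:ascii:]] test silently fails to match instead of raising an error.

```perl
use strict;

# Probe: true on a perl that understands POSIX classes. Judging by the
# 5.005_03 session, an older perl just fails to match here.
my $has_posix_class = ("a" =~ /^[[:ascii:]]$/) ? 1 : 0;

# Hypothetical helper: delete everything outside 7-bit ASCII using an
# explicit range, which does not depend on POSIX class support.
sub strip_non_ascii {
    my ($s) = @_;
    $s =~ s/[^\x00-\x7f]+//g;
    return $s;
}

print "POSIX classes available: $has_posix_class\n";
print strip_non_ascii("caf\xe9 ok"), "\n";    # prints "caf ok"
```

Since the explicit range behaves the same either way, the probe is only useful for reporting. The helper itself never needs the POSIX notation.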