Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, please suggest a regular expression to match the non-ASCII characters in a text file.

Replies are listed 'Best First'.
Re: regular Expression
by Transient (Hermit) on Jun 28, 2005 at 14:16 UTC
    /[^\x00-\x7F]/

      Also known as /[\x80-\xFF]/ ?

      Update: It's a question ... really.

        Sure, unless you're using Unicode.
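To make the Unicode caveat concrete, here is a small sketch comparing the two character classes. The sample strings and the helper names are made up for illustration; they are not from anyone's post.

```perl
use strict;
use warnings;

# Hypothetical helpers, named here only for illustration.
sub matches_negated_ascii { return $_[0] =~ /[^\x00-\x7F]/ ? 1 : 0 }
sub matches_high_bytes    { return $_[0] =~ /[\x80-\xFF]/  ? 1 : 0 }

# Assumed sample data: one pure-ASCII string, one with a Latin-1
# character (U+00E9), and one with a character above U+00FF.
my %sample = (
    'plain ASCII'       => "cafe",
    'Latin-1 e-acute'   => "caf\x{E9}",
    'smiley above 0xFF' => "smile \x{263A}",
);

for my $label (sort keys %sample) {
    printf "%-18s negated=%d high=%d\n", $label,
        matches_negated_ascii($sample{$label}),
        matches_high_bytes($sample{$label});
}
```

The two classes should agree as long as every character is below U+0100 and part ways once a wider character shows up, since only the negated class covers code points above 0xFF.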
Re: regular Expression
by Ido (Hermit) on Jun 28, 2005 at 14:20 UTC
    You could use the POSIX character class:
    /[[:^ascii:]]/. Check perlre.
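POSIX classes are only recognised inside a bracketed character class, so the negated ASCII class is spelled /[[:^ascii:]]/. A minimal sketch of using it to count non-ASCII characters follows; the helper name count_non_ascii and the sample strings are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count the characters outside the ASCII range.
sub count_non_ascii {
    my ($text) = @_;
    # The empty-list assignment forces list context, so the match
    # returns every hit and $count receives how many there were.
    my $count = () = $text =~ /[[:^ascii:]]/g;
    return $count;
}

print count_non_ascii("na\x{EF}ve caf\x{E9}"), "\n";
print count_non_ascii("plain text"), "\n";
```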
Re: regular Expression
by ghenry (Vicar) on Jun 28, 2005 at 14:17 UTC

    Have a read through the regular expression Tutorials

    Walking the road to enlightenment... I found a penguin and a camel on the way.....
    Fancy a yourname@perl.me.uk? Just ask!!!
Re: regular Expression
by fmerges (Chaplain) on Jun 28, 2005 at 14:28 UTC

    Hi,

    Once you get familiar with regular expressions, take a look at Regexp::Common

    Regards,

    :-)
Re: regular Expression
by ikegami (Patriarch) on Jun 28, 2005 at 15:15 UTC

    If you meant non-ASCII characters and non-displayable ASCII characters, then use /[^\x20-\x7E]/.

      This looks like /[[:^print:]]/ (the negated class) to me.
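Whether the explicit range and a POSIX print class agree depends on what is in the string. The sketch below compares /[^\x20-\x7E]/ with the negated POSIX class, with and without the /a modifier (available from Perl 5.14). The helper names and sample strings are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers for comparing the three spellings.
sub by_range   { return $_[0] =~ /[^\x20-\x7E]/  ? 1 : 0 }
sub by_posix   { return $_[0] =~ /[[:^print:]]/  ? 1 : 0 }
sub by_posix_a { return $_[0] =~ /[[:^print:]]/a ? 1 : 0 }  # /a: ASCII-only rules

# Assumed samples: printable ASCII, an ASCII control character (tab),
# and a printable character above U+00FF.
my %sample = (
    'printable ASCII' => "hello world",
    'embedded tab'    => "tab\there",
    'wide smiley'     => "smile \x{263A}",
);

for my $label (sort keys %sample) {
    printf "%-16s range=%d posix=%d posix/a=%d\n", $label,
        by_range($sample{$label}),
        by_posix($sample{$label}),
        by_posix_a($sample{$label});
}
```

Under Unicode rules the smiley counts as printable, so the plain POSIX form does not flag it, while the explicit range and the /a form do.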

Re: regular Expression
by l.frankline (Hermit) on Jun 29, 2005 at 15:09 UTC
    try

    $_ =~ /[^\w\d\s]+/;

    * Frank *
Re: Regular expression
by emav (Pilgrim) on Jun 29, 2005 at 10:37 UTC

      That doesn't really do what the original poster asked for tho' does it? ASCII characters are those with a character code between 0 and 127. Your code only checks for character codes between 33 and 126 (plus a few whitespace characters).

      To match _all_ non-ASCII characters you really need something like:

      /[^\x00-\x7f]/
      --
      <http://www.dave.org.uk>

      "The first rule of Perl club is you do not talk about Perl club."
      -- Chip Salzenberg

      Marvellous! It works fine, but please explain it to me, emav.
        ^ = exclude (negate the character class)
        !-~ = all characters between ! and ~
        \s = or whitespace
        g = search globally

        As davorg has pointed out, mine is a rough regex, but I use it often to find non-English characters in my XML files, as it works not only embedded in a Perl script but also in editors that use Perl-based regexes, such as my favourite, Unired.
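The regex being explained is not shown in the thread as it appears here. Going only by the breakdown of ^, !-~, \s and g given above, it was presumably something like /[^!-~\s]/g. The sketch below uses that assumed reconstruction; the helper name flagged_chars and the sample strings are hypothetical.

```perl
use strict;
use warnings;

# Assumed reconstruction of the regex being explained; the original
# post's code does not appear in the thread as shown.
sub flagged_chars {
    my ($text) = @_;
    # In list context, /g with one capture group returns every match.
    return $text =~ /([^!-~\s])/g;
}

# Accented characters are flagged; letters, spaces and the newline are not.
my @hits = flagged_chars("na\x{EF}ve caf\x{E9}\n");
printf "U+%04X\n", ord($_) for @hits;

# The rough edge davorg mentions: ASCII control characters such as
# BEL (0x07) are flagged too, although they are ASCII.
my @ctrl = flagged_chars("bell\x07");
printf "U+%04X\n", ord($_) for @ctrl;
```

For a strict non-ASCII test, /[^\x00-\x7f]/ avoids flagging control characters.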