Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am able to use either of the lines here to clean a single file of non-ASCII characters. I also have a program that loops through a whole corpus of files, but when I use one of the lines below to clean them, the new files it creates are missing most of the text, if not all of it. I can see that the program is running. Why do these lines work with a single file but not with a whole folder of files?

$line=~ s/[^!-~\s]//g;

internal brackets do not seem to show up when I post

$line =~s/[^[:ascii:]]//g;

Also Perl seems to dislike the use of !. It says void context: $line !~s/[^[:ascii:]]//g;

Code tags added by GrandFather

Replies are listed 'Best First'.
Re: clean corpus of ascii not a file
by GrandFather (Saint) on Oct 29, 2014 at 22:17 UTC

    How about you show us the code that doesn't work instead of the line of code that does work?

    Also Perl seems to dislike the use of !. It says void context: $line !~s/[^[:ascii:]]//g;

    The operator !~ only makes sense if you are using the result. It negates the result of the more usual =~ operator. Whereas it often makes sense to ignore the result of =~, it never makes sense to ignore the result of !~, because if you didn't want the result you'd just use =~.
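    To illustrate the point, here is a minimal sketch. The helper names clean_line and is_pure_ascii are hypothetical and not from the original post: the first ignores the result of =~ on purpose, the second uses the result of !~, so neither triggers the warning.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): strip non-ASCII characters
# from one line and return the cleaned copy.
sub clean_line {
    my ($line) = @_;
    $line =~ s/[^[:ascii:]]//g;    # result ignored on purpose; $line is modified in place
    return $line;
}

# Hypothetical helper: 1 if the line contains no non-ASCII characters, else 0.
# The result of !~ is used here, so Perl has nothing to warn about.
sub is_pure_ascii {
    my ($line) = @_;
    return $line !~ /[^[:ascii:]]/ ? 1 : 0;
}

print clean_line("xyz\x80\x90\xa0abc"), "\n";    # prints xyzabc
print is_pure_ascii("abc"), "\n";                # prints 1
print is_pure_ascii("ab\x80c"), "\n";            # prints 0
```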

    Perl is the programming world's equivalent of English
Re: clean corpus of ascii not a file
by AnomalousMonk (Archbishop) on Oct 30, 2014 at 00:23 UTC
    ... Perl seems to dislike the use of !.

    Not really.

    It says void context ...

    Not really. Sometimes it's useful to see what Perl thinks of your code. (See O and B::Deparse.)

    c:\@Work\Perl>perl -wMstrict -MO=Deparse,-p -le
    "my $line = qq{xyz\x80\x90\xa0abc};
     print qq{'$line'};
     ;;
     $line !~ s/[^[:ascii:]]//g;
     print qq{'$line'};
    "
    Useless use of negative pattern binding (!~) in void context at -e line 1.
    BEGIN { $^W = 1; }
    BEGIN { $/ = "\n"; $\ = "\n"; }
    use strict 'refs';
    (my $line = "xyz\200\220\240abc");
    print("'${line}'");
    (not ($line =~ s/[^[:ascii:]]//g));
    print("'${line}'");
    -e syntax OK

    In this case (as pointed out by GrandFather above), the rather unusual (but syntactically correct) statement
        $line !~ s/[^[:ascii:]]//g;
    is logically inverting the result produced by the s/// built-in after it finishes operating on the string to which it is bound ($line in this case). This result is the number of substitutions performed, 3 in the case of the code example. The truth of 3 (i.e., true) is then inverted to '' (the empty string), the canonical false value. But this value is then thrown away! Because you asked for it by enabling warnings, Perl warns you about a "Useless use of ..." some operation, here the logical inversion. (Granted, the full warning here is a bit puzzling, but it's a very unusual statement. Maybe use-ing diagnostics would be more informative. Try it.)

    c:\@Work\Perl>perl -wMstrict -le
    "my $line = qq{xyz\x80\x90\xa0abc};
     print qq{'$line'};
     ;;
     $line !~ s/[^[:ascii:]]//g;
     print qq{'$line'};
    "
    Useless use of negative pattern binding (!~) in void context at -e line 1.
    'xyzÇÉáabc'
    'xyzabc'

    Try assigning the result of the  s/// to a variable with, e.g.,
        my $result = $line !~ s/[^[:ascii:]]//g;
    or
        my $result = $line =~ s/[^[:ascii:]]//g;
    (no logical inversion) and see what you get. Happy experimenting!
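    For anyone who wants a starting point for that experiment, here is a small sketch. The helper try_binding is hypothetical (not from the thread); it runs the same substitution with either binding operator and returns both the operator's result and the modified string. Because the result is assigned, no "Useless use" warning should appear.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the substitution with =~ (negate false)
# or !~ (negate true) and return (result, cleaned line).
sub try_binding {
    my ($line, $negate) = @_;
    my $result = $negate
        ? ($line !~ s/[^[:ascii:]]//g)
        : ($line =~ s/[^[:ascii:]]//g);
    return ($result, $line);
}

for my $negate (0, 1) {
    my ($result, $cleaned) = try_binding("xyz\x80\x90\xa0abc", $negate);
    printf "%s  result='%s'  line='%s'\n",
        ($negate ? '!~' : '=~'), $result, $cleaned;
}
# prints:
# =~  result='3'  line='xyzabc'
# !~  result=''  line='xyzabc'
```

    Either way the string is cleaned; only the value handed back by the operator differs.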