srdst13 has asked for the wisdom of the Perl Monks concerning the following question:

This is a simple one, I suppose. I have a tab-delimited text file that I want to strip of characters except \w\d _,.-[](). Simply, how can I strip out characters EXCEPT those that I want? Told you it was simple.

Thanks,
Sean

Re: Cleaning a text file
by RazorbladeBidet (Friar) on Feb 25, 2005 at 20:31 UTC
    A ^ right after the opening [ negates a character class: [^a] matches any single character except "a".

    See perlre for character classes and perlop for the s/// operator.
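    A minimal illustration of the difference, using a made-up sample string ("banana"). The variable names are only for illustration:

```perl
use strict;
use warnings;

my $word = "banana";

# [a] matches only "a", so deleting its matches leaves the other letters.
(my $without_a = $word) =~ s/[a]//g;

# [^a] matches every character EXCEPT "a", so deleting its matches leaves only the a's.
(my $only_a = $word) =~ s/[^a]//g;

print "$without_a\n";    # bnn
print "$only_a\n";       # aaa
```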
    --------------
    It's sad that a family can be torn apart by such a simple thing as a pack of wild dogs
Re: Cleaning a text file
by brian_d_foy (Abbot) on Feb 25, 2005 at 20:53 UTC

    Just loop through each line and replace everything that is not one of those characters (or a tab, or the newline at the end of the line!) with nothing. This program takes the names of the files from the command line and prints the result to standard output, which you can then redirect. Don't use it on the original data until you're sure it does what you want. In the regular expression, the \d and _ that you want are already part of \w, and [ and ] need backslashes inside the character class.

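    #!/usr/bin/perl
    # Delete each run of characters that is not a word character (letters,
    # digits, _), space, comma, period, ( ) [ ], tab, newline or hyphen.
    while( <> ) {
        s/[^\w ,.()\[\]\t\n-]+//g;
        print;
    }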
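    A sketch of the same substitution wrapped in a hypothetical clean_line sub, so it can be tried on a single made-up line before running it over real files. It assumes the allowed set from the question plus tab and newline. The newline has to be in the class; otherwise each line's ending is deleted and the output collapses onto one line:

```perl
use strict;
use warnings;

# Hypothetical helper: removes every run of characters outside the allowed set
# (word chars, space, comma, period, parens, square brackets, tab, newline, hyphen).
sub clean_line {
    my ($line) = @_;
    $line =~ s/[^\w ,.()\[\]\t\n-]+//g;
    return $line;
}

my $raw     = "foo\tbar?!*\t[x](y) z-1.5,2\n";
my $cleaned = clean_line($raw);
print $cleaned;    # foo<TAB>bar<TAB>[x](y) z-1.5,2 followed by a newline
```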

    You can also do this from the command line with perl's in-place editing feature. Again, keep your original data somewhere safe, even though this creates a backup file (input_file.old) for you.

    perl -pi.old -e 's/[^\w ,.()\[\]\t\n-]+//g' input_file
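    The one-liner's in-place behavior can also be had inside a script by setting $^I (the -i backup extension) and @ARGV, as perlrun and perlfaq5 describe. This sketch works on a scratch copy in a temporary directory; the file name and sample data are made up for illustration:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory standing in for "a safe place"; removed at program exit.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/input_file";    # hypothetical name, as in the one-liner

open my $out, '>', $file or die "Can't write $file: $!";
print {$out} "id\tname!!\n1\tfoo#bar\n";
close $out or die "Can't close $file: $!";

{
    # Same effect as: perl -pi.old -e 's/[^\w ,.()\[\]\t\n-]+//g' input_file
    local ($^I, @ARGV) = ('.old', $file);
    while (<>) {
        s/[^\w ,.()\[\]\t\n-]+//g;
        print;    # goes to the rewritten file, not STDOUT
    }
}

open my $in, '<', $file or die "Can't read $file: $!";
my $cleaned = do { local $/; <$in> };
close $in;
print $cleaned;
```

    After it runs, the cleaned text is in input_file and the untouched original is in input_file.old.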
    --
    brian d foy <bdfoy@cpan.org>
Re: Cleaning a text file
by THRAK (Monk) on Feb 25, 2005 at 20:54 UTC
    You might also find the negated classes \W and \D useful. \W matches any character that is not a word character (a-z, A-Z, 0-9 or _ for plain ASCII text), while \D matches any character that is not a digit (0-9).
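    A quick comparison on a made-up sample string. Note that \W on its own also deletes the spaces, commas, parentheses and tabs the question wants to keep, which is why the bracketed [^\w ...] form fits this job better:

```perl
use strict;
use warnings;

my $text = "a,b (c)\t1.5";

# \W is shorthand for [^\w]: anything that is not a letter, digit or _.
(my $word_chars = $text) =~ s/\W+//g;

# \D is shorthand for [^\d]: anything that is not a digit.
(my $digits = $text) =~ s/\D+//g;

print "$word_chars\n";    # abc15
print "$digits\n";        # 15
```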
Re: Cleaning a text file
by BUU (Prior) on Feb 26, 2005 at 06:19 UTC
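    If you only need to delete individual characters, you can use the tr operator with the c (complement) and d (delete) options. Note that tr doesn't understand \w, so the characters to keep must be listed explicitly.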
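    A minimal sketch of the tr approach on a made-up line, assuming the same allowed set as the s/// answers (including tab and newline). tr returns the number of characters it deleted:

```perl
use strict;
use warnings;

my $line = "foo\tbar?!*\t[x](y) z-1.5,2\n";

# tr has no \w, so the allowed set is spelled out as ranges and literals.
# /c complements the list (every OTHER character), /d deletes those characters.
# The trailing "-" is literal because it ends the list.
my $deleted = (my $clean = $line) =~ tr/a-zA-Z0-9_ ,.()[]\t\n-//cd;

print $clean;                        # foo<TAB>bar<TAB>[x](y) z-1.5,2
print "deleted $deleted chars\n";    # deleted 3 chars
```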