Once you figure out exactly what these "bad" characters are and what they are supposed to mean, replacing them is easy. The first step is to get a proper look at them:
    my %badchars;
    while (<>) {
        my @bad = ( /([^[:ascii:]])/g );    # find all bad characters
        if ( @bad ) {
            for my $badch ( @bad ) {
                $badchars{$badch}++;        # tally each bad character
            }
            # hex codes of the bad characters on this line
            my $badlist = join ' ', map { sprintf("%02x", ord($_)) } @bad;
            warn "line contains bad char(s) ($badlist): $_";
        }
    }
    # summary: count, hex code, and the character itself
    for my $badch ( sort keys %badchars ) {
        printf "%6d %02x %s\n", $badchars{$badch}, ord($badch), $badch;
    }

That will tell you (on STDOUT) which non-ASCII characters are in a given set of data, and how many of each. It also lists (on STDERR) the lines containing bad characters, and the hex codes for the bad characters on each line.
(If you prefer, you can add some code there to open two distinct output files yourself, write the "warn" messages to one of them and the "printf" output to the other.)
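Here is a rough sketch of that two-file variant, in case it helps. The sub name scan_bad_chars, the per-line counter, and the three-argument command line are my own inventions for illustration, not anything from your code; adjust to taste.

```perl
use strict;
use warnings;

# Hypothetical helper: read lines from $in, write each offending line to
# $lines_out and the per-character summary to $counts_out.
sub scan_bad_chars {
    my ( $in, $lines_out, $counts_out ) = @_;
    my %badchars;
    my $line_no = 0;
    while ( my $line = <$in> ) {
        $line_no++;
        my @bad = $line =~ /([^[:ascii:]])/g;    # all non-ASCII characters
        next unless @bad;
        $badchars{$_}++ for @bad;
        my $badlist = join ' ', map { sprintf( "%02x", ord($_) ) } @bad;
        print {$lines_out} "line $line_no contains bad char(s) ($badlist): $line";
    }
    for my $badch ( sort keys %badchars ) {
        printf {$counts_out} "%6d %02x %s\n",
            $badchars{$badch}, ord($badch), $badch;
    }
    return;
}

# Assumed usage: script.pl INPUT LINES_OUT COUNTS_OUT
# (does nothing unless exactly three file names are given)
if ( @ARGV == 3 ) {
    my ( $in_file, $lines_file, $counts_file ) = @ARGV;
    open( my $in,     '<', $in_file )     or die "Cannot read $in_file: $!";
    open( my $lines,  '>', $lines_file )  or die "Cannot write $lines_file: $!";
    open( my $counts, '>', $counts_file ) or die "Cannot write $counts_file: $!";
    scan_bad_chars( $in, $lines, $counts );
    close($lines)  or die "Cannot close $lines_file: $!";
    close($counts) or die "Cannot close $counts_file: $!";
}
```

Like the original, this treats the input as raw bytes, so a multi-byte character will show up as several separate "bad" bytes.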
That will get you started, but if your input data is UTF-8 or some other multi-byte encoding, you need to know which encoding it is, and have Perl decode it so that it is interpreted correctly as characters rather than as individual bytes. I won't pursue this further, because you haven't given enough detail yet.
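Just to illustrate what "interpret it correctly as characters" might look like, here is a minimal sketch that assumes the data turns out to be UTF-8. That is purely a guess on my part, since the actual encoding is unknown. The two sub names are hypothetical; decode() and FB_CROAK come from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: try to interpret raw bytes as UTF-8.
# Returns the decoded character string, or undef if the bytes are
# not valid UTF-8 (a hint that the data is in some other encoding).
sub decode_utf8_strict {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() may modify its input when a CHECK value is given
    my $chars = eval { decode( 'UTF-8', $copy, Encode::FB_CROAK ) };
    return $chars;
}

# Hypothetical helper: list the non-ASCII characters of a decoded
# string as Unicode code points rather than as separate bytes.
sub non_ascii_codepoints {
    my ($chars) = @_;
    return map { sprintf( 'U+%04X', ord($_) ) } $chars =~ /([^[:ascii:]])/g;
}
```

Once the data is decoded, a two-byte sequence such as c3 a9 is reported as the single character U+00E9 instead of two unrelated "bad" bytes. If the strict decode fails, the data is probably not UTF-8 and you would need to find out what it really is before going further.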
If you have more questions, come back with some specific details (the perl code you used, some actual input data, and the actual output you got).
In reply to "Re: Checking for Valid Characters" by graff, in thread "Checking for Valid Characters" by hozefa