in reply to Re^3: character encoding ambiguities when performing regexps with html entities
in thread character encoding ambiguities when performing regexps with html entities

Yeah, this is what I ended up doing, though I used the regex mentioned earlier to find them. There was no way I was going to eyeball ~20,000 lines of code... Here is the script I used; hopefully someone can make use of it later:
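#!/usr/bin/env perl
# find_extended_chars.txt
# This script finds any non-ASCII characters (extended characters, codes, etc.)
# in the given input files and prints each offending line with its line
# number so that it can be changed.
use strict;
use warnings;

foreach my $file (@ARGV) {
    my $error = 0;   # set once the file name has been printed
    my $line  = 1;   # current line number within $file
    open my $fh, '<', $file or do { warn "Cannot open $file: $!\n"; next; };
    while (my $text = <$fh>) {
        if ($text =~ m/[^[:ascii:]]/) {
            # Print the file name once, before its first offending line.
            if (!$error) {
                print $file . "\n";
                $error = 1;
            }
            print $line . "\t" . $text;
        }
        $line++;
    }
    close $fh;
}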
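Once a line has been flagged, it can help to know exactly which characters tripped the match. The following is only a rough sketch, not part of the script above: `describe_non_ascii` is a hypothetical helper name, and it assumes the input bytes are UTF-8 (a file in another encoding would need a different argument to `decode`).

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns one "column N: U+XXXX" string for each
# non-ASCII character in the given line.
# Assumption: the input is a byte string encoded as UTF-8.
sub describe_non_ascii {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);
    my @found;
    while ($text =~ /([^[:ascii:]])/g) {
        # pos() is the offset just past the match, which for a
        # one-character match equals its 1-based column.
        push @found, sprintf('column %d: U+%04X', pos($text), ord $1);
    }
    return @found;
}

# Sample line containing an e-acute and an em dash, written as UTF-8 bytes.
print "$_\n" for describe_non_ascii("caf\xC3\xA9 \xE2\x80\x94 ok\n");
```

For the sample line this prints `column 4: U+00E9` and `column 6: U+2014`, which makes it easier to decide which HTML entity to substitute for each character.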