Hi,

I'm running the following script against some CSV files prior to doing anything else. It checks each line for non-ASCII bytes, and either makes the specified conversion or shows the value that it doesn't have a rule for.

However, others will need to make use of the script, which means that adding new regexes to it would be inconvenient. We use SVN, but that's still a kludge.

Can anyone see a good way to load the regexes from an external file?
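To make the question concrete, here is a minimal sketch of one possible shape for this, not a definitive implementation. It assumes a hypothetical tab-separated rules file (pattern, replacement, optional note) and two hypothetical helpers, `load_rules` and `apply_rules`; none of these names or the file format come from the script itself. The demo reads the rules from an in-memory string so it runs on its own.

```perl
use strict;
use warnings;

# Hypothetical rules format (an assumption, not part of the original script):
# one rule per line, "pattern<TAB>replacement<TAB>optional note".
# Blank lines and lines starting with '#' are ignored.
sub load_rules {
    my ($fh) = @_;
    my @rules;
    while (my $rule = <$fh>) {
        $rule =~ s/\r?\n\z//;                  # strip LF or CRLF
        next if $rule =~ /^\s*(?:#|$)/;        # skip comments and blank lines
        my ($pattern, $replacement) = split /\t/, $rule;
        $replacement = '' unless defined $replacement;
        push @rules, [ qr/$pattern/, $replacement ];   # compile each pattern once
    }
    return \@rules;
}

sub apply_rules {
    my ($rules, $line) = @_;
    for my $r (@$rules) {
        my ($re, $replacement) = @$r;
        # The replacement is interpolated as a plain string; its contents
        # are never evaluated as code.
        $line =~ s/$re/$replacement/g;
    }
    return $line;
}

# Demo with an in-memory rules "file"; a real run would open a path instead.
my $rules_text = join "\n",
    "# pattern\treplacement\tnote",
    "\\302\\243\tGBP\tpound sign",
    "\\342\\200\\231\t'\tright single curly quote",
    "\\302\\240\t \tnbsp to space",
    "";
open my $fh, '<', \$rules_text or die "Cannot open in-memory rules: $!\n";
my $rules = load_rules($fh);
close $fh;

my $sample = "It\342\200\231s \302\24310\302\240each\n";
print apply_rules($rules, $sample);   # prints: It's GBP10 each
```

Keeping the replacement as a plain string means a rules file cannot run arbitrary code, at the cost of not supporting capture references such as `$1` in replacements.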

Also, and unrelated: do I need to use binmode? It seems to be advised, but I thought it only affected line endings, and it means that I have to do my own line-ending translation, which seems a bit redundant. And can I check: do I need "use utf8", given that I'm just searching for octal sequences? (IIRC yes.)
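As background to both questions, the small sketch below shows why the octal patterns depend on the data still being raw bytes. It is an illustration under stated assumptions: the variable names are made up, and it uses the core `Encode` module only to contrast bytes with decoded characters. The `use utf8` pragma is left out deliberately, since it only declares that the source file itself is written in UTF-8 and does not change how escapes like `\302\243` behave.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The pound sign as it arrives from a UTF-8 file read in binary mode:
# two raw bytes, 0xC2 0xA3.
my $bytes = "\302\243";

# The same data after decoding: a single character, U+00A3.
my $chars = decode('UTF-8', $bytes);

printf "bytes: length %d, matches /\\302\\243/: %s\n",
    length($bytes),
    ($bytes =~ /\302\243/ ? 'yes' : 'no');

printf "chars: length %d, matches /\\302\\243/: %s, matches /\\x{A3}/: %s\n",
    length($chars),
    ($chars =~ /\302\243/ ? 'yes' : 'no'),
    ($chars =~ /\x{A3}/   ? 'yes' : 'no');
```

The byte string has length 2 and matches the octal pattern; the decoded string has length 1, no longer matches it, and matches `\x{A3}` instead. So the byte-sequence patterns only work while the input is read undecoded.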

use strict;
use utf8;
use File::Basename;

my @files  = glob($ARGV[0]);
my $outdir = $ARGV[1];
my $debug  = $ARGV[2];

die "No output directory given\n" unless -d $outdir;
$outdir =~ s/\\/\//g;          # backslash to forward
$outdir =~ s/([^\/])$/$1\//;   # add final slash if missing

foreach my $file (@files) {
    my $outfile = $outdir . basename($file);   # $outdir already ends in a slash
    open(CSV, '<', $file) || die "Cannot open $file for read:$!\n";
    binmode CSV;
    open(OUT, '>', $outfile) || die "Cannot open $outfile for write:$!\n";
    while (my $line = <CSV>) {
        $line =~ s/\x0D\x0A/\n/g;   # binary, so we're still stuck with \r\n dos endings possibly - why are we using binary?
        if ($line =~ /[^[:ascii:]]/) {
            print "Before: $line\n" if $debug;
            # translations from octal sequence to ascii char
            $line =~ s/\302\267/./g;          # odd utf 'floating' point (middle dot) to ascii .
            $line =~ s/\342\200\230/'/g;      # left single curly quote to ascii '
            $line =~ s/\342\200\231/'/g;      # right single curly quote to ascii '
            $line =~ s/\342\200\223/-/g;      # en-dash (U+2013) to ascii -
            $line =~ s/\303\257/i/g;          # double-dot i to ascii i
            $line =~ s/\302\243/GBP/g;        # pound sign to GBP
            $line =~ s/\342\200\246/.../g;    # ellipsis to ascii ...
            $line =~ s/\302\256/(a)/g;        # registered sign (U+00AE) to (a)
            $line =~ s/\303\250/e/g;          # grave e to e
            $line =~ s/\303\251/e/g;          # acute e to e
            $line =~ s/\342\211\244/<=/g;     # utf <= (U+2264) to ascii <=
            $line =~ s/\342\211\245/>=/g;     # utf >= (U+2265) to ascii >=
            $line =~ s/\303\264/o/g;          # circumflex o (?!?) to ascii o
            $line =~ s/\302\240/ /g;          # nbsp to sp (a literal space; \s in a replacement yields the letter s)
            $line =~ s/\302\263/^3/g;         # superscript 3 to ^3
            $line =~ s/\302\262/^2/g;         # superscript 2 to ^2
            $line =~ s/\302\260/ degrees/g;   # degrees symbol to word ' degrees'
            $line =~ s/\342\200\235/""/g;     # right double curly quote to ascii " (escaped for csv)
            $line =~ s/\342\200\234/""/g;     # left double curly quote to ascii " (escaped for csv)
            $line =~ s/\302\275/1\/2/g;       # utf 1/2 to ascii plain 1/2
            if ($line =~ /[^[:ascii:]]/) {
                # show each leftover byte as [decimal/hex/octal]
                $line =~ s{([^[:ascii:]])}{sprintf('[%d/0x%X/%o]', ord $1)}ge;
                print "Unhandled sequence: $line\n";
            }
            print "After: $line\n" if $debug;
        }
        print OUT "$line";
    }
    close CSV;
    close OUT;
}
map{$a=1-$_/10;map{$d=$a;$e=$b=$_/20-2;map{($d,$e)=(2*$d*$e+$a,$e**2 -$d**2+$b);$c=$d**2+$e**2>4?$d=8:_}1..50;print$c}0..59;print$/}0..20
Tom Melly, pm (at) cursingmaggot (stop) co (stop) uk

In reply to Read RegEx from file by Melly
