in reply to delimited files

Without knowing more about the structure of your files and/or the range of possible delimiters, this question is impossible to answer in any meaningful way.

Re^2: delimited files
by saldoman (Initiate) on May 18, 2005 at 13:13 UTC
    The files are flat text files with anywhere from 3 to 50 or so fields. The range of delimiters that I have seen thus far includes - _ * & ^ % $ # @ ! ~ ` < > . : ; € œ þ

      If you can assume the same number of fields in each line, you can try counting each possible delimiter over the first 5 lines or so and seeing which one returns a reasonable result.

      This is a little rough, and could use better error checking, but something along these lines.

      Run this in the directory containing the csv files. It assumes file extensions of .csv and saves the "corrected" files as filename.csv.new. Modify to suit.

      Update: edited script slightly to remove useless use of array.

      ###############################################
      use warnings;
      use strict;

      my @delimiters = ('_', '*', '&', '^', '%', '$', '#', '@', '!', '~', '`',
                        '<', '>', '.', ':', ';', '€', 'œ', 'þ', ',');
      my @files = glob('*.csv');    # or whatever
      my %likely;

      for my $file (@files) {
          my $fh;
          unless (open $fh, '<', $file) {
              warn "Couldn't open $file. $!";
              next;
          }

          # Count every candidate delimiter over (up to) the first 5 lines.
          my %delim_count;
          my $lines = 0;
          while ($lines < 5 and defined(my $line = <$fh>)) {
              $lines++;
              for (@delimiters) {
                  my $testline = $line;
                  # s///g returns the number of substitutions made
                  $delim_count{$_}{total} += $testline =~ s/\Q$_\E//g;
              }
          }
          close $fh;
          unless ($lines) {
              warn "$file is empty, skipping.\n";
              next;
          }

          # Keep the most frequent delimiter that occurs more than twice and
          # whose total divides evenly by the number of lines read.
          for (@delimiters) {
              my $total = $delim_count{$_}{total};
              next unless defined $total and $total > 2 and $total % $lines == 0;
              my $best = defined $likely{$file}
                       ? $delim_count{ $likely{$file} }{total}
                       : 0;
              $likely{$file} = $_ if $total > $best;
          }
          print "Most likely delimiter for $file is $likely{$file}\n"
              if defined $likely{$file};
      }

      # Loop over @files rather than keys %likely, so that files with no
      # likely delimiter are reported instead of being silently skipped.
      for my $file (@files) {
          unless (defined $likely{$file}) {
              print "Ambiguous delimiter for file $file\n";
              next;
          }
          next if $likely{$file} eq ',';    # already comma-delimited

          print "Updating $file....\n";
          my ($csv, $output);
          unless (open $csv, '<', $file) {
              warn "Couldn't open $file. $!";
              next;
          }
          unless (open $output, '>', "$file.new") {
              warn "Couldn't open $file.new for writing. $!";
              next;
          }
          while (<$csv>) {
              s/\Q$likely{$file}\E/,/g;
              print $output $_;
          }
          close $output or warn "Couldn't close $file.new. $!";
      }
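
      The script above only checks that a delimiter's total count divides evenly by the number of lines sampled, which can also be true when the per-line counts differ (counts of 1, 1, 1, 1 and 6 total 10). If the same-number-of-fields assumption really holds, a stricter test is to require the same count on every sampled line. Below is a minimal sketch of that idea; guess_delimiter is a hypothetical helper name and the sample data is made up for illustration, so treat it as a starting point rather than a drop-in replacement.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): return the one
# candidate that occurs the same non-zero number of times on every
# sample line, or undef if no candidate or several candidates qualify.
sub guess_delimiter {
    my ($lines, $candidates) = @_;
    my @consistent;
    CANDIDATE:
    for my $delim (@$candidates) {
        my $expected;
        for my $line (@$lines) {
            # Count matches in list context; the line is left unmodified.
            my $count = () = $line =~ /\Q$delim\E/g;
            next CANDIDATE if $count == 0;
            $expected = $count unless defined $expected;
            next CANDIDATE if $count != $expected;
        }
        push @consistent, $delim if defined $expected;
    }
    return @consistent == 1 ? $consistent[0] : undef;
}

# Made-up sample: ';' appears twice on every line, '.' only on the first.
my @sample = ("a;b;c.txt", "d;e;f", "g;h;i");
my $delim  = guess_delimiter(\@sample, [';', '.', ',']);
print defined $delim ? "Delimiter: $delim\n" : "Ambiguous\n";
```

      Returning undef when more than one candidate is consistent is a deliberate choice in this sketch: it seems safer to flag such a file for a manual look than to pick one and rewrite it.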