Making a few assumptions about the nature of your "delimited" files (that the delimiter appears in every line, and the same number of times in each line), it should be possible to analyse the file to determine the delimiter. Something like this may work (untested):
#! perl -slw
use strict;

my %charsByLine;
my %freqCharsPerLine;
my %chars;

while( <> ) {
    chomp;
    for( split '', $_ ) {
        $chars{ $_ } = 1;
        ## Record each line only once per char, however often the char occurs in it
        push @{ $charsByLine{ $_ } }, $.
            unless $freqCharsPerLine{ $. }{ $_ }++;
    }
}
my $last = $.;

## Eliminate chars that do not appear in every line
@{ $charsByLine{ $_ } } != $last and delete $chars{ $_ } for keys %chars;

## Eliminate chars where they appear a different number of times per line
for my $char ( keys %chars ) {
    my $previousCount = $freqCharsPerLine{ 1 }{ $char };
    for my $line ( 2 .. $last ) {
        if( $freqCharsPerLine{ $line }{ $char } != $previousCount ) {
            delete $chars{ $char };
            last;
        }
    }
}

if( keys %chars == 1 ) {
    print "The delimiter for this file is: ", keys %chars;
}
elsif( keys %chars ) {
    print "Candidate delimiters for this file are: ", keys %chars;
}
else {
    print "Unable to determine a likely candidate for this file!";
}
Of course, if the files have header lines, or contain quoted items that can themselves contain the delimiter char, then the above assumptions would need to be modified to account for that. But if there is any consistency in the format of the files, it should be possible to derive a heuristic that detects the right character in most cases, and flags any anomalies for manual inspection/determination.
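As a rough illustration of how the header and quoting cases might be handled, here is a minimal sketch. It rests on assumptions that are mine rather than anything known about the actual files: items are quoted with "..." and embedded quotes are doubled (""), and the number of header lines is known in advance. The name candidate_delimiters and its $skip parameter are hypothetical.

```perl
#! perl
use strict;
use warnings;

## Hypothetical helper: returns the sorted chars that occur the same
## (non-zero) number of times on every non-blank data line.
## ASSUMPTIONS: items are quoted with "...", embedded quotes are doubled
## (""), and the first $skip lines are headers to be ignored.
sub candidate_delimiters {
    my( $lines, $skip ) = @_;
    my %count;                  ## char => count seen on the first data line
    my $first = 1;

    for my $i ( $skip .. $#$lines ) {
        my $line = $lines->[ $i ];
        chomp $line;
        next unless length $line;           ## ignore blank lines
        $line =~ s/"(?:[^"]|"")*"//g;       ## drop quoted items entirely

        my %freq;
        $freq{ $_ }++ for split //, $line;

        if( $first ) { %count = %freq; $first = 0; next }

        ## Keep only chars whose count matches the first data line
        for my $char ( keys %count ) {
            delete $count{ $char }
                unless defined $freq{ $char }
                   and $freq{ $char } == $count{ $char };
        }
    }
    return sort keys %count;
}

## Made-up sample data: one title line, then three records
my @lines = (
    'People report',
    '"Smith, John"|42|"likes a|b"',
    'Jane Doe|37|none',
    'Bob|5|"x, y"',
);
print "Candidates: ",
    join( ' ', map { "[$_]" } candidate_delimiters( \@lines, 1 ) ), "\n";
```

Stripping the quoted items before counting means a delimiter char inside quotes cannot skew the per-line totals. Files that quote or escape differently would need a different substitution, and anything that comes back with zero or several candidates is what you would flag for manual inspection.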
In reply to Re: delimited files
by BrowserUk
in thread delimited files
by saldoman