PerlMonks  

validating file with perl

by mmittiga17 (Scribe)
on Jul 16, 2013 at 20:58 UTC ( #1044659=perlquestion )

mmittiga17 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I'm having trouble figuring out the best way to validate that a CSV file is good, based on two of its fields.

    field1 field2
    3-2000/7.48 1
    3-2000/7.48 2
    3-2000/7.48 2
    3-2000/7.48 2
    3-2000/7.48 2
    3-2000/7.48 2
    3-2000/7.48 2
    3-2000/7.48 3

    4-0000/8.40 2
    4-0000/8.40 2
    4-0000/8.40 2
    4-0000/8.40 2
    4-0000/8.40 2
    4-0000/8.40 2
    4-0000/8.40 3

For the file to be good, each unique ID (field1) needs at least one record where field2 is 1 and at least one record where field2 is 3. In the example above, 3-2000 has records with both a 1 and a 3 in field2, but 4-0000 has no record with a 1 in field2. Ideas or suggestions? Thanks in advance.

Replies are listed 'Best First'.
Re: validating file with perl
by space_monk (Chaplain) on Jul 17, 2013 at 05:47 UTC

    Sigh, I go away for a few weeks and see lots of solutions which don't use Text::CSV and its friends. Text::CSV can handle space- or tab-separated data as well as "normal" comma-separated data.

    Please do not succumb to the Dark Side, and use the Force of existing CPAN modules properly. :-)

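        use strict;
        use warnings;
        use Text::CSV;

        my %data;
        # Should set the binary attribute. For space- or tab-separated
        # input, also pass sep_char => " " or sep_char => "\t".
        my $csv = Text::CSV->new ( { binary => 1 } )
            or die "Cannot use CSV: ".Text::CSV->error_diag ();
        open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
        $csv->getline( $fh );    # skip the header row (field1, field2)
        while ( my $row = $csv->getline( $fh ) ) {
            my ($key, $value) = @$row;
            next unless defined $value;          # skip blank or short rows
            $data{$key} = 0 if (!exists $data{$key});
            $data{$key} |= 1 if ($value == 1);   # bit 0: a 1 was seen
            $data{$key} |= 2 if ($value == 3);   # bit 1: a 3 was seen
        }
        $csv->eof or $csv->error_diag();
        close $fh;
        # Iterate over the keys only; a bare %data would yield the values too.
        foreach my $key (sort keys %data) {
            print "$key invalid\n" if ($data{$key} != 3);
        }
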
    If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)
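
The bit-flag bookkeeping used in the code above can be looked at separately from the file handling. What follows is only a minimal sketch under stated assumptions: `invalid_ids` is a hypothetical helper name, and the rows are assumed to arrive as `[id, flag]` pairs, such as a CSV parser might return. It uses core Perl only.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the IDs that lack
# a 1 or a 3 in the flag column. Each row is assumed to be [id, flag].
sub invalid_ids {
    my @rows = @_;
    my %seen;
    for my $row (@rows) {
        my ( $id, $flag ) = @$row;
        $seen{$id} //= 0;
        $seen{$id} |= 1 if $flag == 1;    # bit 0: a 1 was seen
        $seen{$id} |= 2 if $flag == 3;    # bit 1: a 3 was seen
    }
    # An ID is valid only when both bits are set (1 | 2 == 3).
    return grep { $seen{$_} != 3 } sort keys %seen;
}

my @rows = (
    [ '3-2000/7.48', 1 ], [ '3-2000/7.48', 2 ], [ '3-2000/7.48', 3 ],
    [ '4-0000/8.40', 2 ], [ '4-0000/8.40', 3 ],
);
print "$_ invalid\n" for invalid_ids(@rows);
```

Because the flags are OR-ed in, repeated 1's or repeated 3's for the same ID cannot be mistaken for a complete pair.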
Re: validating file with perl
by farang (Chaplain) on Jul 16, 2013 at 22:15 UTC

    I'm sure there are more elegant and probably better solutions, but this is one way to do it.

        #!/usr/bin/env perl
        use strict;
        use warnings;

        my %fields;
        my $header = <DATA>;
        while (<DATA>) {
            chomp;
            next if /\A\s*\Z/;
            my ($f1, $f2) = split;
            push @{ $fields{$f1} }, $f2;
        }
        for my $k (keys %fields) {
            my ($one, $three) = (0, 0); # if either of these remain zero, ID is invalid
            for my $values ( @{ $fields{$k} } ) {
                ++$one if ( $values == 1 );
                ++$three if ( $values == 3 );
            }
            print "invalid ID: $k\n" unless $one * $three;
        }
        __DATA__
        field1 field2
        3-2000/7.48 1
        3-2000/7.48 2
        3-2000/7.48 2
        3-2000/7.48 2
        3-2000/7.48 2
        3-2000/7.48 2
        3-2000/7.48 2
        3-2000/7.48 3

        4-0000/8.40 2
        4-0000/8.40 2
        4-0000/8.40 2
        4-0000/8.40 2
        4-0000/8.40 2
        4-0000/8.40 2
        4-0000/8.40 3

Re: validating file with perl
by mtmcc (Hermit) on Jul 16, 2013 at 22:23 UTC
    Maybe this would work:

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Assumes the records for each ID are grouped together and that
        # groups are separated by a blank line.
        my $fileName = $ARGV[0];
        open (my $fh, "<", $fileName) or die "Cannot open $fileName: $!";
        my $header = <$fh>;
        my @line;
        my @allTags;
        my %results;
        my $uniqueTag;
        my $one = "NO";
        my $three = "NO";

        while (<$fh>) {
            @line = split(" ", $_);
            $uniqueTag = $line[0] if @line > 1;
            if (@line == 2) {
                $three = "YES" if $line[1] == 3;
                $one = "YES" if $line[1] == 1;
            }
            # A blank line or the end of the file closes the current group.
            if ((@line < 2) || (eof($fh))) {
                if (($three eq "YES") && ($one eq "YES")) {
                    $results{$uniqueTag} = "GOOD";
                }
                else {
                    $results{$uniqueTag} = "BAD";
                }
                push (@allTags, "$uniqueTag");
                $one = "NO";
                $three = "NO";
            }
        }

        for (@allTags) {
            print STDERR "$_\t $results{$_}\n";
        }

    -Michael
Re: validating file with perl
by Anonymous Monk on Jul 16, 2013 at 22:25 UTC

    Something like this:

        use strict;
        use warnings;

        my %data;
        <DATA>;
        while(<DATA>){
            chomp;
            next if/^\s*$/;
            my ($key, $value) = split;
            push @{$data{$key}},$value;
        }
        my $counter;
        for(keys %data){
            for(@{$data{$_}}){
                ++$counter if $_ == 1 or $_ == 3;
            }
            print $_,' a good file',$/ if $counter == 2;
        }
        __DATA__
        field1 field2
        3-2000/7.48 1
        3-2000/7.48 2
        3-2000/7.48 2
        3-2000/7.48 2
        3-2000/7.48 2
        3-2000/7.48 2
        3-2000/7.48 2
        3-2000/7.48 3

        4-0000/8.40 2
        4-0000/8.40 2
        4-0000/8.40 2
        4-0000/8.40 2
        4-0000/8.40 2
        4-0000/8.40 2
        4-0000/8.40 3

      What if there were two 1's and no 3's, or vice versa? Then $counter == 2, yet the file wouldn't be good.
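
One way around the counting problem raised here is to record which flag values have appeared for each ID instead of counting matches. The following is just a sketch, not anyone's posted solution: `good_ids` is a hypothetical helper name, and the input is assumed to be whitespace-separated "id flag" lines.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes "id flag" lines and
# returns the IDs that have at least one flag 1 and at least one flag 3.
sub good_ids {
    my @lines = @_;
    my %seen;    # $seen{$id}{$flag} is true once that flag has appeared
    for my $line (@lines) {
        my ( $id, $flag ) = split ' ', $line;
        next unless defined $flag;    # skip blank or short lines
        $seen{$id}{$flag} = 1;
    }
    return grep { $seen{$_}{1} && $seen{$_}{3} } sort keys %seen;
}

# 'x' has two 1's and no 3, so it must not be reported as good.
my @lines = ( 'x 1', 'x 1', 'y 1', 'y 2', 'y 3' );
print "$_ is good\n" for good_ids(@lines);
```

Since the state is kept per ID, nothing carries over from one ID to the next, and duplicates of the same flag have no effect.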
Re: validating file with perl
by kcott (Archbishop) on Jul 17, 2013 at 10:59 UTC

    G'day mmittiga17,

    Here's my take on a solution:

        #!/usr/bin/env perl -l
        use strict;
        use warnings;

        my %good;

        while (<DATA>) {
            my ($id, $flag) = split;
            next unless $id;
            ++$good{$id}[$flag];
        }

        for (keys %good) {
            print if $good{$_}[1] && $good{$_}[3];
        }

        __DATA__
        3-2000/7.48 1
        3-2000/7.48 2
        3-2000/7.48 2
        3-2000/7.48 2
        3-2000/7.48 2
        3-2000/7.48 2
        3-2000/7.48 2
        3-2000/7.48 3

        4-0000/8.40 2
        4-0000/8.40 2
        4-0000/8.40 2
        4-0000/8.40 2
        4-0000/8.40 2
        4-0000/8.40 2
        4-0000/8.40 3

    Sample run:

        $ pm_id_flag_validation.pl
        3-2000/7.48

    -- Ken

Re: validating file with perl
by si_lence (Deacon) on Jul 17, 2013 at 12:26 UTC

    Just one more solution, using the input record separator special variable, $/. It does make a few assumptions about the structure of the data, though.

        use strict;
        use warnings;

        my $header = <DATA>;
        $/ = "\n\n";
        while (<DATA>) {
            if (/ 1\n.* 3\n/s) {
                my $id = substr($_,0,11);
                print "good for $id\n";
            }
        }
        __DATA__
        field1 field2
        3-2000/7.48 1
        3-2000/7.48 2
        3-2000/7.48 2
        3-2000/7.48 2
        3-2000/7.48 2
        3-2000/7.48 2
        3-2000/7.48 2
        3-2000/7.48 3

        4-0000/8.40 2
        4-0000/8.40 2
        4-0000/8.40 2
        4-0000/8.40 2
        4-0000/8.40 2
        4-0000/8.40 2
        4-0000/8.40 3
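
The record-separator approach above relies on the 1 appearing before the 3 and on the ID being exactly 11 characters wide. A possible variation that relaxes both, offered only as a sketch: `good_blocks` is a hypothetical helper name, the groups are still assumed to be separated by blank lines, and paragraph mode (`$/ = ""`) is swapped in for `"\n\n"`.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): reads blank-line-separated
# blocks from a filehandle and returns the ID of every block that
# contains both a 1 and a 3 in the second column, in any order.
sub good_blocks {
    my ($fh) = @_;
    my $header = <$fh>;    # read the header line-wise, before changing $/
    local $/ = "";         # paragraph mode: blank line(s) end a record
    my @good;
    while ( my $block = <$fh> ) {
        my ($id) = $block =~ /^(\S+)/ or next;
        my $has_one   = $block =~ /^\S+[ \t]+1[ \t]*$/m;
        my $has_three = $block =~ /^\S+[ \t]+3[ \t]*$/m;
        push @good, $id if $has_one && $has_three;
    }
    return @good;
}

my $text = <<'END';
field1 field2
3-2000/7.48 3
3-2000/7.48 2
3-2000/7.48 1

4-0000/8.40 2
4-0000/8.40 3
END

# An in-memory filehandle stands in for the real file here.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print "good for $_\n" for good_blocks($fh);
```

The two flags are tested with separate matches, so their order within a block does not matter, and the ID is taken as the first whitespace-delimited token rather than a fixed-width substring.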

Node Type: perlquestion [id://1044659]
Approved by NetWallah