advice for reading data from a file

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: advice for reading data from a file by Limbic~Region (Chancellor) on Jan 18, 2004 at 18:46 UTC
Anonymous Monk, your approach seems fine if you only need to read the second line of each file, but you don't say if you will ever need to process the entire file. I suggest using Text::CSV_XS in that case, which has the added bonus of properly handling embedded delimiters if you run into that problem.

    #!/usr/bin/perl -w
    use strict;
    use Text::CSV_XS;

    my @files = qw(foo bar blah asdf);
    for my $file ( @files ) {
        if ( File_Type( $file ) ) {
            print "Do something with $file\n";
        }
    }

    sub File_Type {
        my $file = shift;
        open (INPUT , '<' , $file) or die "Unable to open $file for reading : $!";
        my $csv = Text::CSV_XS->new( {'sep_char' => ';'} );
        while ( <INPUT> ) {
            next if $. != 2;
            chomp;
            if ( $csv->parse($_) ) {
                my @field = $csv->fields;
                die 'Incorrect number of fields' if @field != 4;
                return $field[2] =~ /^\d+$/ ? 1 : 0;
            }
            else {
                print "Unable to parse: ", $csv->error_input, "\n";
                return 0;
            }
        }
    }

I left most of your code intact as you probably have it that way for a reason.
Cheers - L~R
Re: Re: advice for reading data from a file by Anonymous Monk on Jan 18, 2004 at 19:05 UTC
Thanks for your answer, Limbic~Region, but here I just need to process the 2nd line, so I didn't want to fire up Text::CSV_XS just for that :)
Re: advice for reading data from a file by Aragorn (Curate) on Jan 18, 2004 at 18:16 UTC
Seems perfectly reasonable to me. If this routine works for the files you have to process, it is correct. Maybe a `warn` instead of the `die`s in the routine can be used to tag the "corrupt" files so that the program doesn't bail out if only 1 or 2 files of the hundreds are faulty. But this may not be appropriate for your purpose. Arjen
Re: Re: advice for reading data from a file by JamesNC (Chaplain) on Jan 18, 2004 at 18:48 UTC
I agree with Aragorn about getting rid of the `die`. I would not even bother with the `warn` unless I wanted to watch it. I think it would be better to send your errors to a log file, along with the file name and any other stats, so you can continue to process the correct files; that could include logging the files we can't open. Also, you are performing a regex on the return value which may be undef. I would think you should do the regex before you return the field (log that too) in case it doesn't meet your criteria, so you can be sure you have a valid return.
Re: Re: Re: advice for reading data from a file by Anonymous Monk on Jan 18, 2004 at 19:26 UTC
"Also, you are performing a regex on the return value which may be undef. I would think you should do the regex before you return the field (log that too)"
Excuse me, but I don't understand. Here I'm not returning the value of the field but the return value of the regexp, which I think can only be 0 or 1, but I may be wrong. Do you mean that my sub can return undef in some cases?
Re: Re: Re: Re: advice for reading data from a file by JamesNC (Chaplain) on Jan 18, 2004 at 20:23 UTC
Re: Re: advice for reading data from a file by Anonymous Monk on Jan 18, 2004 at 19:02 UTC
Thanks for your answer. Actually, the code seems to work fine on the files. Concerning the `die` in the sub: I need it because if one file is faulty, the whole process needs to be stopped. In fact, I first check all the file types inside an eval {}, and the sub dies in case of an error so I can catch it. I didn't mention that in my post, so thanks anyway :)
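The eval-and-die arrangement described here can be sketched roughly as follows. This is only an illustration under stated assumptions: `check_record`, the file names, and the sample second lines are all hypothetical stand-ins, since the poster's actual code is not shown.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the poster's checking sub: dies on a malformed record.
sub check_record {
    my ($name, $second_line) = @_;
    my @fields = split /;/, $second_line;
    die "$name: expected 4 fields, got " . scalar(@fields) . "\n"
        unless @fields == 4;
    return $fields[2] =~ /^\d+$/ ? 1 : 0;
}

# Hypothetical sample data: file name => contents of its second line.
my %second_line_of = (
    'good.txt' => 'a;b;123;d',
    'bad.txt'  => 'a;b',
);

# Check every file inside one eval; the first die aborts the whole pass.
my $all_ok = eval {
    check_record( $_, $second_line_of{$_} ) for sort keys %second_line_of;
    1;    # reached only if nothing died
};
my $error = $all_ok ? '' : ( $@ || 'unknown error' );

print $all_ok ? "all files passed\n" : "stopping: $error";
```

Ending the eval block with a true value lets the caller tell "nothing died" apart from "died", without relying on `$@` alone.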
Re: advice for reading data from a file by pg (Canon) on Jan 18, 2004 at 18:43 UTC
There is not much room left for improvement. But if I were doing this, I probably would not use `$.`; I would just count lines myself, which is not a big deal. Personally (100% personal), I think using `$.` reduces maintainability. If one day you (or someone else) decide to modify your code for whatever reason, and a second file gets involved in your while loop, your program can easily break, as there is only one `$.` across all files, and its value is only valid for the last filehandle accessed.
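To illustrate the concern about `$.`, here is a minimal sketch. It assumes in-memory filehandles as stand-ins for real files, and `nth_line` is a hypothetical helper showing the "count lines yourself" alternative.

```perl
use strict;
use warnings;

my $data_a = "a1\na2\na3\n";
my $data_b = "b1\nb2\n";
open my $fh_a, '<', \$data_a or die "Cannot open in-memory file: $!";
open my $fh_b, '<', \$data_b or die "Cannot open in-memory file: $!";

# Read two lines from the first handle, then one from the second.
my $first   = <$fh_a>;
my $second  = <$fh_a>;
my $after_a = $.;          # line number for $fh_a
my $other   = <$fh_b>;
my $after_b = $.;          # now reports the line number for $fh_b instead

# Hypothetical helper: return line $n using an explicit counter, never $.
sub nth_line {
    my ($fh, $n) = @_;
    my $count = 0;
    while ( defined( my $line = <$fh> ) ) {
        next unless ++$count == $n;
        chomp $line;
        return $line;
    }
    return;                # the file has fewer than $n lines
}

open my $fh_c, '<', \$data_a or die "Cannot open in-memory file: $!";
my $picked = nth_line( $fh_c, 2 );

print "after reading A twice: $after_a; after reading B once: $after_b\n";
print "second line via explicit counter: $picked\n";
```

The explicit counter stays correct even if other filehandles are read inside the loop, at the cost of one extra variable.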
Re: Re: advice for reading data from a file by Anonymous Monk on Jan 18, 2004 at 19:09 UTC
Thanks for pointing that out, pg. I think you're right, and I will get rid of `$.`.
Re: advice for reading data from a file by Roger (Parson) on Jan 19, 2004 at 00:09 UTC
Adding to other monks' comments, I can see two problems with your code:
1) `while (<$fh>) { ...` This will break if the first line of the file is 0, the second line will not be read.
2) `return $fields[2] =~ /.../;` What if the array `@fields` is empty? You will get warnings (assuming you have 'use warnings' in your code, or haven't you?)
So I would suggest adding more error checking to the code to make it more robust.

    sub check_field {
        my $file_name = shift;
        open my $fh, "<$file_name" or die "*** ERROR opening '$file_name': $!";
        my @fields = ();
        # Assign to $_ explicitly: a bare defined(<$fh>) would read and discard the line.
        while (defined ($_ = <$fh>)) {
            if ($. == 2) {
                chomp;
                @fields = split /;/;
                return 0 unless $#fields == 3;
                last;
            }
        }
        return 0 if $#fields < 0;
        return $fields[2] =~ /^\d+$/;
    }
Re: Re: advice for reading data from a file by Anonymous Monk on Jan 19, 2004 at 00:39 UTC
Hi Roger, I'm using 'warnings', but thanks for the 'defined' that I had missed :} But please, can you explain why you are checking $#fields again in the line `return 0 if $#fields < 0;` when it is already checked in the loop (== 3)?
Re: Re: Re: advice for reading data from a file by Roger (Parson) on Jan 19, 2004 at 00:51 UTC
That guards against the case where your file has fewer than 2 lines, since @fields only gets populated from the second line of the file.
Re: Re: Re: Re: advice for reading data from a file by Anonymous Monk on Jan 19, 2004 at 01:04 UTC
Re: Re: Re: Re: Re: advice for reading data from a file by Roger (Parson) on Jan 19, 2004 at 01:30 UTC
Some notes below your chosen depth have not been shown here
Re: Re: advice for reading data from a file by graff (Chancellor) on Jan 19, 2004 at 01:21 UTC
"1) `while (<$fh>) { ...` This will break if the first line of the file is 0, the second line will not be read."
Hmm. Funny, it doesn't seem to behave that way for me, and I wouldn't expect it to. The magical `while(<>)` statement (with or without an explicit filehandle) is actually shorthand for `while( defined( $_ = <> ))`. Try it out with a file that has just "0\n" as the first line and anything after that on other lines. I've tried it a number of ways, and the only way I could get it to stop at the first line was `while ( <> > 0) ...`, which is admittedly the sort of thing that very few people would do inadvertently.
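The behaviour described above can be tried without creating a file. The following is a minimal sketch that assumes an in-memory filehandle; the sample data is made up, and its last line is a bare "0" with no trailing newline, which is the one string a plain truth test would treat as false.

```perl
use strict;
use warnings;

# Made-up sample data: first line "0", last line a bare "0" without a newline.
my $data = "0\nsecond line\n0";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @lines;
while (<$fh>) {            # implicitly: while ( defined( $_ = <$fh> ) )
    chomp;
    push @lines, $_;
}

print scalar(@lines), " lines read\n";
print "line $_: '$lines[$_ - 1]'\n" for 1 .. @lines;
```

Because the loop condition tests definedness rather than truth, neither "0" line ends the loop early.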
Re: advice for reading data from a file by davido (Cardinal) on Jan 19, 2004 at 04:25 UTC
I am going to weigh in here a little late. It seems to me that all the bother of setting up a `while` loop is unnecessary if all you're doing is skipping the first line of the file, reading the second, and exiting. I might write such a sub like this:

    sub check_field {
        open my $fh, "<", shift or die "Bleah!\n$!";
        <$fh>;    # Skip the unwanted line.
        my @fields = split /;/, <$fh>;
        close $fh;
        die "Ick!\n" unless @fields == 4;
        return( ($fields[2] =~ /^\d+$/) ? 1 : 0 );
    }

It's not really a matter of golf. I just think that if we are only reading the first two lines from a file, listing `<$fh>` twice is somehow preferable to breaking out of a while loop after the second line. Also, I think your goal is to return true if `$fields[2]` matches only digits. I've used the ternary operator to ensure that undef never gets returned.
Dave
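What the ternary changes can be seen in a small sketch. `bare_match` and `ternary_match` are hypothetical helpers, not code from the thread. The point they illustrate is that `return` evaluates its expression in the caller's context, so a failed bare match yields a defined empty string in scalar context but an empty list in list context, while the ternary form always yields exactly one value.

```perl
use strict;
use warnings;

# Hypothetical helpers: the same digit test, with and without the ternary.
sub bare_match    { return $_[0] =~ /^\d+$/ }
sub ternary_match { return( ($_[0] =~ /^\d+$/) ? 1 : 0 ) }

my $bare_scalar  = bare_match('abc');       # failed match, scalar context
my @bare_list    = bare_match('abc');       # failed match, list context
my @ternary_list = ternary_match('abc');    # always exactly one element
my $ternary_true = ternary_match('123');

printf "bare, scalar:  defined=%d, value='%s'\n",
    defined $bare_scalar ? 1 : 0, $bare_scalar;
printf "bare, list:    %d element(s)\n", scalar @bare_list;
printf "ternary, list: %d element(s), value=%s\n",
    scalar @ternary_list, $ternary_list[0];
```

Normalizing to 1 or 0 matters mostly when the result is stored in a list or hash, where an empty list would silently shift the remaining elements.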