can't read URL from tab delim

djbryson has asked for the wisdom of the Perl Monks concerning the following question:

Re: can't read URL from tab delim by Corion (Patriarch) on Apr 17, 2007 at 15:48 UTC
The code you posted doesn't have 17 lines, so it is likely not the code that produced the error messages you posted.

You are reading in a file, but you aren't stripping the newlines off the lines you read, so `$url` will have a newline at the end. It is possible but unlikely that your files have names ending in a newline, so you had better strip the newlines after reading the file: `chomp @file;`

As a general tip, never just print out a bare variable for debugging. Always print it enclosed in some delimiters so you can easily spot leading or trailing whitespace: `print "Url is '$url'\n";`
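A minimal sketch of both suggestions together, assuming the input is a list of tab-delimited lines with the url in the third field. The sample lines and the helper name `parse_line` are made up for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for lines read from the data file.
# Each line still carries the trailing newline that readline leaves in place.
my @file = (
    "Some Owner\t04/17/07\tpage_one.html\n",
    "Other Owner\t04/18/07\tpage_two.html\n",
);

# Strip the trailing newline from every element in one call.
chomp @file;

# Hypothetical helper: split one tab-delimited line into its fields.
sub parse_line {
    my ($line) = @_;
    return split /\t/, $line;
}

for my $line (@file) {
    my ( $owner, $date, $url ) = parse_line($line);

    # The quotes make stray leading or trailing whitespace visible.
    print "Url is '$url'\n";
}
```

Without the `chomp`, the closing quote of the last field would show up on the following line of output, which is exactly the kind of thing the delimiters are there to expose.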
Re^2: can't read URL from tab delim by Fletch (Bishop) on Apr 17, 2007 at 16:22 UTC
Taking that tip further, you might want to get in the habit of using one of the data structure serialization modules (Data::Dumper, YAML::Syck) to print out values in debug messages. That way you don't have to go back and add more debugging scaffolding when you have to start tracking down what exactly `ARRAY(0xdeadbeef)` contains.
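As a rough illustration of that habit, here is a small sketch using Data::Dumper. The record contents are hypothetical; `$Data::Dumper::Useqq` and `$Data::Dumper::Sortkeys` are standard Data::Dumper settings.

```perl
use strict;
use warnings;
use Data::Dumper;

# Show control characters such as a newline as escapes instead of raw bytes.
$Data::Dumper::Useqq = 1;

# Sort hash keys so the output is stable between runs.
$Data::Dumper::Sortkeys = 1;

# Hypothetical record whose url field was never chomped.
my $record = { owner => 'Some Owner', url => "page_one.html\n" };

# Dumper returns the structure as a string of Perl source.
my $dump = Dumper($record);
print $dump;
```

With `Useqq` enabled, the trailing newline in the url appears as a visible escape inside the quoted value, so the problem is hard to miss.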
Re: can't read URL from tab delim by TGI (Parson) on Apr 17, 2007 at 17:32 UTC
use strict;
use warnings;

# File containing review definitions
my $DATA_FILE = 'ReviewUpdatesMarch2007';

# Names of fields in data file
my @DATA_FIELDS = qw( owner new_date url );

# Validation routines for each field
my %VALIDATE_FIELDS;
{
    # Limit scope of disabled warnings
    no warnings 'uninitialized';

    %VALIDATE_FIELDS = (
        # owner must be one or more word characters plus whitespace
        owner => sub {
            my $owner = shift;
            return $owner =~ /^[\w\s]+$/;
        },

        # new_date must be 6 digits broken into pairs by slashes or dashes
        new_date => sub {
            my $date = shift;
            return $date =~ /^\d\d[\/-]\d\d[\/-]\d\d$/;
        },

        # url must be all word characters or :/&?+#
        url => sub {
            my $url = shift;
            return $url =~ /^[\w:\/&?+#]+$/;
        },
        # this is not really a good way to validate urls.
        # there is probably a cpan module that will do so correctly.
        # but I am too lazy to find it for you.
    );
}

my @reviews;    # Review data as hash references

# Uncomment this to pull data from file
#open ( FILE, '<', $DATA_FILE )
#    or die "Error opening data file $DATA_FILE - $!";

# Parse lines and store in @reviews
ITEM:
# Uncomment this to pull data from file
#while ( defined( my $item = <FILE> ) )
while ( defined( my $item = <DATA> ) )    # Delete this to pull data from file
{
    chomp $item;    # Failing to chomp your input may have been the cause of your error

    my %item;
    @item{ @DATA_FIELDS } = split( /\t/, $item );

    foreach my $field ( @DATA_FIELDS ) {
        unless ( $VALIDATE_FIELDS{$field}->( $item{$field} ) ) {
            warn "Invalid data in field '$field' from line '$item'\n";
            # Skip bad data
            next ITEM;
        }
    }

    push @reviews, \%item;
}

# Uncomment this to pull data from file
#close FILE
#    or die "Error closing $DATA_FILE - $!\n";

# Uncomment to dump your data table for debugging purposes.
#use Data::Dumper;
#print Dumper \@reviews;

foreach my $review ( @reviews ) {
    my $url = $review->{url};
    print "$url\n";
    my $slurp = read_file( $url );
}

sub read_file {
    # do stuff here
    # this is probably where your error is.
}

__DATA__
Good Owner	12/12/23	good_url
Bad!!!Owner	22/22/22	good_url
Good Owner2	BAD DATE	good_url
Good Owner3	12/12/23	bad url
Good Owner4	12/12/23	good_url

TGI says moo
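The `read_file` stub in the code above is left empty. One possible way to fill it in is sketched here, assuming the url field actually names a local file; that is an assumption, since the original poster's code is not shown in the thread. The body is illustrative only.

```perl
use strict;
use warnings;

# Hypothetical body for the read_file stub: return the whole file as a string.
sub read_file {
    my ($path) = @_;

    # Quote the name in the error so a stray newline or space is obvious.
    open( my $fh, '<', $path )
        or die "Cannot open '$path': $!\n";

    # Slurp mode: undefine the input record separator for this block only.
    my $content = do { local $/; <$fh> };

    close $fh
        or die "Cannot close '$path': $!\n";

    return $content;
}
```

If a name is passed in with its trailing newline still attached, the `open` fails and the quoted name in the message shows the closing quote on the next line, which points straight at the missing `chomp`.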