Handling different data types of the same data

blahblah has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

This problem has been bothering me for some time. It feels like it should be possible, but I don't know where to begin.
I have some text data - a csv file for example. I have a subroutine that can parse csv data, say csvparse(). How can I make csvparse() smart so that it can accept csv data from a string passed as a scalar, from a filehandle, or from a file without slurping in the entire file (except in the scalar case, of course) and without making three separate subs - one for each data type? Is this possible?
Here is some stumbling in the dark...

#!/usr/bin/perl -w
use strict;
use CGI;

{
# case 1 - a file
my $file = "csvfile.csv";
my %parseddata = csvparse($file);
}

{
# case 2 - a filehandle
my $uploadedfile = param('uploadfile');
my %parseddata = csvparse($uploadedfile);
}

{
# case 3 - a scalar
my $csvdata = 'first,middle,last,phone,email';
my %parseddata = csvparse($csvdata);
}

# so how can I avoid having three separate csv parsing subs?
# Once it can get at the data, the parsing is the same for all 3 cases!

sub csvparse {
   my $data = $_[0];

   if (ref($data) eq 'SCALAR') {
      for (split(/\n/,$data)) {
         # parse...
      }
   } elsif (-f "$data") {
      open(DATA, "$data") or die("Noooo!\n");
      while (<DATA>) {
         # parse...
      }
      close(DATA);
   } elsif () { # detect filehandle???
   }
}

yuck.
Thanks


Replies are listed 'Best First'.
Re: Handling different data types of the same data by davido (Cardinal) on May 24, 2004 at 06:47 UTC
Use a two-sub strategy. The first is a sub that gets strings to be parsed out of anything: a filename, a ref to a scalar of data, a ref to an array of data, or a ref to a glob filehandle. Within that sub, do a check on `ref($_[0])`. Set it up as follows:

ref() returns false: The value should be used as a filename. Open the filename and parse the file.
ref() returns SCALAR: The value should be used as a reference to a string to be parsed.
ref() returns ARRAY: The value should be used as a ref to an array of strings to be parsed.
ref() returns GLOB: The value should be used as a filehandle. Read from the filehandle and parse the file.

From any of those sources, the first sub will send string(s) to the parsing engine one at a time, which is a second sub. That parsing engine needs to only understand how to parse strings. It doesn't care whether it got that string from a filehandle or a scalar, because that's all handled by the invoking sub.

Dave
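A minimal sketch of that two-sub layout might look like the following. The names parse_line and the row structure (an array ref of fields per line) are assumptions made for illustration, and the naive split on commas stands in for real CSV parsing (it does not handle quoted fields).

```perl
use strict;
use warnings;

# Hypothetical parsing engine: it only understands one line of text.
# A plain split on commas is a stand-in for real CSV parsing.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    return [ split /,/, $line, -1 ];
}

# Hypothetical front end: works out what kind of source it was given
# and feeds the engine one line at a time.
sub csvparse {
    my ($source) = @_;
    my $type = ref $source;
    my @rows;

    if ( !$type ) {                    # plain string: treat as a filename
        open my $fh, '<', $source or die "Cannot open '$source': $!";
        while ( my $line = <$fh> ) { push @rows, parse_line($line) }
        close $fh;
    }
    elsif ( $type eq 'SCALAR' ) {      # ref to a string of CSV data
        push @rows, parse_line($_) for split /\n/, $$source;
    }
    elsif ( $type eq 'ARRAY' ) {       # ref to an array of CSV lines
        push @rows, parse_line($_) for @$source;
    }
    elsif ( $type eq 'GLOB' ) {        # filehandle, read line by line
        while ( my $line = <$source> ) { push @rows, parse_line($line) }
    }
    else {
        die "Unsupported source type: $type";
    }
    return \@rows;
}

# Usage: a reference to a string is treated as CSV data.
my $rows = csvparse( \"first,middle,last\nJoe,Q,Public" );
print scalar(@$rows), " rows, first field: $rows->[0][0]\n";
```

Note that a blessed handle such as an IO::File object reports its class name from ref(), so this sketch would reject it; Scalar::Util's openhandle is one way to widen the filehandle check.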
Re: Handling different data types of the same data by Zaxo (Archbishop) on May 24, 2004 at 06:48 UTC
I'll start with a warning that you need to untaint this data carefully. This appears to be part of some CGI application, so that goes double. Also, the *DATA handle is reserved for data at the tail of your script, after an __END__ or __DATA__ token or Ctrl-Z.

You have an error in `if (ref($data) eq 'SCALAR')`. That asks if $data is a reference to a scalar, but then you treat $data as a string.

This is a neat problem to solve in perl 5.8+ because you can open a string as a file by passing open a reference to the scalar. The open function is able to take either a handle or a filename on its own.

The difficulty with your puzzle is to distinguish between a string that is a filename and one that is csv data. It looks easy to, say, look for commas with a regex or index, but that discounts the possibility of unexpected filenames. How about taking the calling convention that a reference to a scalar is data and a string is a filename? That gives you what you seem to be writing towards.

You can detect a reference to a file handle with `ref($foo) eq 'GLOB'`. Lexical handles will be of that type. You could also insist that global handles be passed by reference. There is a lot of dwimmerie in dealing with filehandles, so testing is much to be desired for your sub.

After Compline,
Zaxo
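As a rough illustration of that calling convention, the sketch below turns every input into a filehandle first, so the parsing loop is written only once. The helper name csv_handle is hypothetical, the comma split is a placeholder for real CSV parsing, and no untainting is shown.

```perl
use strict;
use warnings;

# Hypothetical helper: turn any supported input into a read filehandle.
# Assumed convention: a plain string is a filename, a reference to a
# scalar is CSV data, and a GLOB reference is an open filehandle.
sub csv_handle {
    my ($input) = @_;
    return $input if ref($input) eq 'GLOB';    # already a filehandle

    # With perl 5.8+, opening a scalar reference gives an in-memory
    # filehandle; opening a plain string opens the named file.
    open my $fh, '<', $input
        or die "Cannot open input: $!";
    return $fh;
}

sub csvparse {
    my $fh = csv_handle(shift);
    my @rows;
    while ( my $line = <$fh> ) {
        chomp $line;
        push @rows, [ split /,/, $line, -1 ];  # naive split, no quoting support
    }
    return \@rows;
}

# Usage: CSV data passed as a reference to a scalar.
my $rows = csvparse( \"first,middle,last\nphone,email\n" );
print join( '|', @{ $rows->[1] } ), "\n";      # prints "phone|email"
```

Because all three cases end up as a handle read line by line, the file case never slurps the whole file.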
Re: Handling different data types of the same data by adrianh (Chancellor) on May 24, 2004 at 07:35 UTC
"so how can I avoid having three separate csv parsing subs?"

One way would be to build an iterator for each different input type and pass that to the parsing routine. Something like this:

use strict;
use warnings;
use Carp;
use Scalar::Util qw( openhandle );

# Returns an iterator over the lines of an open filehandle,
# or nothing if the argument is not an open handle.
sub filehandle_iterator {
    my $fh = shift;
    return unless openhandle( $fh );
    return sub {
        my $row = <$fh>;
        chomp $row if defined($row);
        return $row;
    };
}

# Returns an iterator over the lines of the named file,
# or nothing if the file cannot be opened.
sub file_iterator {
    my $filename = shift;
    no warnings; # to avoid warnings about filenames with \n in
    return unless open my $fh, '<', $filename;
    return filehandle_iterator( $fh );
}

# Returns an iterator over the lines of a string of csv data.
sub string_iterator {
    my $csv_string = shift;
    my @lines = split( /\n/, $csv_string );
    return sub { shift @lines };
}

sub parse_csv {
    my $csv_input = shift;
    my $row_iterator = file_iterator( $csv_input )
        || filehandle_iterator( $csv_input )
        || string_iterator( $csv_input )
        || croak 'could not find iterator';
    # test with defined() so an empty row or a row of "0" does not end the loop
    while ( defined( my $row = $row_iterator->() ) ) {
        # ... do stuff with row ...
        print "> $row <\n";
    }
}