Chady has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys..

I'm stuck with a flat-file, pipe-separated-fields database, which is actually huge. I want to generate HTML files based on the subcategory, and put all the other information for each subcategory into that subcategory's file.

So my bet was to read the file line by line and create subcategory-named-files whenever a new subcategory is encountered, because the subcategories aren't sorted.

Anyway, can I create filehandles whose names come from a variable? And if I can, I might end up with about 50 filehandles open at the end of the operation, all of which I have to close. Can I close every opened filehandle in one go, or do I have to call close() on each one?

Here's an example of the data file, I trimmed down the info and messed it up.

ID|Category|SubCategory|Code|Description|Picture

ex:

0|OFFICE EQUIPPMENT|PEN|PDS01|an ordinary pen|pen.gif
1|OFFICE EQUIPPMENT|PEN|PDS02|another pen|pen2.gif
2|OFFICE EQUIPPMENT|PAPER|PA003|white sheets to write on|paper.gif
3|OFFICE EQUIPPMENT|PEN|PDD50|the greatest pen|pen50.gif
4|OFFICE EQUIPPMENT|PEN|PDS01|an ordinary pen|pen.gif

He who asks will be a fool for five minutes, but he who doesn't ask will remain a fool for life.

Chady | http://chady.net/

Replies are listed 'Best First'.
Re: About Filehandles
by Masem (Monsignor) on Jun 25, 2001 at 15:21 UTC
    While reading it all in at once and writing out to files later is a better solution, note that you can toss filehandles around using the IO::File module, and thus keep a hash of filehandles which you can then use thusly:
    use IO::File;

    my $fh = new IO::File;
    my %hash = ( file1 => $fh );
    $hash{file1}->open(">subcat.txt") or die $!;
    print { $hash{file1} } "Text here\n";   # braces needed: print wants a block here
    $hash{file1}->close();

    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
Re: About Filehandles
by ChOas (Curate) on Jun 25, 2001 at 15:15 UTC
    Hey!

    A two-step option that I would use (pseudo-code):
    open inputfile
    read the file into a hash, using the SubCategory as the key
        (I don't know if you need ALL the data, but I'm sure you know how
        to stuff it into the hash at the SubCat key - make a list ref or
        whatever you want)
    close inputfile
    iterate over the keys in your hash (the unique subcategories):
        open SubCategory file
        ...do your thing on the file, using the data in your hash...
        close SubCategory file
    Only one filehandle open at a time...
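    A minimal sketch of that plan, assuming the pipe-separated layout from the question (the file name 'stockfile' and keying on field 3 are illustrative, not from the post above):

    #!/usr/bin/perl -w
    use strict;

    my %by_subcat;
    open IN, 'stockfile' or die "Cannot read stockfile: $!";
    while (<IN>) {
        chomp;
        my @fields = split /\|/;                  # ID|Category|SubCategory|...
        push @{ $by_subcat{ $fields[2] } }, $_;   # key on SubCategory
    }
    close IN;

    # one filehandle open at a time, as described above
    for my $subcat (keys %by_subcat) {
        open OUT, ">$subcat.html" or die "Cannot write $subcat.html: $!";
        print OUT "$_\n" for @{ $by_subcat{$subcat} };
        close OUT;
    }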

    Hope this helps...

    GreetZ!,
      ChOas

    2B||!2B==?;
    print "profeth still\n" if /bird|devil/;
Re (tilly) 1: About Filehandles
by tilly (Archbishop) on Jun 25, 2001 at 16:01 UTC
    Your approach may run out of filehandles. If it does, then you may want to use FileCache to exceed the usual limits on the number of open filehandles.
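    A minimal sketch of FileCache in action (splitting on field 3 is an assumption based on the question's layout); it quietly closes and reopens handles for you when the system limit is hit:

    use FileCache;

    while (<>) {
        my $subcat = (split /\|/)[2];
        cacheout $subcat;        # opens '>' the first time, '>>' on reopen
        print $subcat $_;        # the path string doubles as the filehandle
    }

    Note this uses the classic, non-strict calling style from the FileCache documentation.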
Re: About Filehandles
by particle (Vicar) on Jun 25, 2001 at 16:13 UTC
Re: About Filehandles
by mattr (Curate) on Jun 25, 2001 at 16:35 UTC
    I'm going to leave the filehandle method to others. I'd include the category name in the output file names unless you are positive that the same subcategory name is never used for two different categories.
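    For instance (an illustrative naming scheme, not from the original post):

    my ($category, $subcat) = (split /\|/)[1, 2];
    my $outfile = "$category-$subcat.html";   # keeps PEN distinct across categories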

    You could also tie a hash to disk, using one DBM implementation or another depending on how you want to access your data (MLDBM is popular here).
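    For example, a minimal sketch with DB_File (any DBM would do; MLDBM layers nested structures on top - the file name is illustrative):

    use DB_File;
    use Fcntl;

    tie my %by_subcat, 'DB_File', 'subcats.db', O_RDWR|O_CREAT, 0644
        or die "Cannot tie subcats.db: $!";

    while (<>) {
        my $subcat = (split /\|/)[2];
        $by_subcat{$subcat} .= $_;    # append each record under its subcategory
    }

    untie %by_subcat;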

    But I think the best way would be to try using DBD::RAM. It seems to have been provided just for you! I tried it with your data for fun and it works great, letting you use SQL commands without a database server. Now, I don't know how much memory this will want if you sort with "order by", and speed will probably be so-so, but usable.

    Of course, if by huge you mean *huge*, then for raw read speed you might want to pop over to mysql.com and get a powerful, free database server. In that case I would skip DBD::RAM and just parse the file, insert each line into the database, and then run an SQL query on the resulting table.

    #!/usr/bin/perl -w
    use strict;
    use DBI;
    use DBD::RAM;

    my $dbh = DBI->connect('DBI:RAM:', 'usr', 'pwd', { RaiseError => 1 });
    $dbh->{f_dir} = '.';   # (default), or your path: '/path/to/files'
    $dbh->func([
        [ 'items', 'PIPE', 'ramdata',
          { col_names => 'id,category,subcat,code,description,picture' } ],
    ], 'catalog');

    my $query = "select * from items where subcat='PEN' order by code";
    my $sth = $dbh->prepare($query) or die $DBI::errstr;
    $sth->execute or die $DBI::errstr;

    my @row;
    while ( @row = $sth->fetchrow_array ) {
        print join(" ", @row), "\n";
        # or try this..
        # print "<img src=\"$row[5]\">\n";
    }

    # DBD::RAM lets you access a file with SQL. PIPE-separated
    # is a built-in type which lets you read from one file and
    # write to another one without storing the whole thing in memory.
    # Headings can come from the top of the file if you have a line
    # containing them.
    # You don't need a separate database server, just DBI, DBD::RAM,
    # and SQL::Statement. It can access remote files, and
    # use different kinds of tables at the same time.
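    And if you do go the MySQL route, a minimal sketch of the parse-and-insert step (database name, table, and credentials are all hypothetical):

    use DBI;

    my $dbh = DBI->connect('DBI:mysql:catalog', 'usr', 'pwd',
                           { RaiseError => 1 });
    my $sth = $dbh->prepare(
        'INSERT INTO items (id, category, subcat, code, description, picture)
         VALUES (?, ?, ?, ?, ?, ?)'
    );

    open IN, 'stockfile' or die $!;
    while (<IN>) {
        chomp;
        $sth->execute(split /\|/);   # one row per pipe-separated line
    }
    close IN;
    $dbh->disconnect;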
    Good luck,

    Matt

    NEW, MORE: Check out this node. Some of the things mentioned above are illegal, like printing to a hash of filehandles without doing something sneaky (adding curly braces so that print treats the expression as a block yielding the filehandle). Also, you can't put such an element into the diamond operator: $got = <$fd[0]> is illegal. Sorry for the braindead post yesterday. I researched, and here are some ways to get filehandles to work in a hash.

    I have not worked with globs yet, but this is another thing you could do: using Symbol::gensym is lightweight compared to IO::Handle, and you can store a glob in the hash instead of an IO::Handle object, so you could use the glob with readline where you would normally use the diamond operator. Info about globs (working code) would be useful if someone would like to add it.
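    Something like this minimal glob sketch, perhaps (untested here; it assumes the same 'ramdata' file as the IO::File version below):

    use Symbol qw(gensym);

    my %fh;
    $fh{b} = gensym();                   # an anonymous glob ref, no IO::Handle
    open($fh{b}, "<ramdata") or die $!;
    while (defined(my $line = readline($fh{b}))) {
        print "-> $line";
    }
    close($fh{b});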

    #!/usr/bin/perl -w
    use strict;
    use IO::File;

    my %in;
    my %out;
    $in{b}  = new IO::File;
    $out{b} = new IO::File;
    $in{b}->open("<ramdata")   || die $!;
    $out{b}->open(">ramdata2") || die $!;

    my $fi = $in{b};
    my $fo = $out{b};

    &v1;
    # &v2;    # this also works

    close($in{b});
    close($fo);

    sub v1 {    # works
        while (<$fi>) {
            print $fo "-> $_";
        }
    }

    sub v2 {    # works
        my $x = $in{b};
        while (<$x>) {
            print {$out{b}} "-> $_";    # sneak by print: braces make it a block
        }
    }

    sub nogoods {    # these don't compile
        # while (<{$in{b}}>) {          # no expression allowed in the diamond
        #     ...
        # }
        # while (readline($in{b})) {    # bad: storing obj not fh glob
        #     ...
        # }
    }
    Cheers,

    Matt

Re: About Filehandles
by CharlesClarkson (Curate) on Jun 25, 2001 at 17:56 UTC

    I like the autovivification in 5.6.0:

    #!/usr/bin/perl
    use strict;
    use warnings;

    {
        my %file_handle;
        while (<DATA>) {
            my ($id, $category, $sub_category, $code, $description, $picture)
                = split /\|/;

            # massage data into a printable string
            $file_handle{$sub_category} ||= get_handle($sub_category);
            print {$file_handle{$sub_category}} "something\n";
        }
    }
    # As %file_handle goes out of scope
    # all file handles close implicitly

    sub get_handle {
        my $file_name = shift;
        open my $fh, '>', $file_name or die "Cannot write to $file_name: $!";
        return $fh;
    }

    __END__
    0|OFFICE EQUIPPMENT|PEN|PDS01|an ordinary pen|pen.gif
    1|OFFICE EQUIPPMENT|PEN|PDS02|another pen|pen2.gif
    2|OFFICE EQUIPPMENT|PAPER|PA003|white sheets to write on|paper.gif
    3|OFFICE EQUIPPMENT|PEN|PDD50|the greatest pen|pen50.gif
    4|OFFICE EQUIPPMENT|PEN|PDS01|an ordinary pen|pen.gif

    HTH,
    Charles K. Clarkson
Re: About Filehandles
by PetaMem (Priest) on Jun 25, 2001 at 15:17 UTC
    It all depends on whether the file is so big that it doesn't fit into your computer's memory (then I have a slow and ugly solution) or whether it does fit (fast to write and neat):

    Warning! This code is not verified and is only a concept.

    my @stock;
    open FILE, "stockfile";
    while(<FILE>) {
        push @stock, $_;
    }
    close FILE;

    while(@stock) {
        foreach $line (@stock) {
            @items = split '|', $line;
            open $item[2], ">>some_filename";
            put_items_into_it;
            close $item[2];
            delete_that_line_from_stock;
        }
    }
    Ciao
      while(<FILE>) { push @stock,$_; }

      would be better as:

      @stock = <FILE>;
      Why wrap a while around a foreach here?
      while(@stock) {
          foreach $line (@stock) {
      A straight foreach will do just as well. Of course in that case there is no need for @stock at all...
      while (my $line = <FILE>) {
          # do stuff
      }
      Additionally this will not be memory intensive because it will only store one line in memory at a time. If you are attempting to keep only one filehandle open at a time via...
      open $item[2], ">>some_filename";
      put_items_into_it;
      close $item[2];
      You might be better off checking whether the next item to be written goes into the same file that is already open. There is no reason to close it and open it again immediately.
      my $cat = undef;
      while (my $line = <FILE>) {
          @items = split /\|/, $line;   # note: split '|' would split on every character
          if ($cat ne $items[2]) {
              close OUTFILE if (defined $cat);
              open OUTFILE, ">>some_filename";
              $cat = $items[2];
          }
          print OUTFILE ## Whatever ###;
      }
      close OUTFILE;
      delete_that_line_from_stock;
      Is not necessary, since there is no @stock anymore.
Re: About Filehandles
by perchance (Monk) on Jun 25, 2001 at 15:26 UTC
    I'd recommend using the IO::File package. You can easily use it to handle an array of filehandles.
    For example:

    my $fh = IO::File->new(">$filename") or die "Can't open $filename: $!";
    print $fh "Hello\n";
    close($fh) or die "Can't close\n";

    --- So Fast, So Numb

Re: About Filehandles
by bwana147 (Pilgrim) on Jun 25, 2001 at 16:16 UTC

    I'd store all the file handles in a hash, indexed by subcategory. When all is done, you can then close them by looping over the values of the hash.

    my %fh;   # hash of filehandles

    while ( <> ) {
        # do something to find out the $subcat(egory)
        # and open the file if not done already
        $fh{$subcat} || open($fh{$subcat}, ">$pathtofile")
            or die "Aaargh: $!";

        # now you can print stuff to the file
        print { $fh{$subcat} } @stuff;
    }

    # Close down everything
    close $_ foreach values %fh;

    --bwana147

Re: About Filehandles
by Anonymous Monk on Jun 25, 2001 at 22:44 UTC
    How about sorting the file first? Then you can write out one subcategory after the other, with no more than one filehandle open!
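    A minimal sketch of that approach (assuming a Unix sort in the path and the question's pipe layout; file names are illustrative):

    # pre-sort on the third pipe-separated field, then stream through once
    open my $in, "sort -t'|' -k3,3 stockfile |" or die $!;
    my $current = '';
    my $out;
    while (my $line = <$in>) {
        my $subcat = (split /\|/, $line)[2];
        if ($subcat ne $current) {
            close $out if $out;
            open $out, ">$subcat.html" or die $!;
            $current = $subcat;
        }
        print $out $line;
    }
    close $out if $out;
    close $in;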
Re: About Filehandles
by runrig (Abbot) on Jun 25, 2001 at 23:57 UTC
    Store your filehandles in a hash as already suggested. Then if you undef the hash, all of the files should close.
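    A minimal sketch of that, assuming lexical filehandles as in the examples above (file names are illustrative):

    my %fh;
    open $fh{PEN},   '>', 'PEN.html'   or die $!;
    open $fh{PAPER}, '>', 'PAPER.html' or die $!;
    print { $fh{PEN} } "some pen markup\n";
    undef %fh;   # the last references go away, so both files are closed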