in reply to Re^2: Read two files and print
in thread Read two files and print

The files are huge. I tried something like

    open FH, '<file1.txt';
    @data = <FH>;
    open FH1, '<file2.txt';
    @data1 = <FH1>;
    my $text1 = <<END_TEXT;
    @data
    END_TEXT
    my $text2 = <<END_TEXT1;
    @data1;
    END_TEXT1

@data inside <<END_TEXT prints only one row. How can I print the entire array inside <<END_TEXT?

Re^4: Read two files and print
by GrandFather (Saint) on Feb 27, 2009 at 05:11 UTC

    Define 'huge'. For any value of huge over a few hundred megabytes you really don't want to slurp the files into memory! In fact, at that size you are getting into file sizes where you should be using a database. Perhaps you'd better give us a little more information about the size and true nature of the files you are dealing with and the task you actually need to perform.
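    Reading line by line instead of slurping keeps memory use flat no matter how big the file gets. A minimal sketch of the idea; the in-memory "file" is just a stand-in so the snippet runs on its own:

```perl
use strict;
use warnings;

# In-memory stand-in for a real file; with real data you would use
# open my $fh, '<', 'file1.txt' or die $!;
my $contents = "first record\nsecond record\n";
open my $fh, '<', \$contents or die $!;

# Read line by line: only one line is held in memory at a time,
# so this scales to files far larger than RAM.
my $count = 0;
while (my $line = <$fh>) {
    chomp $line;    # strip the trailing "\n"
    $count++;       # ... do the real per-line work here
}
close $fh;
print "processed $count lines\n";
```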


    True laziness is hard work
Re^4: Read two files and print
by Marshall (Canon) on Mar 01, 2009 at 16:37 UTC
    1. Since you are replying to my comments, I will comment: to print @data, just use: print @data;
    When you "slurped" file1 into memory, that included the "\n"'s. @data = <FH>; reads all lines from <FH> and puts them into the @data array.
    The <<END_TEXT idea will have no place in your code. That was just a way for GrandFather to embed a short test file into the code.
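    A minimal sketch of that point, using an in-memory stand-in for file1 (the contents are made up for illustration):

```perl
use strict;
use warnings;

# In-memory stand-in for file1; with real data you would use
# open my $fh, '<', 'file1.txt' or die $!;
my $contents = "line one\nline two\n";
open my $fh, '<', \$contents or die $!;

# Slurp: every line, trailing "\n" included, lands in @data ...
my @data = <$fh>;
close $fh;

# ... so printing the array reproduces the file exactly. No heredoc needed.
print @data;
```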

    2. Having said that about printing @data, this is NOT what you want to do! GrandFather's code reads the text2 file one line at a time and builds a hash table. It does NOT save a verbatim copy of either the text2 or text1 input file into an array!
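    GrandFather's actual code isn't reproduced here, so this is only a hedged sketch of that shape; the file contents, and the assumption that the key is the first whitespace-separated column, are inventions for illustration. In-memory filehandles stand in for the real files:

```perl
use strict;
use warnings;

# In-memory stand-ins for the real files; with real data you would use
# open my $fh, '<', 'file2.txt' or die $!;
my $small = "id2 keep\nid4 keep\n";                  # the smaller file
my $big   = "id1 a\nid2 b\nid3 c\nid4 d\n";          # the big file

# Pass 1: build a lookup hash from the smaller file, one line at a time.
my %wanted;
open my $fh2, '<', \$small or die $!;
while (my $line = <$fh2>) {
    my ($key) = split /\s+/, $line;   # key assumed to be the first column
    $wanted{$key} = 1;
}
close $fh2;

# Pass 2: stream the big file; print only lines whose key is in the hash.
# Neither file is ever held in memory as a whole.
open my $fh1, '<', \$big or die $!;
while (my $line = <$fh1>) {
    my ($key) = split /\s+/, $line;
    print $line if $wanted{$key};     # prints the id2 and id4 lines
}
close $fh1;
```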

    3. Create 2 small files, say 100 lines each, and get GrandFather's code running on your machine. The code will run in a few seconds. Then turn it loose on the full-size files that you have. The FIRST STEP before optimizing is to get running code!

    From looking at the code, I doubt that you will see much difference between 100 lines and 10,000 lines in file2. I suspect that this thing will run in much less than 10 seconds. If the program runs within a time frame that is acceptable to you, there is probably no need to optimize it.

    4. HUGE is relative! This algorithm will not slow down appreciably until the hash built from file2 (the smaller file) exceeds what you can keep memory resident. I just opened one of my apps that creates a hash table of about 120K entries and sorts/displays it in a Tk GUI; it takes less than 0.5 seconds, and the processing being done is FAR more than in your application.

    5. So get working code with a small set of data, and then report back about problems and size issues when you scale it up.