in reply to Re: Read two files and print
in thread Read two files and print

This is a great idea. I would just add a few comments that might clarify some points from the previous posts.

1. split /\s+/, $line; splits on any whitespace character; this includes space, \f, \r, \n, and \t. Since \n is in this set, you don't need chomp($part2); it doesn't hurt, but it is not necessary here. The reason "\t" didn't work in the previous post is that the first argument to split must be a regex: /\t/ would have worked, but /\s+/ is usually better. The /\t/ idea would leave a \n in $part2, and since you can't see these non-printing characters, it is also possible that there are some plain spaces in there!
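A minimal sketch of the difference (the sample $line is made up for illustration):

```perl
use strict;
use warnings;

my $line = "apple\tbanana\n";            # note the trailing newline

# /\t/ splits only on tabs, so the "\n" stays glued to the last field...
my ($a1, $b1) = split /\t/, $line;

# ...while /\s+/ splits on any run of whitespace, so no chomp is needed.
my ($a2, $b2) = split /\s+/, $line;

print "tab split kept the newline\n" if $b1 eq "banana\n";
print "whitespace split is clean\n"  if $b2 eq "banana";
```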

2. The best way to get the 2nd thing from the split is with a list slice: my $part2 = (split /\s+/, $line)[1]; Since you don't use $part1, there is no need to assign it. It often happens that you are working with a line with a bunch of things on it and you want just a couple of them. Using a list slice lets you assign meaningful names to those things, maybe: my ($temperature, $city) = (split /\s+/, $line)[3, 8]; This is a lot better than, say, $line[3], because you don't need any comments to explain that thing 3 means temperature.
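For example, with a hypothetical record layout where field 3 is a temperature and field 8 is a city:

```perl
use strict;
use warnings;

# Made-up sample record; fields 0..8 separated by whitespace.
my $line = "id42 Bob 17 23.5 a b c d NewYork\n";

# Slice out only the fields you care about, with meaningful names:
my ($temperature, $city) = (split /\s+/, $line)[3, 8];

print "temperature=$temperature city=$city\n";
```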

Of course here the OP probably has some other name in mind for $part2 that would make the code even clearer.

Replies are listed 'Best First'.
Re^3: Read two files and print
by sandy1028 (Sexton) on Feb 27, 2009 at 04:20 UTC
    The files are very huge. I tried something like
    open FH, '<file1.txt';
    @data = <FH>;
    open FH1, '<file2.txt';
    @data1 = <FH1>;
    my $text1 = <<END_TEXT;
    @data
    END_TEXT
    my $text2 = <<END_TEXT1;
    @data1;
    END_TEXT1
    @data inside <<END_TEXT prints only one row. How can I print the entire array inside <<END_TEXT?

      Define 'huge'. For any value of huge over a few hundred megabytes you really don't want to slurp the files into memory! In fact at that size you are getting into file sizes where you should be using a database. Perhaps you'd better give us a little more information about the size and true nature of the files you are dealing with and the task you actually need to perform.


      True laziness is hard work
      1. Since you are replying to my comments, I will comment: to print @data, just use print @data;
      When you "slurped" file1 into memory, that included the "\n"'s. @data = <FH>; reads all lines from <FH> and puts them into the @data array.
      The <<END_TEXT sort of idea will have no place in your code. That was just a way that grandfather embedded a short test file into the code.
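A self-contained sketch of the slurp-and-print point (the demo file and its contents are made up so the snippet runs standalone; in your real code you would open your existing file1.txt):

```perl
use strict;
use warnings;

# Write a small demo file so this sketch runs on its own.
open my $out, '>', 'demo1.txt' or die "write demo1.txt: $!";
print $out "line one\nline two\nline three\n";
close $out;

# Slurp: each element of @data is one line, trailing "\n" included.
open my $fh, '<', 'demo1.txt' or die "read demo1.txt: $!";
my @data = <$fh>;
close $fh;

print scalar(@data), " lines\n";   # 3 lines
print @data;                       # the whole file, no heredoc required
unlink 'demo1.txt';
```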

      2. Having said that about printing @data, this is NOT what you want to do! grandfather's code reads the text2 file one line at a time and creates a hash table. It does NOT save a verbatim copy of either the text2 or text1 input files into an array!
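Since grandfather's code isn't reproduced in this node, here is a hypothetical sketch of that pattern: build a hash from the smaller file one line at a time, then stream the big file against it. The file names, "key value" field layout, and output format are all made-up assumptions; adjust them to your real data.

```perl
use strict;
use warnings;

# Create small demo files so the sketch runs standalone; substitute
# your real file2.txt and file1.txt here.
open my $f2, '>', 'demo2.txt' or die "write demo2.txt: $!";
print $f2 "k1 alpha\nk2 beta\n";
close $f2;
open my $f1, '>', 'demo1.txt' or die "write demo1.txt: $!";
print $f1 "k1 10\nk2 20\nk3 30\n";
close $f1;

# Build the hash from the smaller file, one line at a time --
# no slurping, so memory use is proportional to file2 only.
my %lookup;
open my $fh2, '<', 'demo2.txt' or die "read demo2.txt: $!";
while (my $line = <$fh2>) {
    my ($key, $value) = (split /\s+/, $line)[0, 1];
    $lookup{$key} = $value;
}
close $fh2;

# Stream the big file line by line, printing only the matches.
open my $fh1, '<', 'demo1.txt' or die "read demo1.txt: $!";
while (my $line = <$fh1>) {
    my ($key, $num) = (split /\s+/, $line)[0, 1];
    print "$key $num $lookup{$key}\n" if exists $lookup{$key};
}
close $fh1;
unlink 'demo1.txt', 'demo2.txt';
```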

      3. Create 2 small files, say 100 lines each and get grandfather's code running on your machine. The code will run in a few seconds. Then turn it loose on the full size files that you have. The FIRST STEP before optimizing is to get running code!

      From looking at the code, I doubt that you will see much difference between 100 lines and 10,000 lines in file2. I suspect that this thing will run in much less than 10 seconds. If the program runs within a time frame that is acceptable to you, there is probably no need to optimize it.

      4. HUGE is relative! This program's algorithm will not slow down appreciably until the hash of file 2 (the smaller file) exceeds what you can keep memory resident. I just opened one of my apps that creates a hash table of about 120K entries and sorts/displays it in a Tk GUI; it takes less than 0.5 seconds, and the processing being done is FAR more than in your application.

      5. So get working code with a small set of data, and then report back about problems and size issues when you scale it.