in reply to How can I read multiple lines starting with the same number and put in to a nested array and print it to a file?

Why do you want to process all the records of one type first, rather than build 9 arrays as you move through the file and then sort and output them at the end?
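A minimal sketch of that build-as-you-go idea, assuming the whole file fits in memory and that the record type is the first whitespace-separated field. The sub name `group_by_first_field` and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: group lines by their first whitespace-separated field.
sub group_by_first_field {
    my ($fh) = @_;
    my %groups;    # first field => array ref of lines
    while ( my $line = <$fh> ) {
        next unless $line =~ /\S/;    # skip blank lines
        my ($key) = split ' ', $line, 2;
        push @{ $groups{$key} }, $line;
    }
    return \%groups;
}

# Usage: read from an in-memory string here; a real file handle works the same way.
my $data = "5 c\n1 a\n5 b\n";
open( my $in, '<', \$data ) or die "Can't open in-memory file: $!";
my $groups = group_by_first_field($in);
close($in);

# Output each group in numeric key order, with its lines sorted.
for my $key ( sort { $a <=> $b } keys %$groups ) {
    print sort @{ $groups->{$key} };
}
```

This only makes sense when the data fits in memory; for a very large file the streaming or pre-sorting approaches are the safer choice.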

I think the missing pieces of information from your question are:

a) How big is the input file?

b) Is the input file pre-sorted by the first field? I.e., will all the "5 x x x x x x x x" lines be contiguous in the input file?

If the input file is not pre-sorted, and especially if it is large, then it would probably be faster to use the system sort utility to pre-sort the input file, or a copy of it if you need to preserve the original.
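A rough sketch of calling the system sort from Perl, assuming a POSIX-style `sort` is on the PATH and that the first field is numeric. The sub name `presort_by_first_field` and the file names in the comment are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: sort $input numerically on its first field into $sorted,
# leaving the original file untouched.
sub presort_by_first_field {
    my ( $input, $sorted ) = @_;

    # The list form of system() bypasses the shell, so odd file names are safe.
    # -k 1,1n : sort key is field 1 only, compared numerically
    # -o FILE : write the result to FILE
    system( 'sort', '-k', '1,1n', '-o', $sorted, $input ) == 0
        or die "sort failed with status $?";
    return $sorted;
}

# Example call (file names are placeholders):
#   presort_by_first_field( 'input.txt', 'input.sorted' );
```

Writing to a separate output file keeps the original intact, which matters if the unsorted order needs to be preserved.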

If your reason for doing one record type at a time is that each subset is very large, then it may well be quicker to split the input file into 9 output files in one pass, and then either reload each of the 9 files, sort it and output it again, or use the sort utility on them.
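The one-pass split could look something like the sketch below. It assumes a small number of distinct record types (such as the 9 mentioned here), since it keeps one output handle open per type. The sub name `split_by_first_field` and the file-name prefix are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: write each line read from $in to a file named
# "$prefix$key", where $key is the line's first whitespace-separated field.
# Returns the sorted list of keys seen.
sub split_by_first_field {
    my ( $in, $prefix ) = @_;
    my %out;    # first field => output file handle

    while ( my $line = <$in> ) {
        next unless $line =~ /\S/;    # skip blank lines
        my ($key) = split ' ', $line, 2;

        # Open the output file the first time this key is seen.
        unless ( $out{$key} ) {
            open( $out{$key}, '>', "$prefix$key" )
                or die "Can't open $prefix$key: $!";
        }
        print { $out{$key} } $line;
    }

    for my $key ( keys %out ) {
        close( $out{$key} ) or die "Couldn't close $prefix$key: $!";
    }
    my @keys = sort keys %out;
    return @keys;
}

# Example call (reads the files named on the command line, or STDIN):
#   my @types = split_by_first_field( \*ARGV, 'type_' );
```

Each resulting file can then be sorted on its own, either in Perl or with the sort utility.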

A clear picture of the scale of the problem would make the choice of solution easier.

Re: Re: How can I read multiple lines starting with the same number and put in to a nested array and print it to a file?
by Anonymous Monk on Jun 24, 2002 at 07:55 UTC
    The initial input file can be 3 GB. I have a NASTRAN (Finite Element Analysis software) output file which is pre-sorted. The first number in each line represents an element. The element number can be anywhere from 1 to 9999999, and the numbers are not consecutive. The example above was just a simplified one. I would not know the exact element numbering, so I might have 10 lines starting with element ID 12000, then the next element ID might be 13450 and I might have 10 lines starting with that number. So I have a huge file, and memory is crucial. I need an intelligent algorithm that goes through each line, finds the lines starting with the same number (without knowing in advance what the numbers are) and puts only those lines into an array. I do not need to create a different array for each element number. Once I print the sorted nested array, I can reuse the same array name and put the next set of lines starting with the same number into it. I am not sure if I am making sense. But thanks for listening.

      This is untested and bad Perl (I'm new to it), but the algorithm should be clear and work OK. If you're lucky, one of the experts here will be so appalled by my Perl that he will step in and clean it up or offer you something better.

      use strict;
      use warnings;

      # somewhere to remember the records we are processing
      my $lastFirstNum = "";
      my @nums;    # work array

      while (<>) {
          next unless /\S/;    # skip blank lines

          # get the first number from the line
          my ($firstNum) = split ' ', $_, 2;

          # prime the pump if it's the first time through
          $lastFirstNum = $firstNum if $lastFirstNum eq "";

          if ( $firstNum eq $lastFirstNum ) {
              # it's still the same type, so save it
              push @nums, $_;
              next;    # skip to next record
          }

          # the number changed: sort and write out the finished group
          write_group( $lastFirstNum, @nums );

          $lastFirstNum = $firstNum;
          @nums = ($_);    # clean the array and push the new record
      }

      # write out the last group, which never sees a change of number
      write_group( $lastFirstNum, @nums ) if @nums;

      sub write_group {
          my ( $name, @lines ) = @_;

          # open output using the first number as the name
          open( my $fh, '>', $name ) or die "Can't open $name: $!";
          print {$fh} reverse sort @lines;
          close($fh) or die "Couldn't close $name: $!";
      }

      Update: corrected my own (first) obvious mistake.