matth has asked for the wisdom of the Perl Monks concerning the following question:

Hi. I have a problem using a while loop inside a for loop: the for loop does not iterate properly. I am running on Linux. I have left some comments in the code to show the sort of testing I have been doing. This section of the program is designed to go through a tagged text file and retrieve lines according to their tags (the tags are numbers). The lines have to be retrieved in ascending order of tag number and, within each tag, in the order in which they appear in the input text file. I cannot use associative arrays (hashes) because I am worried about memory requirements. Any help will be much appreciated.
    $output_XML_A_open = "output_xml_d.txt";
    open (OUTPUT_XML_A_OPEN, "<$output_XML_A_open");
    $new_output_xml_a = "new_output_xml_d.txt";
    open (NEW_OUTPUT_XML_A, "+>>$new_output_xml_a");
    for ($d=1;$d<10;$d++){
        while (<OUTPUT_XML_A_OPEN>){
            $line = $_;
            # print "#############";
            chomp;
            if ($line =~ /^(\d{1,10})/){
                #print "...............\n";
                $key = $1;
                print "$key\n";
                $xml_content = $_;
                print "does $key = $d";
                if ($d =~ /^$key$/){
                    #print "#########";
                    print NEW_OUTPUT_XML_A "$xml_content\n";
                    #if ($line =~ /^(\d{1,10})\s{4}</){
                    #    #text removed
                    #}
                }
            }
        }
        #close (OUTPUT_XML_A_OPEN);
    }

Replies are listed 'Best First'.
Re: For loop problem
by Abigail-II (Bishop) on Nov 28, 2002 at 14:47 UTC
    What makes you think the for loop doesn't iterate correctly? It won't do much after the first iteration, but that's because in the first iteration, you plow through $output_XML_A_open (reaching the end), so the body of the while is never entered after the first iteration of the for.
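
The effect described above can be reproduced in a few lines. This is only a sketch: the sub name lines_per_pass and the sample data are made up for illustration, and an in-memory filehandle stands in for the real input file. It counts how many lines each successive read loop sees on the same handle, first without rewinding and then with a seek back to the start before each pass.

```perl
use strict;
use warnings;

# Count how many lines each of $passes successive read loops sees on the
# same handle. If $rewind is true, seek back to the start before each pass.
# (Hypothetical helper, not part of the original program.)
sub lines_per_pass {
    my ($fh, $passes, $rewind) = @_;
    my @counts;
    for my $pass (1 .. $passes) {
        seek($fh, 0, 0) if $rewind;    # jump back to the start of the file
        my $n = 0;
        $n++ while <$fh>;              # read until end of file
        push @counts, $n;
    }
    return @counts;
}

# An in-memory file stands in for output_xml_d.txt (sample data is invented).
my $data = "1    <a/>\n2    <b/>\n1    <c/>\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

# Without a rewind, only the first pass finds any lines.
print join(',', lines_per_pass($fh, 3, 0)), "\n";

# With a rewind before each pass, every pass sees the whole file.
print join(',', lines_per_pass($fh, 3, 1)), "\n";
close($fh);
```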

    Abigail

Re: For loop problem
by dreadpiratepeter (Priest) on Nov 28, 2002 at 14:55 UTC
    If I understand what you are doing, the problem is that you are trying to use the same handle to iterate through the file multiple times. That won't work. You need to open and close the file inside the for loop (or play games with seek).
    Some quick comments:
  • use strict
  • check return code from IO operations (like open)
  • You have a handle opened for input named OUTPUT_XML_A_OPEN. That is confusing.
  • while (my $line = <OUTPUT_XML_A_OPEN>) would be less confusing than what you have.
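
Taken together, the suggestions above might look something like the following. Treat it as a sketch rather than a drop-in replacement: the sub name copy_by_tag is made up, the output is opened with plain '>' (an assumption, the original uses "+>>"), and the tag is compared numerically instead of with a regex.

```perl
use strict;
use warnings;

# Copy lines from $in_path to $out_path grouped by leading numeric tag.
# The input is reopened once per tag so that every pass starts at the top.
# (Hypothetical helper; file names are supplied by the caller.)
sub copy_by_tag {
    my ($in_path, $out_path, @tags) = @_;
    open(my $out, '>', $out_path) or die "Cannot write $out_path: $!";
    for my $d (@tags) {
        open(my $in, '<', $in_path) or die "Cannot read $in_path: $!";
        while (my $line = <$in>) {
            # Capture the leading digits and compare them as numbers.
            print {$out} $line if $line =~ /^(\d+)/ && $1 == $d;
        }
        close($in);
    }
    close($out) or die "Cannot close $out_path: $!";
    return;
}

# Possible call, using the file names from the question:
# copy_by_tag('output_xml_d.txt', 'new_output_xml_d.txt', 1 .. 9);
```

Reopening costs one open per tag, which is negligible next to rereading the file; a seek to the start of the existing handle would do the same job.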
    Hope this helps,

    -pete
    "Worry is like a rocking chair. It gives you something to do, but it doesn't get you anywhere."
      Open file statement within the for loop solves the problem. Thanks.
Re: For loop problem
by tommyw (Hermit) on Nov 28, 2002 at 14:52 UTC

    In essence, you're starting at the top of the file, and then just repeating 10 times: read from the current position to the end of the file. After the first time through the loop, you've reached the end of the file (by the nature of your while loop), so the subsequent iterations simply read from the end of the file to the end of the file. Which results in nothing happening. I suspect you need to add last after sending the line to NEW_OUTPUT_XML_A

    To add to the confusion: $line =~ /^(\d{1,10})/ looks for a line starting with a number of up to ten digits, which you then attempt to compare with the value of $d. But $d is only ever a single digit (1 to 9), so the vast majority of the time, that's not going to work. Why not look for a line starting with only a single digit? $line =~ /^(\d)\D/?
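
To see what the single-digit pattern accepts and rejects, here is a small sketch. The helper name single_digit_tag and the sample lines are invented for the example; only the regex comes from the suggestion above.

```perl
use strict;
use warnings;

# Return the leading tag of a line if it is exactly one digit followed by
# a non-digit, otherwise undef. (Hypothetical helper for illustration.)
sub single_digit_tag {
    my ($line) = @_;
    return $line =~ /^(\d)\D/ ? $1 : undef;
}

# Invented sample lines: a one-digit tag, a two-digit tag, and no tag.
for my $line ("3    <item/>", "12    <item/>", "no tag here") {
    my $tag = single_digit_tag($line);
    printf "%-16s => %s\n", $line, defined $tag ? $tag : 'no single-digit tag';
}
```

Note that \D requires some character after the digit, so a line consisting of a lone digit with no trailing newline would not match.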

    --
    Tommy
    Too stupid to live.
    Too stubborn to die.

Re: For loop problem
by robartes (Priest) on Nov 28, 2002 at 15:26 UTC
    You are worried about the memory requirements of stuffing the file into a hash, but if the file fits into memory you could stuff it into an array and then use grep to get code which looks a lot cleaner:
        use strict;

        open FILE, "<untested.txt" or die "Blerch: $!\n";
        my @contents = <FILE>;
        close FILE;

        open OUTPUT, ">outputfile" or die "Hcrelb: $!\n";
        for my $d (1..10) {
            # (?!\d) stops tag 1 from also matching lines tagged 10, 11, ...
            print OUTPUT grep(/^$d(?!\d)/, @contents);
        }
        close OUTPUT;
    Whether your second requirement (lines are written in the same order as in the input file) is fulfilled depends on whether or not grep preserves the ordering of its input. A quick test on my machine has indicated that it does, but I don't know whether that's just luck or whether grep is supposed to do it like that. Also note that the above code is untested.
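
On the ordering question: grep evaluates its expression for each element of the list in turn and returns the elements that passed, so the matches come back in input order. A minimal sketch that checks this, with a made-up helper name (lines_for_tag) and invented sample data:

```perl
use strict;
use warnings;

# Return the elements of @$lines whose leading number equals $tag, in the
# same relative order as they appear in @$lines. (Hypothetical helper.)
sub lines_for_tag {
    my ($tag, $lines) = @_;
    return grep { /^(\d+)/ && $1 == $tag } @$lines;
}

# Invented sample data: two lines tagged 1 with a line tagged 2 before them.
my @contents = ("2    <b/>\n", "1    <first/>\n", "1    <second/>\n");
print lines_for_tag(1, \@contents);
```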

    And to top it all off, you can do this even if your file does not fit into memory: use Tie::File to tie the @contents array to the file and away you go.
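
A rough sketch of the Tie::File idea follows. The sub name tagged_lines is made up, and the loop walks the tied array by index rather than passing it to grep, on the assumption that handing the whole tied array to grep would pull every line into memory at once. Tie::File strips the line endings by default, so the caller has to add newlines back when printing.

```perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;

# Return the lines of $path whose leading number equals $tag, reading
# through a tied array instead of slurping the whole file.
# (Hypothetical helper; only the matching lines are held in memory.)
sub tagged_lines {
    my ($path, $tag) = @_;
    tie my @contents, 'Tie::File', $path, mode => O_RDONLY
        or die "Cannot tie $path: $!";
    my @found;
    for my $i (0 .. $#contents) {
        my $line = $contents[$i];    # fetched from the file on demand
        push @found, $line if $line =~ /^(\d+)/ && $1 == $tag;
    }
    untie @contents;
    return @found;                   # lines come back without newlines
}

# Possible call, using the input file name from the question:
# print "$_\n" for tagged_lines('output_xml_d.txt', 1);
```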

    CU
    Robartes-

    Update: Clarified things a bit.

Re: For loop problem
by ehdonhon (Curate) on Nov 28, 2002 at 14:56 UTC
    for ($d=1;$d<10;$d++){
    That's going to iterate $d from 1 to 9. It's valid syntax, but it's easier to read, and more perlish, this way:
    for my $d ( 1..9 ) {

    if ($line =~ /^(\d{1,10})/){
    This is going to evaluate to true if $line begins with a digit (the pattern is anchored with ^). It will capture up to the first 10 consecutive digits into $1.

    if ($d =~ /^$key$/){
    This is going to evaluate to true only if $d, taken as a string, is exactly $key (the pattern is anchored at both ends). In other words, it is true only if the number captured from $line is a single digit between 1 and 9 with no leading zeros. So any of the following lines are guaranteed never to match:

    023 this is my string
    10 hello
    0 hi there
    03 how are you
      Thanks for your advice, which has raised a point and a question in my mind. Point: the tags are at the far left of each line in the text file, and I need $d and $key to be an exact numeric match. Question: does if ($d =~ /^$key$/){ not cover this in every circumstance? Point: I tried to do a numeric match, but $key is treated as a string.
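
On the numeric-match point: Perl converts a string of digits to a number whenever a numeric operator such as == is applied, so a value captured by a regex can be compared numerically as it stands. The sketch below puts the three comparisons side by side; the helper name leading_tag and the sample lines are invented for illustration.

```perl
use strict;
use warnings;

# Return the leading run of digits on a line as a string, or undef if the
# line does not start with a digit. (Hypothetical helper.)
sub leading_tag {
    my ($line) = @_;
    return $line =~ /^(\d+)/ ? $1 : undef;
}

my $d = 3;
# Invented sample lines, all of which start with a numeric tag.
for my $line ("3    <a/>", "03    <b/>", "31    <c/>") {
    my $key = leading_tag($line);
    my $as_number = ($key == $d)     ? 'yes' : 'no';   # numeric equality
    my $as_string = ($key eq $d)     ? 'yes' : 'no';   # exact string equality
    my $as_regex  = ($d =~ /^$key$/) ? 'yes' : 'no';   # the test in the question
    print "$line: ==:$as_number eq:$as_string regex:$as_regex\n";
}
```

The regex test behaves like eq here, so the two differ from == only when the tag has leading zeros: "03" equals 3 numerically but not as a string.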
        Something like this would probably make more sense (if I understand your situation correctly):
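        open( OUTPUT_XML_A_OPEN, $output_XML_A_open )
            or die "Cannot open $output_XML_A_open: $!";
        open( NEW_OUTPUT_XML_A, "+>>", $new_output_xml_a )
            or die "Cannot open $new_output_xml_a: $!";
        for $d ( 1..9 ) {
            seek( OUTPUT_XML_A_OPEN, 0, 0 );    # rewind to the start for each tag
            while (<OUTPUT_XML_A_OPEN>) {
                # I'm leaving out a lot of your debugging stuff
                # (no chomp here, so each line keeps its newline when printed)
                if( /^$d\D/ ) {    # does this line start with "$d"?
                    print NEW_OUTPUT_XML_A;
                }
            }
        }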

        As I understand it, you are trying to find lines in the input file that begin with single-digit numbers, and print these to an output file in sorted order (all the "1" lines first, then all the "2" lines, etc), without worrying about any sub-sort ordering (all the "1" lines can be in any order).

        In that regard, I wonder why you open the output file for "append" access (">>") -- and why you open it for both read and write access (using "+"), when you don't really need to read it here?
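
The difference between the write modes is easy to check with a temporary file. This is a sketch with a made-up helper (write_then_read): '>' truncates the file before writing, while '>>' keeps what is already there, so rerunning a program that appends will accumulate duplicate output.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write $text to $path using the given open mode ('>' or '>>'), then
# return the file's full contents. (Hypothetical helper for illustration.)
sub write_then_read {
    my ($mode, $path, $text) = @_;
    open(my $out, $mode, $path) or die "Cannot open $path with $mode: $!";
    print {$out} $text;
    close($out) or die "Cannot close $path: $!";
    open(my $in, '<', $path) or die "Cannot read $path: $!";
    local $/;                      # slurp mode: read everything at once
    my $all = <$in>;
    close($in);
    return defined $all ? $all : '';
}

my (undef, $path) = tempfile(UNLINK => 1);
write_then_read('>', $path, "first run\n");
print write_then_read('>>', $path, "second run\n");   # append keeps old data
print write_then_read('>',  $path, "third run\n");    # truncate discards it
```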