oomwrtu has asked for the wisdom of the Perl Monks concerning the following question:

My question refers back to something I asked about a few days ago... I am trying to print a string from a file (in the form 0001,Nunki 2 1,5847,71%,0.71,4151.37,ThrevenGuard,-18,-26 to a webpage in various forms. You can choose between:
  1. Table or Comma-delimited formats
  2. Displaying unoccupied or not
  3. Coordinate system like x,y or (x,y)
You can see it in action at http://emino.realestateetools.com/ssprog/alpha/planetDiscovered.cgi?display=1&unoccupied=0&coordinate=1. Everything works just fine until I try and display all of the elements in table format (http://emino.realestateetools.com/ssprog/alpha/planetDiscovered.cgi?display=1&unoccupied=1&coordinate=1). It will load the option portion but then stops doing anything. When you click stop, it displays about the first 450 records (out of 1300 or so). How can I clean up this code so it will work with those options selected?
if( $options{"display"} eq "1" ) { print "<table border=\"1\" cellpadding=\"1\"" . "cellspacing=\"1\">\n"; print "<tr><td>ID</td><td>Name</td><td>Size</td>" . "<td>Percent</td><td>Decimal</td><td>Land" . "</td><td>Owner</td>"; if( $options{"coordinate"} eq "1" ) { print "<td>Location</td></tr>\n"; } else { print "<td>X</td><td>Y</td></tr>\n"; } } else { print "ID,Planet Name,Size,Land Percentage,Land" . "Decimal,Actual Land,Owner,"; if( $options{"coordinate"} eq "1" ) { print "Location<br>\n"; } else { print "X,Y<br>\n"; } } for( my $i = 0; $i <= $planets; $i++ ) { split(/,/, $data[$i]); next if( $options{"unoccupied"} eq "0" && $_[6] eq "unoccupied\n" ); print $data[$i] . "<br>\n" if( $options{"display"} eq "0" && $options{"coordinate"} eq "0" ); print "$_[0],$_[1],$_[2],$_[3],$_[4],$_[5],$_[6]" . ",($_[7],$_[8])<br>\n" if( $options{"display"} eq "0" && $options{"coordinate"} eq "1" ); print "<tr><td>$_[0]</td><td>$_[1]</td><td>$_[2]" . "</td><td>$_[3]</td><td>$_[4]</td><td>$_[5]</td>" . "<td>$_[6]</td><td>$_[7]</td><td>$_[8]</td></tr>\n" if( $options{"display"} eq "1" && $options{"coordinate"} eq "0" ); print "<tr><td>$_[0]</td><td>$_[1]</td><td>$_[2]</td>" . "<td>$_[3]</td><td>$_[4]</td><td>$_[5]</td><td>$_[6]</td>" . "<td>($_[7],$_[8])</td></tr>\n" if( $options{"display"} eq "1" && $options{"coordinate"} eq "1" ); delete $data[$i]; } print "</table>" if($options{"display"} eq "1"); print "\n<br><br>time to run: " . (times)[0]; exit;
Thank you very much in advance, you guys have been a lot of help already!

EDIT1: Full source can be found at http://emino.realestateetools.com/ssprog/alpha/planetDiscovered-source.txt. Data file at http://emino.realestateetools.com/ssprog/alpha/data/parsed-all.txt. I tried using 200 records and it output fine, then with 900 and it didn't work, did the same thing as above...

EDIT2: Modified the code to make sure it didn't get messed up because of the long lines.

Replies are listed 'Best First'.
Re: Printing data from a file in various ways
by liverpole (Monsignor) on Oct 12, 2005 at 19:45 UTC
    As Skeeve pointed out, it would be helpful if you could display more of your code.  When I turn on warnings and strict, the following things haven't been defined (at least, within the code fragment you showed):
    • %options
    • $planets
    • @data
    You might also consider trying it on a smaller subset of data first, just to narrow the problem.

    Finally, perhaps you might want to try not wrapping lines, to make the code a little easier to examine (though I admit this is my preference, not everybody's).

Re: Printing data from a file in various ways
by Skeeve (Parson) on Oct 12, 2005 at 19:32 UTC

    Without looking at your code it sounds to me like you run into the usual problem with tables: They can't be shown until all data is examined by the browser.

    The usual solutions:
    1. try to set column widths
    2. (or) split it up into several smaller tables

    s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%.+=%;.#_}\&"^"-+%*) +.}%:##%}={~=~:.")&e&&s""`$''`"e
Re: Printing data from a file in various ways
by graff (Chancellor) on Oct 13, 2005 at 04:56 UTC
    I gather you have a large percentage of "unoccupied" elements in your source data, because based on what you say, the big difference in response time is mainly a matter of whether or not this condition fires (and skips the rest of the loop block):
    next if( $options{"unoccupied"} eq "0" && $_[6] eq "unoccupied\n" );
    If that doesn't fire, you end up doing four more "if" tests on every iteration, and then deleting an element from an array (which probably doesn't do what you think it does -- check "perldoc -f delete").

    So there are ways to reduce the overall amount of work that perl is doing in the loop while still getting things done just as well (or even better). I don't know how much execution time would be saved, but I think the following version is at least a bit cleaner and not any harder to maintain than your original.

    As mentioned previously, you haven't shown how you initialize some of the variables in the posted snippet, so I'm going to guess that your "$planets" value was intialized to the number of elements in the "@data" array (i.e. your for loop covers every array element).

    Another thing I have to guess about is that elements in @data either have 7 fields (0-6) with the last one having the string "unoccupied", or else they have 9 fields (0-8). If there are more fields than that, then the code below would be inappropriate.

    my $dispmode = $options{display} * 2; $dispmode |= $options{coordinate}; # bitwise OR # $dispmode values: 0 == display:0 and coordinate:0 # 1 == display:0 and coordinate:1 # 2 == display:1 and coordinate:0 # 3 == display:1 and coordinate:1 # create a distinct print sub for each mode: # (in the second one, the regex places parens around # the last two comma-delimited fields on the line if # the last field is numeric) my @modesub = ( sub { print $_[0], "<br>\n" }, sub { ($_=$_[0]) =~ s/([^,]*,-?[\d]+)$/($1)/; print "$ +_ <br>\n"}, sub { print "<tr><td>" . join("</td><td>",split /,/, $ +_[0]) . "</td></tr>\n" }, sub { print "<tr><td>" . join("</td><td>",split /,/, $ +_[0], 8) . "</td></tr>\n" } ); for ( @data ) { # the following regex will check if this element of @data has # "unoccupied" following the 6th comma, and nothing thereafter: next if( $options{"unoccupied"} == 0 && /^(?:[^,]*,){6}unoccupied$ +/ ); $modesub[$dispmode]->( $_ ); } @data = (); # added this to get rid of the data # though maybe that doesn't really matter? print "</table>" if($dispmode & 2);
    That version does only one "if" test on each loop iteration, economizes the removal of elements from the array (and actually makes the memory available for garbage collection, unlike "delete"), and only uses split when it really has to.

    Also, I think that using @data and the field values on each row as lists (rather than always addressing specific array elements by index numbers) could save some time. (It certainly makes the coding easier.)

    Still, if the proportion of "unoccupied" rows is extremely large, the process will just take that much longer when all those rows are being printed. Maybe you need to allow for paging the output (e.g. 100 rows per page, or whatever)?

    (updated to fix punctuation, and to change from "while (@data)" to "for (@data)" -- I think doing "shift" on each iteration in a while loop is more work than necessary.)

    Another update: I noticed your links to the code and data after I posted this, and was glad to see that I had guessed right. ;-) One extra little detail: it looks like your data file might contain blank lines, but you don't seem to be checking for that -- I don't know if this might be part of the "delay" problem, but just to be sensible, you should read from the file like this:

    @data = grep /\S/, <DAT>; # skip blank lines chomp @data; # remove newline chars
    Final update (sorry...): I tighted up the second "modesub" so it only adds parens if the line ends with digits. Also, I deleted a couple lines of code that had been superceded by updates, rather than just commenting them out.
      This is exactly what I am looking for; it works like a charm. :-D Now, the question is: How did you learn this type of effiecient coding? I have looked for Perl tutorials (I only started using it this summer when I setup my webserver) and all of them are the same old "how do Perl variables work" and such. Perhaps it is a matter of experience... I can at least understand what this code is doing now that I see it, I just have to learn how to write it.

      Actually, I do have three questions:
      1. When $dispmode equals 3 (or display:1 and coordinate:1), it doesn't show the coordinates as an ordered pair like I would like it to. How could I fix this?(sub { print "<tr><td>" . join("</td><td>",split /,/, $_[0], 8) . "</td></tr>\n" })
      2. How would I need to change the code if I wanted to add another display option, such as filtering by name?
      3. If you have time, how would you like to review the code I use to get the data from an HTML page and parse it? (would be much appreciated, I know you might have a whole lot of better things to do :) )
      Thank you very much for all of your help!
        How did you learn this type of effiecient coding?

        It's basically the principle of "informed, constructive laziness": you know what needs to be done, you figure out the minimum number of steps required to get it done, and you structure the program in such a way that you never write a given chunk of code (e.g. an "if" condition or quoted string or any multi-step procedure) more than once. For me, a key step is to document (in coherent, human language) what the code is going to do first, then write the code according to the documentation.

        When $dispmode equals 3 ..., it doesn't show the coordinates as an ordered pair

        I gather you mean it's not putting parens around the "x,y" in the last column of the table. If it's really important to get the parens in there, I guess I'd elaborate that sub a little:

        sub { my @cels = split( /,/, $_[0], 8 ); $cels[7] = '('. $cels[7] .')'; print '<tr><td>'. join('</td><td>',@cels) ."</td></tr>\n" }
        How would I need to change the code if I wanted to add another display option, such as filtering by name?

        You could filter entries as you read from the file, by altering the grep condition that I suggested for skipping blank lines; e.g. if you add a cgi param like "$name_pattern" (which the user can set to a string or leave blank), you can say:

        my $match = ( $name_pattern =~ /^(\w+)$/ ) ? $1 : "\\S": my @data = grep /$match/i, <FILE>;
        (updated to fix typo in sub snippet; also switched to using a regex capture in the second snippet, in accordance with the standard idiom for untainting data)