derek3000 has asked for the wisdom of the Perl Monks concerning the following question:

if I did it in a more idiomatic way. Can any monks give me some suggestions to make it less awkward? Also, do I need to change anything if I want to run it under mod_perl?

The actual paths are weird because they are test files. And I'm running it on win32. Valid files and their short names are stored in the file that $links contains. This ensures that people can't try to node=../../../etc/passwd. The $is_script thing skips printing a header if the file contains java script, since most of that stuff would be in between the head tags.


Thanks,

derek3000

#!//Hebe/I$/perl/bin/perl -w # #index.pl #fetches content of link #and prints it out along with #consistent headers and footers. use strict; #like always use CGI; #duh use CGI::Carp; #now we can actually find error messages my $links = 'h:\perl scripts\links.txt'; my @lines; my $q = new CGI; my $node = $q->param('node'); my $is_script = $q->param('script') || 'no'; #list of valid filenames and handles(index) goes here my %known_files; my %top_level; populate_hash($links); #check to see if input is valid if(exists($known_files{$node})){ open(NODE, $known_files{$node}) or die "Can't open file. Valid input though. $!\n"; while(<NODE>){ push(@lines, $_); } close NODE; } else { die "Don't fool around with the url, punk.\n"; } #do the output if($is_script eq 'yes'){ middle(); bottom(); } else { top(); middle(); bottom(); } #Subs under here #-------------------------------------------------------- #------------------------------------------- #gets hash keys and paths for valid files, #also populates second hash for menu sub populate_hash{ my $linkfile = shift; open(LINKS, $linkfile) or die "Can't get file for fresh links! $!\n"; while(<LINKS>){ (my $KEY, my $VALUE) = split /=/; $known_files{$KEY} = $VALUE; if($KEY =~ /announcements|links|news|info/i){ $top_level{$KEY} = $VALUE; } } close LINKS; } #------------------------------------------- #prints html header along with standard menu, #and soon a graphical header. sub top{ print $q->header('text/html'); print $q->start_html(-title=>$node, -author=>'schmidtd@co.delaware.pa.us', -BGCOLOR=>'white'); menu(); } #------------------------------------------ #this will probably stay the same. #it prints out the page-specific content. sub middle{ foreach(@lines){ print $_; } } #------------------------------------------ #prints out anything we want on the bottom, #such as a redundant navigation scheme, #or a link to the disclaimer, or both, #when we get it. sub bottom{ print $q->end_html; } #------------------------------------------ #prints out menu from hash of files. # sub menu{ print "|"; foreach(sort keys %top_level){ print $q->a( {href=>"h:\\perl scripts\\index.pl?node=$_"}, "$_"); print "|"; } }

Replies are listed 'Best First'.
Re: I know this code could be better...
by petdance (Parson) on Jun 20, 2001 at 18:17 UTC
    while(<NODE>){ push(@lines, $_); }
    In list context, the diamond operator returns the entire file, so you can avoid the loop:
    push( @lines, <NODE> );

    In if statements, pull out the common code, so that your code shows the differences, and the reader doesn't have to figure out what's different:
    if($is_script eq 'yes'){ middle(); bottom(); } else { top(); middle(); bottom(); }
    is better as
    top() unless ($is_script eq "yes"); middle(); bottom();

    xoxo,
    Andy

    %_=split/;/,".;;n;u;e;ot;t;her;c; ".   #   Andy Lester
    'Perl ;@; a;a;j;m;er;y;t;p;n;d;s;o;'.  #   http://petdance.com
    "hack";print map delete$_{$_},split//,q<   andy@petdance.com   >
    
Re: I know this code could be better...
by tadman (Prior) on Jun 20, 2001 at 18:42 UTC
    1. As petdance pointed out, your file reading routine can be simplified:
           @lines = <NODE>; </CODE>
    2. Further, you can directly insert into a hash from your split, because in a sense a hash is just a fancy list:
           %known_files = map { my($k,$v)=split(/=/) } <LINKS>;
    3. Post-processing would allow you to figure out the 'top_level' stuff:
      foreach (keys %known_files) { if(/announcements|links|news|info/i) { $top_level{$_} = $known_files{$_}; } }
    4. When you want to print an array, you just print it. You don't need to do anything special.
      sub middle { print @lines; }
Re: I know this code could be better...
by arturo (Vicar) on Jun 20, 2001 at 18:43 UTC

    I don't like code that updates a global value when it's not necessary. I like to be able to see, easily, where information comes from. This results in more maintainable code (you know where to look when something goes wrong).

    That said, I'd rewrite your populate_hash to return a hash, or better still (for performance reasons), a reference to a hash. You then set %top_level to the return value of that subroutine. It's as easy as having the first line of the sub be:

    sub populate_hash { my %hash; # populate hash \%hash; }

    Note this returns a reference to a hash, which means you have to get a little fancier when you call the subroutine (or modify your code to work with the hash reference, but I foresee less hair-pulling if you keep the hash).

    my %top_level = % { populate_hash() };

    The "extra" %{} parts tell the interpreter that it should set %top_level to the hash which the return value of populate_hash refers to.

    I swear that makes sense in some language. =) For more on references, see perldoc perlref and perldoc perlreftut. Once you get the hang of working in references, you see all sorts of new solutions to data-manglement.

    HTH

    perl -e 'print "How sweet does a rose smell? "; chomp ($n = <STDIN>); +$rose = "smells sweet to degree $n"; *other_name = *rose; print "$oth +er_name\n"'
Re: I know this code could be better...
by larsen (Parson) on Jun 20, 2001 at 18:26 UTC
Re: I know this code could be better...
by toma (Vicar) on Jun 20, 2001 at 19:15 UTC
    To run under mod_perl you should move all the initialization code to a BEGIN block. This BEGIN code is then be run the first time the Apache daemon runs the program, and remembered for subsequent runs. If you had cleanup to do when the daemon dies, put it in an END block. I don't see any cleanup code in your program, though.

    Your program works with two files for each run, one is the index and the other is the content. Under mod_perl the index, at least, should be held in RAM and you will be down to one file read per page. If you have enough RAM you should consider keeping both files in RAM. You could have a hash $file_content{$node} that stores a file per node as each file is read. This way the daemon would only read each file once.

    To further speed up your code, eliminate as many print statements as possible. For example, change:

    print $q->header('text/html'); print $q->start_html(-title=>$node, -author=>'schmidtd@co.delaware.pa.us', -BGCOLOR=>'white');

    to

    print $q->header('text/html'), $q->start_html(-title=>$node, -author=>'schmidtd@co.delaware.pa.us', -BGCOLOR=>'white');

    This uses ',' to separate the arguments to print, rather that a '.' since the comma is faster than string concatenation.

    The code goes to the trouble of reading the file into an array, and then printing it out one array element at a time. You don't need to do this. Instead, read the whole file into a single string and print the string.

    If you don't want to follow my advice on not using an array, at least change

    foreach(@lines){ print $_; }
    to something like:
    my $content; foreach(@lines){ $content .= $_; } print $content;
    or a similar statement using join
    print join "\n",@lines;

    There are some other speedup tips at a previous node that I wrote on this topic.

    It should work perfectly the first time! - toma

Re: I know this code could be better...
by jreades (Friar) on Jun 20, 2001 at 18:50 UTC
    my $links = 'h:\perl scripts\links.txt';

    Since this is apparently a constant, you might consider rewriting it as:

    my $LINKS = 'h:\perl scripts\links.txt';

    But if you want to be rigorous and good about preparing for when your script needs to do everything and run the kitchen sink, you might go so far as to write:

    sub LINKS { return 'h:\perl scripts\links.txt'; }

    While your program remains simple , I believe that this will be inlined by the Perl compiler (so no performance hit), but if your program grows, such that lines might change depending on the context, you only have to rewrite the subroutine and everything else will work as expected.

    Actually, looking at your middle sub, I can't see any reason why you wouldn't just suck the file into a scalar and save yourself the overhead of array allocation:

    undef $/;

    To make things more reader-friendly and more extensible it's also a good idea to pass variables to your subs rather than simply calling variables declared elsewhere:

    sub middle { my @output = @_; print while @output; }

    This sub now only makes the assumption that it should print whatever you've passed it... so now, it's re-usable and should probably be renamed print_to_user or some such.

    DISCLAIMER: I've been stuck as a Java programmer for the past few months so my Perl's getting more than a little rusty (trying to get involved again in Perlmonks), so not all of this stuff may perform as advertised. If so, then I apologize.

      Instead of
      sub LINKS { return 'h:\perl scripts\links.txt'; }
      you're better off to be more Perlish with constant.pm:
      use constant LINKS => 'h:\perl scripts\links.txt';

      xoxo,
      Andy

      %_=split/;/,".;;n;u;e;ot;t;her;c; ".   #   Andy Lester
      'Perl ;@; a;a;j;m;er;y;t;p;n;d;s;o;'.  #   http://petdance.com
      "hack";print map delete$_{$_},split//,q<   andy@petdance.com   >
      

        Yes, but consider a change needing to be made down the line:

        Q: "Hey petdance we're porting this script to Unix, can you make it run there too?"

        A: "Sure, I'll just change the constant."

        Q: "Hey what did you change in that script so that now it doesn't work on NT anymore?"

        A: "Hmmmmm"

        Ok, so the example is a stretch, but imagine the solution using a sub instead of a constant.

        sub LINKS { if (sub_to_determine_OS() =~ /(Unix|Linux)/) { return "/u/jreades/links.txt"; } else { return "U:\jreades\nt_links.txt"; } }

        And all of that requires exactly no changes to the rest of your application.

        Yes, it is paranoid, but that doesn't mean they're not out to get you.

      I generally like to pass in locations with either environment variables or thru command line options. Saves on going back through all your code when the directory structure changes... Although some may disagree with me :)
Re: I know this code could be better...
by Jonathan (Curate) on Jun 20, 2001 at 19:56 UTC
    Regarding running under mod_perl, there are a few gotchas (well they got me anyway) Look here for mod_perl issues
Re: I know this code could be better...
by Anonymous Monk on Jun 20, 2001 at 21:13 UTC
    What all of the responders missed: In your intro you say "The $is_script thing skips printing a header if the file contains java". I thought "Huh? Does he mean only the HTML- or the HTTP-Header too?". After looking at your code I saw he means the HTTP-Header too. So your program gives 500 Internal Server Error for all scriptfiles.
Re: I know this code could be better...
by derek3000 (Beadle) on Jun 22, 2001 at 00:15 UTC
    Thanks to all the monks for your help. This made me improve my code and make the functions more generic. What's the bad news? After putting a lot of work into this (I'm still a newbie, so it takes me a while), the brass is so pro-microsoft that they don't want anything that isn't asp or jsp on our intranet. It was a good experience though. Especially since I decided to break down and use references--that definitely taught me a lot.

    As far as the http header thing, the idea is that if it is a script file, it reads all of it in, which would include the http header. the regular files would be coded to start and end at the body tags.


    Thanks again,
    Derek