Seventh has asked for the wisdom of the Perl Monks concerning the following question:

Greetings monks!

I'm in Iraq right now, and trying to work out a quick script to pull some information out of files for me. As such, my reading and web time is limited, so I was hoping that you folks could help me out.

Basically I have a file, inside is a bunch of text. Within it, are lines like this:
# STANDARD This is some text that describes
function seventhdoesnotgetenoughsleep

Probably 20 or so instances of stuff like that. A description, then a function name. What I need to do is open the file, pull out the two lines as a pair (I assume two arrays?) and then spit 'em back out into another file with HTML code around them.

So basically:

- Open file.name
- Look for instances of # STANDARD -text text text-
- Look for the following function
- Group 'em, repeat for each
- Create a new file named *.html, and format the output accordingly (the HTML is the easy part, I think!)

Sorry to lay it all on you guys, but there isn't much time to read up over here, any help at all would really be appreciated! Thanks!

Edit by holli: added code tags around sample data

Replies are listed 'Best First'.
Re: Newbie question, advice appreciated
by dragonchild (Archbishop) on Sep 27, 2005 at 19:52 UTC
    You will actually want to use an array of hashes. Parallel arrays are notoriously error-prone, so avoid them whenever possible.
    open FH, 'file.name'
        or die "Cannot open 'file.name' for reading: $!\n";

    my @items;
    while (<FH>) {
        chomp;
        my @line = split;

        # Remove "#" and "STANDARD"
        shift @line; shift @line;

        # I assume the function name is the last item in the list and has no spaces in it
        my $function = pop @line;

        push @items, {
            function    => $function,
            description => join( ' ', @line ),
        };
    }
    close FH;

    foreach my $item (@items) {
        # $item->{description} is the description
        # $item->{function} is the function name
    }
    That will even keep them in the order found in file.name

    Now, if you need to sort them, do so as such:

    # If sorting ascending asciibetically.
    my @sorted_items = sort { $a->{description} cmp $b->{description} } @items;

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

      This is all good, but I'd use a regex to dice up the line rather than split and all that shifting, popping, and joining.

      my ($description, $function) = /^#\s?STANDARD\s+(.*?)\s+(\S+)\s*$/;
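
      A minimal sketch of how that regex could slot into the array-of-hashes loop, assuming the single-line data format discussed in this subthread. The `@lines` sample and the loop around the match are illustrative assumptions, not part of the original replies. Note that under this assumption the captured description ends with the word "function", because only the last whitespace-delimited token is taken as the name.

```perl
use strict;
use warnings;

# Hypothetical sample input: one matching line and one filler line.
my @lines = (
    "# STANDARD This is some text that describes function seventhdoesnotgetenoughsleep",
    "blah blah blah",
);

my @items;
for (@lines) {
    # A list assignment used as a condition is true only when the match
    # succeeded, so non-matching lines are skipped.
    if ( my ($description, $function) = /^#\s?STANDARD\s+(.*?)\s+(\S+)\s*$/ ) {
        push @items, { function => $function, description => $description };
    }
}

print "$_->{function}: $_->{description}\n" for @items;
```

      Unlike the split-based version, this skips lines that do not start with the marker instead of pushing an entry for every line.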
      -sauoq
      "My two cents aren't worth a dime.";
      
Re: Newbie question, advice appreciated
by injunjoel (Priest) on Sep 27, 2005 at 19:59 UTC
    Greetings all,
    One other suggestion is to use HTML::Template, if you have access to CPAN, that is. I'm not sure how you want to handle the lines, but here is a quick mockup that may or may not work for your purposes.
    #!/usr/bin/perl -w
    use strict;
    use HTML::Template;

    my $output_template = qq*
    <table border="0" width="100%" cellspacing="1" bgcolor="#000000">
    <tr bgcolor="#cccccc">
        <td>Function Name:</td><td>Description</td>
    </tr>
    <TMPL_LOOP NAME="functions">
    <tr bgcolor="#ffffff">
        <td valign="top"><TMPL_VAR NAME="name"></td><td valign="top"><TMPL_VAR NAME="desc"></td>
    </tr>
    </TMPL_LOOP>
    </table>
    *;

    my @output = do{
        local @_;
        #you will need to replace the DATA handle with
        #one to your file of interest.
        my $string = do{ local $/; <DATA> };   # slurp the whole handle at once
        while($string =~ /\s?STANDARD\s+(.*?)\nfunction\s(\S+)\n?/mg){
            push @_, {name=>$2, desc=>$1};
        }
        @_;
    };
    my $tmplt = HTML::Template->new(scalarref=>\$output_template,die_on_bad_params=>0);
    $tmplt->param({functions=>\@output});
    print $tmplt->output();
    exit;
    __DATA__
    blah blah blahblah blah blah
    blah blah blahblah blah blah
    # STANDARD This is some text that describes
    function seventhdoesnotgetenoughsleep
    # STANDARD This is some describing text
    function seventhgetsomesleep
    blah blah blahblah blah blah
    # STANDARD This is some description
    function seventhgethomesafe
    blah blah blahblah blah blah
    # STANDARD This is some text ...nuff said
    function seventhwhyiraq
    blah blah blahblah blah blah

    the output is
    Function Name:                  Description
    seventhdoesnotgetenoughsleep    This is some text that describes
    seventhgetsomesleep             This is some describing text
    seventhgethomesafe              This is some description
    seventhwhyiraq                  This is some text ...nuff said


    Now in your code you will need to change the <DATA> part to read in your file and, as dragonchild illustrated, TEST IF OPEN WORKED! Sorry for the caps, but it's a necessary step. Also, you will need to open another handle (and check that it opened) to your output file for writing. Is that what you were thinking?
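
    A minimal sketch of those two steps, reading the input file and writing the output file with each open checked. The helper names `slurp_file` and `write_file` are hypothetical, introduced here only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one string,
# dying with the OS error message if the open fails.
sub slurp_file {
    my ($path) = @_;
    open my $in, '<', $path
        or die "Cannot open '$path' for reading: $!\n";
    local $/;                 # slurp mode: no record separator
    my $text = <$in>;
    close $in;
    return $text;
}

# Hypothetical helper: write a string to a file, checking both
# the open and the close (a failed close can mean a failed write).
sub write_file {
    my ($path, $text) = @_;
    open my $out, '>', $path
        or die "Cannot open '$path' for writing: $!\n";
    print {$out} $text;
    close $out
        or die "Cannot close '$path': $!\n";
    return;
}
```

    With helpers like these, the `<DATA>` slurp could become something like `slurp_file('file.name')`, and the final `print` could become `write_file('functions.html', $tmplt->output())`, where both file names are placeholders.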
    I hope that helps



    -InjunJoel
    "I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forego their use." -Galileo
Re: Newbie question, advice appreciated
by sk (Curate) on Sep 27, 2005 at 19:56 UTC
    I am not sure if I completely understand the spec. Here is my shot at this. I have put the data at the end so you can see if it is structured in that way.

    #!/usr/bin/perl -w
    use strict;

    # open(DATA,"file.txt") or die $!; # I am using DATA handle but you can open a file to read contents too.

    my %functions;
    while (<DATA>) {
        chomp;
        my @functxt = ();
        my $desc = ();
        if (/STANDARD/) {
            @functxt = split /\s+/;
            $desc = <DATA>;
            $functions{pop(@functxt)} = $desc; # assuming last word is function name
        }
    }

    print +($_, " : ", $functions{$_}, $/) for (keys %functions);

    __DATA__
    # STANDARD func1
    func1 does something funny
    blah blah
    ok junk text
    # STANDARD func2
    func2 does something useful
    # STANDARD func3
    func3 is a math function

    output

    func1 : func1 does something funny
    func3 : func3 is a math function
    func2 : func2 does something useful
      @functxt = split /\s+/;

      Usually, that's not the pattern you want to use with split. Given the data format specification we were given in the above problem, it wouldn't make a difference, but split ' ' (or just split with no arguments at all) is usually what is really wanted. The difference is that \s+ can produce a null field when there is leading whitespace.
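
      A small sketch of that difference, using a made-up input line with leading whitespace (the sample string and variable names are assumptions for illustration):

```perl
use strict;
use warnings;

# Hypothetical input line that starts with whitespace.
my $line = "   # STANDARD some text";

my @with_regex = split /\s+/, $line;   # keeps an empty leading field
my @with_space = split ' ',   $line;   # skips leading whitespace first

printf "split /\\s+/ gave %d fields, first is '%s'\n",
    scalar @with_regex, $with_regex[0];
printf "split ' '   gave %d fields, first is '%s'\n",
    scalar @with_space, $with_space[0];
```

      With `/\s+/` the first field is the empty string, so code that shifts off "#" and "STANDARD" by position would be off by one; with `' '` the first field is "#".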

      -sauoq
      "My two cents aren't worth a dime.";
      
Re: Newbie question, advice appreciated
by graff (Chancellor) on Sep 28, 2005 at 01:13 UTC
    It seems that an important detail was lost in your post -- it would not have been lost if you had placed "<code>" and "</code>" around your data sample.

    Viewing the html source for the page, I see that the data sample really looks like this:

    # STANDARD This is some text that describes
    function seventhdoesnotgetenoughsleep
    This clarifies your request quite a lot. Here's a simple solution:
    my %funcs;
    my $comment;
    while (<>) {
        if ( /^# STANDARD (.*)/ ) {
            $comment = $1;
        }
        elsif ( /^function (\S+)$/ ) {
            $funcs{$1} = $comment;
        }
    }
    # now layout a nice HTML page that tabulates
    # the keys and values of %funcs
    print "<HTML> blah blah blah <table><tr><th>Name</th><th>Desc</th></tr>\n";
    print "<tr><td> $_ </td><td> $funcs{$_} </td></tr>\n" for ( sort keys %funcs );
    print "</table> blah blah blah </HTML>\n";
Re: Newbie question, advice appreciated
by cbrandtbuffalo (Deacon) on Sep 27, 2005 at 20:18 UTC
    Here's a simple example that just writes directly to the output file. The regex could be better, and I'm sure someone will comment and give an alternate.
    #!/usr/bin/perl
    use strict;
    use warnings;

    open (FH, '<', 'test.txt')
        or die "Cannot open 'test.txt' for reading: $!\n";
    open (OUT, '>', 'out.html')
        or die "Cannot open out.html for output: $!\n";

    while (<FH>) {
        if ( /^# STANDARD/ ){
            # Grab the current line.
            my $first_line = $_;
            # Grab the line right after it.
            my $second_line = <FH>;
            print OUT '<p>' . $first_line;
            print OUT '<p>' . $second_line;
        }
    }
    close FH;
    close OUT;
Re: Newbie question, advice appreciated
by InfiniteSilence (Curate) on Sep 27, 2005 at 20:29 UTC
    The other nodes answering this question are much better, but I was just having so much fun I couldn't resist:

    The data file:

    foo bar bza
    #STANDARD Run from bombs! This describes the function bombs!
    sub bombs {
        print "We already caught Saddam...what now?";
    }
    #STANDARD Help! My HUMVEE is Broken! This describes the function BrokenHumvee!
    sub BrokenHumvee {
        print "I hit a landmine!";
    }
    1;

    The code:

    C:\Temp>perl -e "BEGIN{open(H,q|watchdog.dat|) or die $!; @watchdog = <H>; close (H);} my $newFile; foreach my $watchdog (@watchdog){if($watchdog=~m/(#STANDARD.*?function\s+(\S+)!)/){ $newFile .= qq|\=item| . ++$item . qq| $2\n$1\n|}}; END{ print qq|=head1 Functions\n\nThis is my list of functions\n\n$newFile\n\n=cut\n\n|}" > sha.pod

    Finally:

    C:\Temp>pod2html sha.pod > see.htm

    Celebrate Intellectual Diversity
