file reading issues

mojobozo has asked for the wisdom of the Perl Monks concerning the following question:

First, please forgive my absence. It's been almost 2 years since my last confession, er... post. As a result of this, and of not playing with Perl in the meantime, I've forgotten a bit.

The question: I have this CGI script:

#!/usr/bin/perl

print "Content-type: text/html\n\n";


&print_return_page_top;


if (open (FILENAME, "..//..//test//index.html"))
    {
    $line = <FILENAME>;
    while ($line ne "")
        {
        print "$line";
        $line = <FILENAME>;
        }
    }
else {
        print "Booger<br>";
}

&print_return_page_bottom;


##################################
sub print_return_page_top
{
print <<RETURN_PAGE_TOP;
<HTML>
<HEAD>
<TITLE>Update Index</TITLE>
</HEAD>
<BODY>
    Type your changes in the box below and then press Submit:
    <BR>
    <FORM NAME="update-index" ACTION="update-index.cgi" METHOD="get">
      <TEXTAREA ROWS="20" COLS="100" NAME="index-data">

RETURN_PAGE_TOP
}
###################################


##################################
sub print_return_page_bottom
{
print <<RETURN_PAGE_BOTTOM;
      
      </TEXTAREA>
      <BR>
      <INPUT TYPE="Submit" NAME="Submit" VALUE="Submit">
    </FORM>
  </BODY>
</HTML>

RETURN_PAGE_BOTTOM
}
###################################

It works just as I want, meaning it takes the HTML file I'm looking at and dumps it into the textarea. However, I want to grab just a portion of the file I'm reading from. I have <!-- Begin --> and <!-- End --> comments in the file and want to grab the stuff between them. I tried playing around a bit with the above script and these comment lines, but all I got was a blank textarea.

Can someone help me out?

Thanks!

_____________________________________________________
mojobozo
word (wūrd)
interj. Slang. Used to express approval or an affirmative response to
something. Sometimes used with up. Source


Re: file reading issues by jdporter (Paladin) on Aug 03, 2005 at 18:47 UTC
First of all, I would rewrite

    $line = <FILENAME>;
    while ($line ne "") {
        print "$line";
        $line = <FILENAME>;
    }

as

    while (<FILENAME>) {
        print;
    }

if the expected behavior is to print out every line of the file. One way to tweak the above to get your desired output is as follows:

    # Skip everything up to and including the line holding the Begin marker.
    while (<FILENAME>) {
        last if /<!-- Begin -->/;
    }
    # Print each following line until the line holding the End marker.
    while (<FILENAME>) {
        last if /<!-- End -->/;
        print;
    }
Re^2: file reading issues by bofh_of_oz (Hermit) on Aug 03, 2005 at 19:35 UTC
I would think that if there is text after <!-- Begin --> on the same line, it would not be printed, and neither would any text before <!-- End --> on its line.

--------------------------------
An idea is not responsible for the people who believe in it...
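The concern above, text that shares a line with a marker, can be handled while still reading line by line. The following is only a sketch under stated assumptions: `extract_between` is a hypothetical helper name, the markers are the `<!-- Begin -->` and `<!-- End -->` comments used in this thread, and the sample data is made up and read through an in-memory filehandle instead of a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between the two marker comments,
# including anything that shares a line with either marker.
sub extract_between {
    my ($fh) = @_;
    my $inside = 0;
    my $out    = '';
    while (my $line = <$fh>) {
        if (!$inside) {
            # Skip lines until the Begin marker; keep whatever follows it.
            next unless $line =~ s/^.*?<!-- Begin -->//s;
            $inside = 1;
        }
        # If the End marker is on this line, keep what precedes it and stop.
        if ($line =~ s/<!-- End -->.*$//s) {
            $out .= $line;
            last;
        }
        $out .= $line;
    }
    return $out;
}

# Made-up sample standing in for index.html.
my $sample = "<p>header</p>\n<!-- Begin -->first\nsecond\nthird<!-- End -->\n<p>footer</p>\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print extract_between($fh), "\n";
close $fh;
```

The trade-off is a little more bookkeeping than the two-loop version in exchange for not losing text that sits on a marker line.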
Re^2: file reading issues by mojobozo (Monk) on Aug 03, 2005 at 19:10 UTC
OK, this one worked! Thanks, jdporter!
Re: file reading issues by sgifford (Prior) on Aug 03, 2005 at 20:45 UTC
The `..` (dot-dot) operator was designed for this:

    while (<>) {
        # True from the Begin line through the End line, marker lines included.
        if (/<!-- Begin -->/ .. /<!-- End -->/) {
            print;
        }
    }

Or, more concisely:

    (/<!-- Begin -->/ .. /<!-- End -->/) && print while (<>);

See Range Operators in perlop(1) for more information.
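In scalar context the range operator acts as a flip-flop and returns a sequence number, with "E0" appended on the last line of the range. That makes it possible to drop the marker lines themselves. This is a minimal sketch: `lines_in_range` and its `$keep_markers` flag are hypothetical names, and the sample data is invented.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the lines from the Begin marker line through
# the End marker line, optionally dropping the two marker lines.
# Note: the flip-flop keeps its state between calls, so this assumes every
# input contains a complete Begin/End pair.
sub lines_in_range {
    my ($fh, $keep_markers) = @_;
    my @kept;
    while (my $line = <$fh>) {
        # False outside the range; 1, 2, 3, ... inside it, and the last
        # value has "E0" appended (for example "4E0").
        my $seq = ($line =~ /<!-- Begin -->/) .. ($line =~ /<!-- End -->/);
        next unless $seq;
        next if !$keep_markers && ($seq == 1 || $seq =~ /E0$/);
        push @kept, $line;
    }
    return @kept;
}

# Made-up sample data read through an in-memory filehandle.
my $sample = "top\n<!-- Begin -->\n  one\n  two\n<!-- End -->\nbottom\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print lines_in_range($fh, 0);
close $fh;
```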
Re: file reading issues by dtr (Scribe) on Aug 03, 2005 at 20:08 UTC
All manner of horrible things will happen with your current code if the page that you're editing happens to contain the string "</textarea>" anywhere. You should escape the HTML that you are printing inside the text box to get around this. At a minimum, replacing all instances of "<" with "&lt;" should do the trick. There are modules such as HTML::Sanitizer which try to do this in a more sophisticated way.
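The escaping step described above can be sketched with plain substitutions. `escape_html` is a hypothetical helper written for illustration; it goes slightly beyond the minimum by also escaping "&", ">" and double quotes. On CPAN, `encode_entities` from HTML::Entities does the same job more thoroughly.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that matter inside a <textarea>.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# A made-up line that would otherwise close the textarea early.
print escape_html(qq{<p class="x">Tom & Jerry</p></textarea>}), "\n";
```

The browser turns the entities back into the original characters when it displays the textarea, so the submitted form data contains the unescaped HTML.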
Re: file reading issues by kwaping (Priest) on Aug 03, 2005 at 18:45 UTC
There are a couple of ways I can think of to do that, offhand. If it's a small file and you can read it all into memory, you can do something like this:

    my $file = '/path/to/file.txt';
    open(IN, '<', $file) || die $!;
    read IN, my $html, -s $file;
    close(IN);
    # Capture only the text between the markers; /s lets "." match newlines.
    my ($section) = $html =~ /<!-- Begin -->(.*?)<!-- End -->/s;

Or, if the file is large and you'd like to read it line by line, you might want to try setting a flag when Begin is encountered, then turning it off again when End is hit.

    my $flag = 0;
    while (<FILE>) {
        $flag = 1 if (/<!-- Begin -->/);
        $flag = 0 if (/<!-- End -->/);
        # The Begin marker line itself is also passed along.
        process_line($_) if ($flag);
    }
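For the slurp-and-regex approach, it matters whether the regex is used as a capturing match or as a substitution. A capturing match keeps only the enclosed text, whereas replacing the match with `$1` removes just the two markers and leaves the rest of the page in place. A small sketch, where `section_between` is a hypothetical helper and the sample page is made up:

```perl
use strict;
use warnings;

# Hypothetical helper: return only the text between the markers,
# or undef when either marker is missing.
sub section_between {
    my ($html) = @_;
    # /s lets "." match newlines; the non-greedy .*? stops at the first End marker.
    my ($section) = $html =~ /<!-- Begin -->(.*?)<!-- End -->/s;
    return $section;
}

# Made-up page for illustration.
my $page = "<h1>Title</h1>\n<!-- Begin -->\n<p>Editable</p>\n<!-- End -->\n<p>Footer</p>\n";

# Substituting the match with $1 deletes the markers but keeps the title and footer.
(my $stripped = $page) =~ s/<!-- Begin -->(.*?)<!-- End -->/$1/s;

print "captured:", section_between($page);
print "substituted:\n", $stripped;
```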
Re^2: file reading issues by JediWizard (Deacon) on Aug 03, 2005 at 19:19 UTC
I notice you have a different method for reading an entire file into memory than I am used to seeing. I was wondering if there is any reason you are aware of that makes your way better or worse than the way I use (or if they are just different (TIMTOWTDI)). If there is, I'd love to hear about it.

    read IN, my $html, -s $file;         # your code
    my $file = do { local $/; <IN> };    # the way I usually use

Other monks... can anybody tell me if there is an advantage in one way or the other?

Update: The second line of code above is the way I am used to seeing... I forgot to complete the comment. My question regards the difference between using `$/` as opposed to using `read` to slurp in an entire file.

They say that time changes things, but you actually have to change them yourself. - Andy Warhol
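The two slurping idioms being compared can be placed side by side. This sketch only shows that both return the same content for a small plain file; it does not measure speed or buffering. The sub names are hypothetical, and a temporary file created with File::Temp stands in for the real page.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Slurp by asking read() for as many bytes as -s reports for the file.
sub slurp_with_read {
    my ($file) = @_;
    open my $in, '<', $file or die "Cannot open $file: $!";
    read($in, my $content, -s $file);
    close $in;
    return $content;
}

# Slurp by undefining the input record separator for one readline call.
sub slurp_with_separator {
    my ($file) = @_;
    open my $in, '<', $file or die "Cannot open $file: $!";
    my $content = do { local $/; <$in> };
    close $in;
    return $content;
}

# Temporary file with made-up content, removed automatically at exit.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "line one\nline two\n";
close $tmp_fh or die "Cannot close $path: $!";

print "same content\n" if slurp_with_read($path) eq slurp_with_separator($path);
```

One practical difference is that the `local $/` form needs only a filehandle, so it also works on pipes and STDIN, where `-s` has no file size to report.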
Re^3: file reading issues by kwaping (Priest) on Aug 03, 2005 at 19:36 UTC
What is the way you're used to seeing? Maybe I am the one in need of enlightenment and your way is superior. To answer your question, there is no reason why I do it that way except that's the way I learned how to do it. Maybe there was a good reason to do it like that back then which has now been made moot by advances in Perl - I don't know.
Re^4: file reading issues by JediWizard (Deacon) on Aug 04, 2005 at 02:57 UTC
Re^5: file reading issues by kwaping (Priest) on Aug 04, 2005 at 15:08 UTC
Re^5: file reading issues by JediWizard (Deacon) on Aug 04, 2005 at 16:49 UTC
Re^3: file reading issues by anonymized user 468275 (Curate) on Aug 04, 2005 at 16:13 UTC
IMO, the difference will be what kind of buffered I/O gets performed. Fuller explanation: Re: Speed reading (files)

One world, one people
Re: file reading issues by bofh_of_oz (Hermit) on Aug 03, 2005 at 19:13 UTC
A regex will parse the file just fine. Just grab the whole file into a variable, then do this:

    # Sample data, multiline
    $line = "<!-- Begin -->Line one\nLine two\nthree\nfour<!-- End -->";
    # Process it: /s lets "." match newlines. Only the markers are removed,
    # so any text outside them stays in $line.
    $line =~ s/<!-- Begin -->(.*)<!-- End -->/$1/s;
    print $line;

I tried to do the same while reading the file line by line... The code was so ugly that I simply recommend reading in the whole file at once and applying the multiline regexp above.

HTH
Re: file reading issues by mojobozo (Monk) on Aug 03, 2005 at 19:17 UTC
Follow-up question: How can I strip off the leading blank spaces on each line? Keep in mind that I'm using jdporter's snippet of code for grabbing between the comments. I like to indent my HTML for readability (mine) but don't need all those spaces in the form.
Re^2: file reading issues by kwaping (Priest) on Aug 03, 2005 at 19:39 UTC
Using jdporter's code:

    while (<FILENAME>) {
        last if /<!-- Begin -->/;
    }
    while (<FILENAME>) {
        last if /<!-- End -->/;
        s/^\s*//;    # <- new line here
        print;
    }
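One detail worth knowing about the substitution above: `\s` also matches the newline, so on a line that contains only whitespace, `s/^\s*//` removes the line ending as well and the blank line disappears from the output. If blank lines should survive, a character class limited to spaces and tabs is one option. A minimal sketch, with `strip_indent` as a hypothetical helper and invented sample lines:

```perl
use strict;
use warnings;

# Hypothetical helper: remove leading spaces and tabs but keep the newline,
# so whitespace-only lines become empty lines instead of vanishing.
sub strip_indent {
    my ($line) = @_;
    $line =~ s/^[ \t]+//;
    return $line;
}

# Made-up indented HTML, including one whitespace-only line.
my @lines = ("    <p>\n", "\t  text\n", "   \n", "    </p>\n");
print strip_indent($_) for @lines;
```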
Re: file reading issues by wfsp (Abbot) on Aug 04, 2005 at 10:21 UTC
"I have comments in the file and want to grab the stuff between them."

I would consider using HTML::TokeParser. I use the following.

    #!/bin/perl5
    use strict;
    use warnings;
    use HTML::TokeParser;

    my $file = 'index.html';
    my $tp = HTML::TokeParser->new($file)
        or die "Couldn't parse $file: $!";

    my ($start, $html);
    while (my $tag = $tp->get_token) {
        if ( $tag->[0] eq 'C' and $tag->[1] eq '<!-- article start -->' ) {
            $start++;
            next;
        }
        next unless $start;
        if ( $tag->[0] eq 'C' and $tag->[1] eq '<!-- article end -->' ) {
            last;
        }
        $html .= $tag->[4] if $tag->[0] eq 'S';
        $html .= $tag->[1] if $tag->[0] eq 'T' or $tag->[0] eq 'C';
        $html .= $tag->[2] if $tag->[0] eq 'E';
    }
    print "$html\n";

    # Token layouts:
    # ["S", $tag, $attr, $attrseq, $text]
    # ["E", $tag, $text]
    # ["T", $text, $is_data]
    # ["C", $text]
    # ["D", $text]
    # ["PI", $token0, $text]