parshtml has asked for the wisdom of the Perl Monks concerning the following question:

Again I need your wisdom to solve a problem. I am trying to substitute a particular word in an HTML file with the word HREF. The problem is that when the file 0071.html is created, it still has the contents of the previous file. What I need is this: the first time, when it finds the file 3800-128.html, it should parse it and write the result into 0071.html. Then, when it encounters the next file, 3800-129.html, it should discard the content that came from 3800-128.html and overwrite it. Instead, it first prints the contents of 3800-128.html and then the contents of 3800-129.html, and so on. Please help me.
#!perl/bin/perl -w
print "Content-type: text/html\n\n";
use LWP::Simple;
use HTML::TokeParser;
$q=128;
while ($q < 132 ) {
    open(FILE, "<D:/MON/3800-$q.html");
    while (<FILE>) {
        $data1 .= $_;
    }
    close (FILE);
    $sid = &sid12($data1);
    open (MYFILE, '>0071.html');
    print MYFILE "$sid<br>";
    close (MYFILE);
    print "$sid<br><br><br>";
    sub sid12{
        my $htmls = $_[0];
        $htmls =~ s/onClick/HREF/isg;
        return $htmls;
    }
    close (FILE);
    unlink("0071.html");
    $q++;
}

Replies are listed 'Best First'.
Re: Unknown File Operation
by rpanman (Scribe) on Jul 10, 2007 at 19:36 UTC
    Looking at the code, it opens D:/MON/3800-128.html, reads it into memory, creates a copy replacing 'onClick' with 'HREF' and writes this to screen and to the file 0071.html. It then unlinks 0071.html which deletes the file. Not sure what the point is of deleting a file you've just created but hey ho...

    Your code is a bit confusing and I think that is probably not helping. Can you describe exactly what you are after? My guess is the following:
  • Copy 3800-128.html over 0071.html replacing html onClick with HREF.
  • Replace 3800-128.html with 3800-129.html (no change of html)
  • Replace 3800-129.html with 3800-130.html (no change of html)
  • Replace 3800-130.html with 3800-131.html (no change of html)

    The following code should do the same as your original, but without the memory load of copying the entire input file into memory:

    foreach my $q (128..131){
        # open the files
        # the input file
        open( FILE, "<D:/MON/3800-$q.html" );
        # the output file
        open( MYFILE, '>0071.html' );

        # do the replacement and output the results to MYFILE
        while (my $line = <FILE>){
            $line =~ s/onClick/HREF/isg;
            print MYFILE $line;
            print $line;
        }
        print MYFILE '<br/>';
        print '<br/><br/><br/>';

        # close the files
        close(MYFILE);
        close(FILE);
        unlink("0071.html");
    }
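
For what it's worth, here is a rough sketch of the same line-by-line idea with lexical filehandles, three-argument open, and error checks. The helper name convert_file, the temporary directory, and the sample file contents are all made up for illustration; they stand in for D:/MON and the real HTML files. It also leaves out the unlink, on the assumption that the output file is meant to survive.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (not from the thread): copy $in to $out line by line,
# replacing onClick with HREF. Opening with '>' truncates $out, so whatever
# an earlier call wrote there is discarded.
sub convert_file {
    my ($in, $out) = @_;
    open(my $src, '<', $in)  or die "Cannot read $in: $!";
    open(my $dst, '>', $out) or die "Cannot write $out: $!";
    while (my $line = <$src>) {
        $line =~ s/onClick/HREF/ig;
        print {$dst} $line;
    }
    close($src);
    close($dst) or die "Cannot close $out: $!";
    return;
}

# Demo: two throwaway input files standing in for 3800-128.html and 3800-129.html.
my $dir = tempdir(CLEANUP => 1);
for my $q (128, 129) {
    open(my $fh, '>', "$dir/3800-$q.html") or die "Cannot create sample file: $!";
    print {$fh} "<a onClick=\"p$q\">$q</a>\n";
    close($fh);
}

# Convert both inputs into the same output file, one after the other.
convert_file("$dir/3800-$_.html", "$dir/0071.html") for 128 .. 129;

# Read the output back to see what survived.
open(my $check, '<', "$dir/0071.html") or die "Cannot read output: $!";
my $final = do { local $/; <$check> };
close($check);
print $final;
```

Because each line is processed and written immediately, nothing accumulates between files, and the output file only ever holds the most recently converted input.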
Re: Unknown File Operation
by roboticus (Chancellor) on Jul 11, 2007 at 10:16 UTC

    parshtml:

    Off the top of my head, it appears that you would get what you want if you reduced the scope of your $data1 variable. The clause:

    while (<FILE>) { $data1 .= $_; }

    appends to $data1 on every pass through the outer loop, so the contents of each earlier file are still in it when the next file is read. So you could either:

  • Add the statement $data1 = ''; just before that while loop, or
  • Add the statement my $data1; just before that while loop, or
  • Slurp in the file in a single statement, replacing $data1 instead of appending to it, like:
    $data1 = do { local $/; <FILE> };
    (The local $/ is needed so that <FILE> returns the whole file rather than just its first line.)
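
A minimal sketch of the slurp approach, assuming a small helper named slurp (my name, not anything from the original script) and two temporary sample files in place of the real ones under D:/MON. The point is only that a fresh lexical variable is filled on each call, so nothing carries over from one file to the next.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: read a whole file into a fresh scalar.
sub slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    local $/;            # no record separator: one read returns the whole file
    my $data = <$fh>;    # lexical, so it starts out empty on every call
    close($fh);
    return $data;
}

# Demo: sample files standing in for 3800-128.html and 3800-129.html.
my $dir = tempdir(CLEANUP => 1);
for my $q (128, 129) {
    open(my $out, '>', "$dir/3800-$q.html") or die "Cannot create sample file: $!";
    print {$out} "<a onClick=\"page$q\">link $q</a>\n";
    close($out);
}

# Process each file on its own; collect the results to inspect them.
my @results;
for my $q (128 .. 129) {
    my $html = slurp("$dir/3800-$q.html");
    $html =~ s/onClick/HREF/ig;
    push @results, $html;
}
print @results;
```

Each element of @results holds the converted text of exactly one input file, which is the behaviour the original loop was missing.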

    ...roboticus