http://qs1969.pair.com?node_id=9403


in reply to Extract and modify IMG SRC tags in an HTML document.

Sorry, the site ate my first submission, and I'm still new here...
Here's how I would do it:

1.  Read in the whole HTML file into a variable:

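open(FILE, '<', 'filename') or die "Can't open filename: $!";
{ local $/; $file = <FILE>; }   # with $/ undefined, <FILE> returns the whole file
close FILE;

(Undefining $/ slurps the file however big it is, so there's no size limit to guess. A fixed-size read with a 100000-byte limit would silently drop everything past that point.)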
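To see why a byte cap matters, here's a small self-contained sketch comparing a fixed-size read() with a slurp. The temp file, its 150,000-byte size, and the slurp() helper are made up for illustration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a hypothetical 150,000-byte file to a temp location.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} 'x' x 150_000;
close($tmp_fh);

# Fixed-size read: stops after 100,000 bytes no matter how big the file is.
open(my $in, '<', $path) or die "Can't open $path: $!";
my $truncated;
read($in, $truncated, 100_000);
close($in);

# Slurp: with $/ undefined, a single <$fh> returns the whole file.
sub slurp {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Can't open $file: $!";
    local $/;              # no record separator: read everything
    my $content = <$fh>;
    close($fh);
    return $content;
}
my $whole = slurp($path);

printf "read(): %d bytes, slurp: %d bytes\n", length($truncated), length($whole);
```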

2.  Split $file on "<IMG" (case-insensitively, since tags are often written <img):

@lines = split(/<IMG/i, $file);
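A quick self-contained demo of what split() hands back, using a made-up HTML fragment. It shows that a case-sensitive pattern misses <img ...>, and that the "<IMG" text itself is removed from every piece:

```perl
use strict;
use warnings;

# A made-up fragment with one upper-case and one lower-case tag.
my $html = '<P>Logo: <IMG SRC="a.gif"> and <img src="b.gif" alt="b"> done</P>';

my @case_sensitive   = split(/<IMG/,  $html);   # misses the <img ...> tag
my @case_insensitive = split(/<IMG/i, $html);   # catches both tags

print scalar(@case_sensitive), " vs ", scalar(@case_insensitive), " pieces\n";

# None of the pieces contains "<IMG" any more, so it has to be
# put back when the file is reassembled.
print "[$_]\n" for @case_insensitive;
```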

3.  Shift the first element off @lines. It is the text before the first <IMG tag, so it needs no changes; it simply becomes the start of the new HTML file:

$newfile = shift @lines;
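Two edge cases are worth checking: if the file starts with "<IMG", the first element is an empty string (which is harmless to keep), and if there are no images at all, the first element is the whole file. A small sketch; head_and_rest is a hypothetical helper name used only for this demo:

```perl
use strict;
use warnings;

# Split on <IMG and shift off the leading piece, as in steps 2 and 3.
sub head_and_rest {
    my ($html) = @_;
    my @pieces = split(/<IMG/i, $html);
    my $head   = shift @pieces;
    $head = '' unless defined $head;   # split on '' returns an empty list
    return ($head, @pieces);
}

for my $html ('Hi <IMG SRC="a.gif">', '<IMG SRC="a.gif"> Hi') {
    my ($head, @tags) = head_and_rest($html);
    print "head=[$head] tags=", scalar(@tags), "\n";
}
```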

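4.  For each piece in @lines:
    Split the piece at the first ">" (everything before it is the rest of the IMG tag)
    Replace the SRC= value with the new one, looked up in %newurls, a hash you've built beforehand that maps each old image URL to its new URL (URLs with no entry are left unchanged)
    Put back the "<IMG" that split() removed, and append the piece to $newfile

foreach $line (@lines) {
  $pos = index($line, '>');               # end of this IMG tag
  $tag = substr($line, 0, $pos + 1);      # tag attributes up to and including '>'
  $restofline = substr($line, $pos + 1);  # ordinary text after the tag
  # swap in the new URL; leave SRC values with no entry in %newurls alone
  $tag =~ s{(SRC=")([^"]*)"}{$1 . (exists $newurls{$2} ? $newurls{$2} : $2) . '"'}gie;
  $newfile .= '<IMG' . $tag . $restofline;  # no extra "\n", so the layout is kept
}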
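The per-piece work of step 4 can be pulled into a testable subroutine. This is a sketch: rewrite_piece and the sample URL map are hypothetical, standing in for the loop body and %newurls:

```perl
use strict;
use warnings;

# Hypothetical old-URL => new-URL table, standing in for %newurls.
my %newurls = (
    'old/logo.gif' => 'new/logo.png',
);

# Rewrite one piece produced by split(/<IMG/i, ...): the IMG tag's
# attributes up to the first '>', followed by ordinary text.
sub rewrite_piece {
    my ($piece, $map) = @_;
    my $pos  = index($piece, '>');
    my $tag  = substr($piece, 0, $pos + 1);   # '' if there is no '>'
    my $rest = substr($piece, $pos + 1);
    # Only replace SRC values we have a mapping for; keep the rest as-is.
    $tag =~ s{(SRC=")([^"]*)"}{$1 . (exists $map->{$2} ? $map->{$2} : $2) . '"'}gie;
    return '<IMG' . $tag . $rest;             # restore what split() removed
}

print rewrite_piece(' SRC="old/logo.gif" ALT="logo"> welcome', \%newurls), "\n";
print rewrite_piece(' src="other.gif"> bye', \%newurls), "\n";
```

Using exists rather than a plain lookup means an image with no entry in the map keeps its old URL instead of ending up with SRC="".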

5.  Do whatever you like with $newfile, for example print it:

print $newfile;
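For reuse, steps 2 to 4 can be wrapped in a single subroutine that takes the HTML text and a reference to the old-to-new URL map. A sketch only: replace_img_srcs and the sample map are hypothetical. Note that it writes every tag back as upper-case <IMG, which browsers treat the same as <img:

```perl
use strict;
use warnings;

# Steps 2-4 in one subroutine (hypothetical name, illustrative only).
sub replace_img_srcs {
    my ($html, $map) = @_;
    my @pieces  = split(/<IMG/i, $html);
    my $newfile = @pieces ? shift @pieces : '';   # text before the first tag
    foreach my $piece (@pieces) {
        my $pos  = index($piece, '>');
        my $tag  = substr($piece, 0, $pos + 1);
        my $rest = substr($piece, $pos + 1);
        $tag =~ s{(SRC=")([^"]*)"}{$1 . (exists $map->{$2} ? $map->{$2} : $2) . '"'}gie;
        $newfile .= '<IMG' . $tag . $rest;        # put back what split() removed
    }
    return $newfile;
}

my %newurls = ('a.gif' => 'images/a.png');
my $html = qq{<P><IMG SRC="a.gif"> text <img src="b.gif"></P>\n};
print replace_img_srcs($html, \%newurls);
```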

Complete code:

open(FILE, '<', 'filename') or die "Can't open filename: $!";
{ local $/; $file = <FILE>; }
close FILE;

@lines = split(/<IMG/i, $file);
$newfile = shift @lines;

foreach $line (@lines) {
  $pos = index($line,'>');
  $tag = substr($line,0,$pos+1);
  $restofline = substr($line,$pos+1);
  $tag =~ s/SRC\=\"(.*?)\"/SRC\=\"$newurls{$1