http://qs1969.pair.com?node_id=287071

Tricky has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Monks,

I have some simple code that reads an HTML file into an array, removes the image and anchor tags, and writes the changes back to the source file on my hard drive. So far, so good.

1. Is there a better way to initialise the variables containing the patterns? Should each one be a string literal of the tag I want to remove or change? The code is below, for your perusal.
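One possible alternative to plain string literals is to compile each pattern with qr//. The following is only a minimal sketch, not a definitive answer: the variable names and the sample line are made up for illustration, and the image pattern uses [^>]* instead of the (.*) in my code, because a greedy .* would run on to the last > on the line.

```perl
use strict;
use warnings;

# Compile each pattern once; the /i flag is stored with the compiled regex.
# These names are hypothetical and do not appear in the original script.
my $img_tag     = qr/<IMG\s+[^>]*>/i;
my $open_a_tag  = qr/<A\s+HREF\s*=[^>]+>/i;
my $close_a_tag = qr{</A>}i;

# Made-up sample line standing in for one element of the array.
my $line = '<p><img src="x.gif"> <a href="y.html">link</a></p>';

# Interpolating a qr// object keeps its own flags, so only /g is needed here.
$line =~ s/$img_tag//g;
$line =~ s/$open_a_tag//g;
$line =~ s/$close_a_tag//g;

print "$line\n";
```

With qr// the case-insensitivity lives with the pattern itself, so the substitutions do not each have to repeat the /i flag.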

Once I've read the file in, I'd like to check for the presence of the tags and, if they are present, call the subs which remove the tags/attributes. One of the brothers, a little while ago, pointed out that if I tested the pattern variables as they are at the moment, the test would always return true, because the values of the pattern variables are non-empty strings. Am I doing this right?

2. How may I go about testing for the presence of tags/attributes without falling into this pitfall?
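To make the pitfall concrete, here is a small sketch of the difference between testing the pattern variable and matching it against the data. The two sample lines are invented for illustration; $pattern1 and @htmlLines are the names from my code below.

```perl
use strict;
use warnings;

my $pattern1  = '<IMG\s+(.*)>';
# Made-up sample data standing in for the lines read from the file.
my @htmlLines = ("<p>Hello</p>\n", "<IMG SRC=\"a.gif\">\n");

# Pitfall: this only tests the string itself, which is non-empty,
# so it is true whatever the file contains.
my $always_true = $pattern1 ? 1 : 0;

# Matching against the data: grep in scalar context returns the
# number of lines that match the pattern.
my $img_count = grep { /$pattern1/i } @htmlLines;

print "pattern variable is true: $always_true\n";
print "lines containing an image tag: $img_count\n";
```

A test such as "if ($img_count)" would then call the removal sub only when at least one line actually contains the tag.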

I'm also looking into how to alter the font size values of in-line styles via the same approach.
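A minimal sketch of that idea, assuming the sizes are written as whole numbers followed by pt or px. The increment of 2, the variable names and the sample line are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical amount to add to every font size.
my $bump = 2;

# Made-up sample line with an in-line style.
my $line = '<p style="font-size: 10pt">text</p>';

# /e evaluates the replacement as Perl code, so the captured
# number can be changed arithmetically before it is put back.
$line =~ s/(font-size\s*:\s*)(\d+)(pt|px)/$1 . ($2 + $bump) . $3/gie;

print "$line\n";
```

This leaves the property name and the unit untouched and only rewrites the number, though it would miss sizes given in other units or as decimals.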

Surely, there are better solutions...

Trix

#!/usr/bin/perl
# write mods to HTML file.plx
# Program will read in an html file, remove the img tag and rewrite HTML on E-drive.
# 1. No need for file variable yet: open (INFILE, "<".$htmlFile) or die("Can't read source file!\n");
# 2. Alternative: m/<A\s+HREF=[^>]+>(.*?)<\/A>/ - Will not remove closing tag though - why?
# 3. Why is interpreter flipping-out over an 'undefined variable', when
#    original regexp, m/<A\s+HREF=[^>]+>(.*?)<\/A>/, is known to work. What am I missing?

use warnings;
use diagnostics;
use strict;

# Declare and initialise variables.
my $pattern1 = '<IMG\s+(.*)>';
my $pattern2 = '<A\s+HREF\s*=[^>]+>';
my $pattern3 = '</A>';
my @htmlLines;

# Open HTML test file and read into array.
open INFILE, "E:/Documents and Settings/Richard Lamb/My Documents/HTML/dummy1.html" or die "Sod! Can't open this file.\n";
@htmlLines = <INFILE>;

# Call tag-scrapping subs
scrapImageTag();
scrapAnchorTag();

# Removes image tag elements in array
sub scrapImageTag {
    # iterates through each element (i.e. HTML line) in array
    foreach my $line (@htmlLines) {
        # replace <IMG ...> with nothing.
        $line =~ s/$pattern1//ig; # case insensitivity and global search for pattern
    }
}

# Removes anchor tag elements in array
sub scrapAnchorTag {
    # iterates through each element (i.e. HTML line) in array
    foreach my $line (@htmlLines) {
        # replace <A HREF ...> with nothing.
        $line =~ s/$pattern2//ig; # case insensitivity and global search for pattern
        $line =~ s/$pattern3//ig; # case insensitivity and global search for pattern
    }
}

# Replacing original file with reformatted file!
open (OUTFILE, ">E:/Documents and Settings/Richard Lamb/My Documents/HTML/dummy1.html") or die("Can't rewrite the HTML file.\n");
print (OUTFILE @htmlLines);
close (INFILE);
close (OUTFILE);

Cheers,

T

update (broquaint): shifted <code> tags, added formatting and <readmore> tag