Tricky has asked for the wisdom of the Perl Monks concerning the following question:
Having put together the regexps to whack HTML tags/style attributes in my test page, I'm looking into implementing a config file, rather than hard-wiring the rules into my code.
I've never used configuration files before, and have only recently come to understand their precise meaning. Sad huh ;)?! Here's a specification of my code (I'll include the script and the HTML below, for your perusal):
I know that the HTML parser modules are better; I'm just exploring this approach for my MSc. Any ideas on how I could use a config file here?
The Perl code...
#!/usr/bin/perl
# subsread2.plx
package HTMLMods;

=head1 DESCRIPTION

Alternative to subread.plx - no control flows, just a 'master' sub which
calls each sub to perform the HTML tag/attribute stripping/alteration, then
returning the results.

This application groups ALL the regexps into a single unit:

1. The HTML file is opened and inserted into an array (try this with a scalar too!)
2. A master subroutine is called, which calls other subs to perform HTML reformatting tasks
3. Each HTML reformatting sub completes its respective operations on the HTML file
4. Reformatted array is printed in DOS window.
5. OR write changes back to HTML source file.

=head2 ALTERNATIVE FILE OPENING CODE

  my $path = "E:/Documents and Settings/Richard Lamb/My Documents/HTML";
  open (INFILE, "$path/test1InLineCSS.html") or die ("$!: Can't open this file");

=head3 BACKREFS TO REMEDY ENTITY VALUE CHANGE PROBLEM?

=cut

use warnings;
use diagnostics;
use strict;

# Declare and initialise variables.
my @htmlFile;

# Open HTML test file and read into array.
# (No comma before "or die": a comma there is a syntax error.)
open (INFILE, "E:/Documents and Settings/Richard Lamb/My Documents/HTML/test1InLineCSS.html")
    or die ("$!: Can't open this file.\n");
@htmlFile = <INFILE>;
close (INFILE);

sub masterCall {
    scrapUnderlineTags();
    scrapBoldTags();
    scrapItalicsTags();
    scrapEmphasiseTags();
    changeFontStyle();
    changeFontSize();
    changeFontColour();
    changeBackColour();
    addTextIndent();
    addWordSpacing();
    addLetterSpacing();
    scrapImageTag();
}

masterCall();

# Subroutine definitions

# Removes underline tags in array
sub scrapUnderlineTags {
    # iterates through each element (i.e. HTML line) in array
    foreach my $line (@htmlFile) {
        $line =~ s/<\/u>//ig; # case insensitivity and global search for pattern.
        $line =~ s/<u>//ig;
    }
}

# Removes bold tags in array
sub scrapBoldTags {
    foreach my $line (@htmlFile) {
        $line =~ s/<\/?b>//ig;
        $line =~ s/<\/?big>//ig;
        $line =~ s/<\/?strong>//ig;
        $line =~ s/font-weight:\s?bold;?//ig;
    }
}

# Removes italics tags in array
sub scrapItalicsTags {
    foreach my $line (@htmlFile) {
        $line =~ s/<\/?i>//ig;
    }
}

# Remove emphasise tags in array
sub scrapEmphasiseTags {
    foreach my $line (@htmlFile) {
        $line =~ s/<\/?em>//ig;
    }
}

# Change font styles within in-line styles
sub changeFontStyle {
    foreach my $line (@htmlFile) {
        $line =~ s/font-family:\s?Times;/font-family: Arial;/ig;
    }
}

# Change font size within in-line styles
sub changeFontSize {
    foreach my $line (@htmlFile) {
        $line =~ s/font-size:\s?[0-9]{2}pt;?/font-size: 14pt/ig;
    }
}

# Change font colour within in-line styles
sub changeFontColour {
    foreach my $line (@htmlFile) {
        # Negative lookbehind skips "background-color"; the earlier
        # [^background-] was a character class that ate one character.
        $line =~ s/(?<!background-)color:\s?#(?:[0-9a-f]{6}|[0-9a-f]{3});?/color: #000000;/ig;
    }
}

# Changes background colour attributes in array
sub changeBackColour {
    foreach my $line (@htmlFile) {
        $line =~ s/background-color:\s?#(?:[0-9a-f]{6}|[0-9a-f]{3});?/background-color: #FFFFFF/ig;
    }
}

# Inserts text indent entities within in-line styles
sub addTextIndent {
    foreach my $line (@htmlFile) {
        $line =~ s/(<h[1-6]\sstyle=.*)">/$1; text-indent: 10px">/ig;
        $line =~ s/(<li\sstyle=.*)">/$1; text-indent: 10px">/ig;
        $line =~ s/(<p\sstyle=.*)">/$1; text-indent: 10px">/ig;
    }
}

# Inserts word spacing entities within in-line styles
sub addWordSpacing {
    foreach my $line (@htmlFile) {
        $line =~ s/(<h[1-6]\sstyle=.*)">/$1; word-spacing: 30px">/ig;
        # Lookahead: only paragraphs whose text starts straight after the
        # tag, without swallowing the first character of that text.
        $line =~ s/(<p\sstyle=.*)">(?=[^<])/$1; word-spacing: 10px">/ig;
        $line =~ s/(<li\sstyle=.*)">/$1; word-spacing: 10px">/ig;
    }
}

# Inserts letter spacing entities within in-line styles
sub addLetterSpacing {
    foreach my $line (@htmlFile) {
        $line =~ s/(<h[1-6]\s+style=.*)">/$1; letter-spacing: 3px">/ig;
        $line =~ s/(<li\sstyle=.*)">/$1; letter-spacing: 2px">/ig;
        $line =~ s/(<p\sstyle=.*)">/$1; letter-spacing: 2px">/ig;
    }
}

# Removes image tag in array
sub scrapImageTag {
    foreach my $line (@htmlFile) {
        # [^>]* stops at the end of the tag; (.*) would eat the rest of the line.
        $line =~ s/<img\s+[^>]*>//ig;
    }
}

# Print array to DOS window
sub printHTML {
    for my $i (0..@htmlFile-1) {
        print $htmlFile[$i];
    }
}

# Replacing original file with reformatted file!
# ($! holds the system error; $1 would be a regex capture.)
open (OUTFILE, ">E:/Documents and Settings/Richard Lamb/My Documents/HTML/test1InLineCSS.html")
    or die("$!: Can't rewrite the HTML file.\n");
print (OUTFILE @htmlFile);
close (OUTFILE);

# printHTML(); # sub called to print array in DOS window
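One way the hard-wired substitutions in the script above might move into a config file is sketched here. This is only a rough idea under stated assumptions: the "regex => replacement" line format, the names parse_rules and apply_rules, and the sample rules are all made up for illustration, and the config text is held in a here-doc so the sketch runs on its own (in practice it would be read from a file). Replacements are treated as literal text, so rules that rely on backreferences such as $1 would need extra handling.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical config format (an assumption, not an established standard):
#   one rule per line, "regex => replacement"; an empty replacement deletes
#   the match. Blank lines and lines starting with '#' are ignored.
my $config_text = <<'END_CONFIG';
# strip presentational tags
</?u>      =>
</?b>      =>
</?i>      =>

# rewrite in-line style values
font-family:\s?Times;?     => font-family: Arial;
font-size:\s?[0-9]{2}pt;?  => font-size: 14pt;
END_CONFIG

# Parse the config text into a list of [compiled_regex, replacement] pairs.
sub parse_rules {
    my ($text) = @_;
    my @rules;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        my ($pattern, $replacement) = $line =~ /^\s*(.*?)\s*=>\s*(.*?)\s*$/
            or die "Bad config line: $line\n";
        push @rules, [ qr/$pattern/i, $replacement ];
    }
    return @rules;
}

# Apply every rule, in config order, to a copy of each line.
sub apply_rules {
    my ($rules, @lines) = @_;
    for my $line (@lines) {
        for my $rule (@$rules) {
            my ($regex, $replacement) = @$rule;
            $line =~ s/$regex/$replacement/g;    # replacement is literal text
        }
    }
    return @lines;
}

my @rules = parse_rules($config_text);
my @html  = (qq{<p style="font-family: Times; font-size: 10pt"><u>Hi</u></p>\n});
print apply_rules(\@rules, @html);
```

With something like this, adding or removing a rule means editing the config text rather than writing another sub, and the order of the lines in the config decides the order in which the substitutions run.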
And the HTML source code...
<!DOCTYPE html PUBLIC "-//W3C//DTD html 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
  <title>My First Page</title>
  <meta http-equiv="content-type" content="text/html; charset=iso-8859-1"/>
</head>
<body style="background-color: #8DCB41">
  <h1 style="color: #648DC7; text-align: center; font-family: Times; font-weight: bold">
    <i>Hello folks...This is my <big>first</big> page!</i>
  </h1>
  <h4 style="color: #648DC7; font-family: Times">
    <form>
      <u>Username:</u> <input type="text" name="user"> <br>
      <u>Password:</u> <input type="password" name="password">
    </form>
  </h4>
  <p style="font-family: Times; font-size: 10pt">Note that when you type characters in a password field, the browser displays asterisks or bullets instead of the characters.</p>
  <img src="images/dicky.jpg" height="350" width="350" title="Dickybloke!" style="text-align: center">
  <hr>
  <h4 style="color: #648DC7; font-family: Times">My kinda places...</h4>
  <ul type="square">
    <li style="font-family: Times; font-size: 10pt">
      <a href="http://www.bbc.co.uk/manchester/"><i>Manchester</i></a></li>
    <li style="font-family: Times; font-size: 10pt">
      <a href="http://www.bbc.co.uk/leeds/"><i>Leeds</i></a></li>
    <li style="font-family: Times; font-size: 10pt">
      <a href="http://www.bbc.co.uk/london/"><i>London</i></a></li>
  </ul><hr>
  <h2 style="color: #2D73B9; text-align: center; font-family: Times">Ferocious Felines!</h2>
  <img src="images/pissedkitty.jpg" height="300" width="300" alt="A picture of a "very" upset kitten!">
  <img src="images/pissedkitty.jpg" height="300" width="300" alt="A picture of a "very" upset kitten!">
  <img src="images/pissedkitty.jpg" height="300" width="300" alt="A picture of a "very" upset kitten!">
  <hr>
  <h4 style="color: #497FBF; font-family: Times">Places to visit and go back to...</h4>
  <ul type="square">
    <li style="font-family: Times; font-size: 10pt">
      <a href="http://english.firenze.net/"><em>Florence</em></a></li>
    <li style="font-family: Times; font-size: 10pt">
      <a href="http://www.timeout.com/prague/"><em>Prague</em></a></li>
    <li style="font-family: Times; font-size: 10pt">
      <a href="http://www.canada.com/vancouver/"><em>Vancouver</em></a></li>
    <li style="font-family: Times; font-size: 10pt">
      <a href="http://metromix.chicagotribune.com/"><em>Chicago</em></a></li>
    <li style="font-family: Times; font-size: 10pt">
      <a href="http://www.sanfrancisco.com/"><em>San Fransisco</em></a></li>
    <li style="font-family: Times; font-size: 10pt">
      <a href="http://www.ny.com/"><em>New York</em></a></li>
  </ul>
  <hr>
  <h2 style="color: #497FBF; text-align: center; font-family: Times"><u><b>A Brief History to the Future summary:</b></u></h2>
  <p style="color: #648DC7; font-family: Times; font-size: 10pt">The <b><u>Internet</u></b> is the most <strong>remarkable</strong> achievement of humankind since the pyramids.
    The millennium from now, historians will look back at it and marvelled at the people equipped with such conduct tools succeeded in creating such a leviathan.
    Yet even as the Net pervades our lives, we begin to take it for granted. Many have lost the capacity for wonder.
    Most of us have no idea where the Interet came from, how it works, or who created it and why.
    And even fewer have any idea what it means for society and future.</p>
  <p style="color: #648DC7; font-family: Times; font-size: 10pt"><i>John Naughton</i> has written a warm and passionate book whose heroes and the visionaries laid the foundations of postmodern world.
    A Brief History of the Future celebrates the engineers and scientists who implemented their dreams in hardware and software
    and explains the values and ideas that drove them. Although its subject seems technical, the book in fact is a highly personal account.
    The author writes about the Net and way Nick Hornby writes about soccer-as part of life, and as a key influence on his own
    voyage from solitary child to establish academic and writer.</p>
  <p style="color: #497FBF; font-family: Times; font-size: 10pt"><i><u>A Brief History of the Future</u></i> is an intimate celebration of vision and al truism, ingenuity and determination, and above all of the power of ideas transform the world.</p>
  <p style="color: #497FBF; font-family: Times; font-size: 10pt">John Naughton is an academic and a journalit. He teaches at the Open University and has written an award-winning weekly column
    for the Observer for more than ten years. He lives in Cambridge, England, and is a fellow of Wolfson College at the University of Cambridge.</p>
  <br>
  <h4 style="color: #497FBF; font-family: Times">Link</h4>
  <ul type="square">
    <li style="font-size: 10pt; font-family: Times">
      <a href="http://www.briefhistory.com/pages/bh-index.htm">A Brief History of the Future</a></li>
  </ul>
  <hr>
  <h2 style="color: #497FBF; font-family: Times; text-align: center">Rockerfellers - NOT!</h2>
  <img src="images/oldgrooves.jpg" height="450" width="450" alt="Old-time groovers!" style="text-align: center; border: 3px">
  <p style="font-family: Times; font-size: 10pt">
    <a href="http://www.bbc.co.uk/">This text<a/> will take you to the BBC!!
  </p>
  <p style="font-family: Times; font-size: 10pt">
    <a href="http://www.programmersheaven.com/">This text</a> is a link to a developer's page.This text is a link to a developer's page.
  </p>
  <p style="font-family: Times; font-size: 10pt">
    You can also use an image as a link:
    <a href="http://www.google.com/">
      <img src="images/thankyou.gif" height="75" width="75" alt="Google search">
    </a>
  </p>
  <p style="font-family: Times; font-size: 10pt">
    <a href="mailto:dikymintos@hotmail.com subject="Hello%20to%20me!">Send mail to Mintosville</a>
  </p>
  <h4 style="color: #497FBF; font-family: Times">Table headers:</h4>
  <table style="margin-left:50px; border: 4px; border-color: #000000">
    <tr>
      <th>Name</th>
      <th>Telephone</th>
      <th>Address</th>
    </tr>
    <tr>
      <td>Dicky Mintos</td>
      <td>0161 2363736</td>
      <td>Flat 23, Lockes Yard, Manchester</td>
    </tr>
  </table>
  <br><br>
  <form action="mailto:dickymintos@hotmail.com" method="post" enctype="text/plain">
    <h3 style="color: #6F559D; font-family: Times">This form sends an e-mail to Trixter...</h3>
    Name:<br>
    <input type="text" name="name" value="yourname" size="20"> <br>
    Mail:<br>
    <input type="text" name="mail" value="yourmail" size="20"> <br>
    Comment:<br>
    <input type="text" name="comment" value="yourcomment" size="40"> <br><br>
    <input type="submit" value="Send">
    <input type="reset" vale="Reset">
  </form>
</body>
</html>