Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I open an HTML file and find 2 strings in it:
"font-size: 10pt" and "font-family: 'Arial'".
I want to find every place in the file that matches the above 2 strings and replace them with
"font-size: 18pt" and "font-family: 'Helvetica'".
A short sample of the file is given below:
    body { text-decoration: none; text-indent: 0in; text-align: left; lang: en-US; font-weight: normal; font-variant: normal; color: #000000; font-size: 10pt; font-style: normal; widows: 2; font-family: 'Times New Roman'; }
    <p dir="ltr" style="text-align: justify"><span style="font-weight: bold; lang: en-US; font-size: 11pt; font-family: 'Arial'">Consultancy</span></p>

The code which I wrote for this is:

    open(html,'test.htm');
    while(<html>) {
        s/[0-9]+pt/18pt/g;
    }
But surprisingly, this doesn't work.
Can you point out what's wrong with it?
Thank you!

Replies are listed 'Best First'.
Re: Perl Replace Help
by davido (Cardinal) on Dec 04, 2003 at 08:34 UTC
    Well, for one thing, I don't see where you're writing the changes out to a file. You're reading the file in (implicitly) to $_, and then you're performing the substitution, and then you're moving on to the next line without ever writing the current line out to a new file.
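
The missing write step might look roughly like the sketch below. It rests on assumptions: the helper name `rewrite_sizes` and the output file name are hypothetical, and the pattern is the one from the question, so it still changes every point size it meets.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_path to $out_path, applying the
# substitution from the question to each line along the way.
sub rewrite_sizes {
    my ($in_path, $out_path) = @_;

    open my $in,  '<', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";

    while (my $line = <$in>) {
        $line =~ s/[0-9]+pt/18pt/g;   # same pattern as in the question
        print {$out} $line;           # the step the original loop lacks
    }

    close $in;
    close $out or die "Cannot close $out_path: $!";
    return;
}

# Example call (both file names are assumptions):
# rewrite_sizes('test.htm', 'test_new.htm');
```

Writing to a second file rather than over the original keeps the source intact if the pattern turns out to match more than intended.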

    The other issues that you're going to encounter include (but are not limited to):

    • Newline characters are valid within HTML tags, but your RE doesn't allow for them.
    • Your RE will replace the text matched by the pattern everywhere, which is to say, even within text that isn't part of an HTML tag. In other words, if 10pt exists as part of the actual text, it will get replaced too, thus, changing the meaning of the text, rather than how it is rendered.
    • Regular expressions are generally the wrong tool for parsing HTML. There are modules for that on CPAN, such as HTML::Parser and HTML::TreeBuilder.
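
The first two points can be partly addressed without a full parser. The following is a rough sketch under stated assumptions, not a robust solution: the helper name `restyle` is hypothetical, the whole document is held in one string so that `\s` can match a newline inside a declaration, and each pattern is anchored on the property name so that a bare "10pt" in running text is left alone.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the whole document as one string.
sub restyle {
    my ($html) = @_;

    # \s* also matches newlines, so a declaration split across
    # lines is still found once the file is held in one string.
    $html =~ s/(font-size\s*:\s*)10pt\b/${1}18pt/gi;

    # (['"]?) captures an optional quote; \2 requires the same
    # quote (or none) after the font name.
    $html =~ s/(font-family\s*:\s*)(['"]?)Arial\2/$1${2}Helvetica$2/gi;

    return $html;
}

# Slurp mode: with $/ undefined, <STDIN> returns all input at once.
# my $doc = do { local $/; <STDIN> };
# print restyle($doc);
```

This would still rewrite a literal "font-size: 10pt" that appears in visible text, which is the case where a real HTML parser is the safer choice.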

    Some of these issues were discussed in the Chatterbox when you asked there.


    Dave

      • That's wonderful, but the original poster is not dealing with HTML tags.
      • It will change everything, granted, but it's still not HTML.
        • He's not parsing HTML, merely doing a text replace.
        • It's still not HTML.
        Anonymous Monk (who, as I know from some CB conversations, happens to be the same person as pilot_vijay, who also asked about a dozen questions a few days ago about translating PDF files and MS-format files to HTML) stated:
        I open a html file and find 2 strings in it...

        And went on to provide a code snippet that started like this...

        open html, 'test.htm'; while (<html>) { .....

        Perhaps I was reading too much into those clues, but it looked to me like he's trying to deal with HTML.

        I might as well add another nit to pick with the OP's code: Not checking the success or failure of the open.
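
A small sketch of why the check matters: when an unchecked open fails, the read loop simply runs zero times and the script reports nothing. The helper name `count_lines` and the file name below are made up for illustration, and the file is assumed not to exist.

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines in a file, reporting why the
# open failed instead of silently reading nothing.
sub count_lines {
    my ($path) = @_;
    open my $fh, '<', $path
        or die "Cannot open '$path': $!\n";
    my $count = 0;
    $count++ while <$fh>;
    close $fh;
    return $count;
}

# eval traps the die so the caller can decide how to handle the error.
my $lines = eval { count_lines('no_such_file_for_demo.htm') };
print "open failed: $@" unless defined $lines;
```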


        Dave

Re: Perl Replace Help
by allolex (Curate) on Dec 04, 2003 at 09:29 UTC

    It might be a good idea to go ahead and put all those changes in a single external style sheet and then rely on CSS inheritance to have the correct font name and size cascade down. It's also a good idea to have multiple fonts named, in case there is no equivalent on the displaying system.

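        #!/usr/bin/perl
        # font-family: arial, helvetica, sans-serif;
        use strict;
        use warnings;

        # Reads the files named on the command line (or STDIN) and prints the result.
        # The replacements carry no trailing ';' because the declaration's own ';'
        # stays in place, which avoids writing ';;'.
        while (<>) {
            s/font-family:\s+'*Arial'*/font-family: Helvetica, sans-serif/g;
            s/font-size:\s+'*10pt'*/font-size: 18pt/g;
            print;
        }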

    --
    Allolex

Re: Perl Replace Help
by l3nz (Friar) on Dec 04, 2003 at 09:01 UTC
    This is quick and dirty, but it should do. At least it does in the test case. Whether the file is hand-edited or machine-generated, I think it's safe to assume that there is no line break within a given formatting statement and that each and every reference to 'Arial' will be in the context of font formatting (you are not using this on a typographer's site, right?)

    This is the code:

        use strict;

        # Read the sample below line by line, substitute, and print each line.
        while (<DATA>) {
            s/font-size:\s*10pt/font-size: 18pt/i;
            s/Times New Roman/Helvetica/i;
            print $_;
        }

        __DATA__
        body { text-decoration: none; text-indent: 0in; text-align: left; lang: en-US; font-weight: normal; font-variant: normal; color: #000000; font-size: 10pt; font-style: normal; widows: 2; font-family: 'Times New Roman'; }
        <p dir="ltr" style="text-align: justify"><span style="font-weight: bold; lang: en-US; font-size: 11pt; font-family: 'Arial'">Consultancy</span></p>

    This prints out:

        body { text-decoration: none; text-indent: 0in; text-align: left; lang: en-US; font-weight: normal; font-variant: normal; color: #000000; font-size: 18pt; font-style: normal; widows: 2; font-family: 'Helvetica'; }
        <p dir="ltr" style="text-align: justify"><span style="font-weight: bold; lang: en-US; font-size: 11pt; font-family: 'Arial'">Consultancy</span></p>