visnu has asked for the wisdom of the Perl Monks concerning the following question:

ok, i want to make this:
$_ = <font face="arial, helvetica, verdana">
into this:
$_ = <font face=arial,helvetica,verdana>
i currently have this code:
if (/^<font\b/i) {
    my ($fonts) = /face="([^"]+)"/i;
    if ($fonts) {
        $fonts =~ s/\s+//g;
        s/face="[^"]+"/face=$fonts/;
    }
}
assuming that the html will be just as simple as the example, is there a more succinct way to do this?
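
For reference, a minimal sketch of one way to fold the extract-then-substitute steps above into a single substitution, using the /e modifier. It is not taken from any reply and assumes the same simple input (one double-quoted face attribute on a line starting with a font tag). Unlike the original, the substitution here is case-insensitive and always writes a lowercase "face=". As the replies point out, the unquoted result is not valid HTML because the value contains commas.

```perl
use strict;
use warnings;

$_ = '<font face="arial, helvetica, verdana">';

# Only touch lines that start with a font tag, as in the original code.
if (/^<font\b/i) {
    # /e evaluates the replacement as Perl code: copy the captured
    # value, strip its whitespace, and emit it without quotes.
    s{face="([^"]+)"}{ (my $f = $1) =~ s/\s+//g; "face=$f" }ie;
}

print "$_\n";    # <font face=arial,helvetica,verdana>
```

If the attribute is missing, the substitution simply fails to match and $_ is left unchanged, so the separate "if ($fonts)" check is no longer needed.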

RE: making html smaller
by johannz (Hermit) on Jul 26, 2000 at 01:04 UTC
    Technically, the HTML spec says that attributes for HTML tags should be name-value pairs with the value quoted. Reducing the size of your generated HTML is admirable, but the way you're trying to do it will not gain you any significant savings.
    String                                     Length
    <font face="arial, helvetica, verdana">    40
    <font face=arial, helvetica, verdana>      35
    Savings                                    12.5%
    When you consider that there are still the end tag and whatever was inside the font tag, your savings go down even further.
    In summary, I would spend most of my effort on a clean design and layout of the webpages, and let the tags fall as they may.
      well, for a bigger picture, this is for a post-processing step that goes through and reduces the html as much as possible; the result then gets posted on the live website. this is not used in the maintainable version of the html. also, 12.5% is significant when you think about all the font tags that are needed in tables across a huge website.
Re: making html smaller (HTML::Clean)
by ybiC (Prior) on Jul 26, 2000 at 01:03 UTC
    A quick search of CPAN turned up the HTML::Clean module, which might do what you want.

    Or it could be more work than it's worth, if your own regex already fits the bill.
        cheers,
        ybiC

Re: making html smaller
by Crulx (Monk) on Jul 26, 2000 at 02:12 UTC
    As for the real value of the savings: if you have a lot of font tags, the best way to deal with them is to define the font types you want as classes in a CSS stylesheet. Then you can just write <p class="stdfont"> to set the font for a particular block.

    If you have a lot of font tags, that would clean things up far better than using the deprecated font tag.

    A final note on the 12% savings quoted above: you would only save that much if your entire webpage was made of font tags. There are no savings in the text. So as a whole, you save very little by "cleaning up" the font tags, and you lose your HTML compliance to boot. It is a bad idea and definitely the Wrong Thing to do.
    ---
    Crulx
    crulx@iaxs.net

      from w3.org's html 4.01 recommendation:

      By default, SGML requires that all attribute values be delimited using either double quotation marks (ASCII decimal 34) or single quotation marks (ASCII decimal 39). Single quote marks can be included within the attribute value when the value is delimited by double quote marks, and vice versa. Authors may also use numeric character references to represent double quotes (&#34;) and single quotes (&#39;). For double quotes authors can also use the character entity reference &quot;. In certain cases, authors may specify the value of an attribute without any quotation marks. The attribute value may only contain letters (a-z and A-Z), digits (0-9), hyphens (ASCII decimal 45), periods (ASCII decimal 46), underscores (ASCII decimal 95), and colons (ASCII decimal 58). We recommend using quotation marks even when it is possible to eliminate them.
      so according to that, attribute values don't have to be surrounded in quotes. but also, commas aren't allowed in the attribute value either.
        That is not how I read that....
        By default, SGML requires that all attribute values be delimited using either double quotation marks (ASCII decimal 34) or single quotation marks (ASCII decimal 39).
        Seems to indicate to me pretty clearly that quotes (either single or double) are required.
        In certain cases, authors may specify the value of an attribute without any quotation marks. The attribute value may only contain letters (a-z and A-Z), digits (0-9), hyphens (ASCII decimal 45), periods (ASCII decimal 46), underscores (ASCII decimal 95), and colons (ASCII decimal 58). We recommend using quotation marks even when it is possible to eliminate them.
        Now this says that you can omit quotation marks (though it still recommends that you keep them), but only in certain situations. Note that the comma is not one of the characters allowed in an unquoted value.
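
The character list in the quoted passage translates directly into a character class. The sketch below is only an illustration of that rule; can_unquote is a hypothetical helper name, not part of any module, and it checks nothing beyond the characters listed in the recommendation.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $value uses only the characters the
# HTML 4.01 passage allows in an unquoted attribute value
# (letters, digits, hyphens, periods, underscores, colons).
sub can_unquote {
    my ($value) = @_;
    return $value =~ /\A[A-Za-z0-9._:-]+\z/ ? 1 : 0;
}

for my $value ('arial', 'arial,helvetica,verdana', 'arial, helvetica', 'main-nav') {
    printf "%-26s %s\n", $value,
        can_unquote($value) ? 'may be unquoted' : 'needs quotes';
}
```

A single font name such as arial passes, while the comma-separated list from the original question needs quotes with or without the spaces.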
Re: making html smaller
by Maclir (Curate) on Jul 26, 2000 at 03:34 UTC

    Others have already said this, but, as a professional web site developer (and an aspiring perl hacker), DON'T do things like removing quotes. Many people code sloppy HTML (ok, many HTML authoring tools - particularly the WYSIWYG ones - generate incorrect HTML syntax), and some browsers are tolerant enough to get by.

    But as XML becomes more prevalent, people will need to ensure that their markup complies with the DTD. That also means including end tags, and not leaving user agents to guess where missing markup should be.

    If you are concerned about the size of the page, use CSS to reduce repetitive markup. Now, what about all the other cruft your pages may have? Any animated gifs? Bloated background graphics? Are there lots of embedded tables to overly complicate the layout? These are places to save considerable file size.

Re: making html smaller
by fundflow (Chaplain) on Jul 26, 2000 at 00:49 UTC
    It seems like you want something more general,
    e.g.
    s/, /,/g; s/ +/ /g; (one or more spaces --> one space)
    etc.



    (corrected after comment below)
      s/  / /g;
      Would be better written as
      s/ +/ /g; (or, to treat all whitespace the same: s/\s+/ /g;)
      since eight spaces would be converted to four using the first example, but the second would convert it to one space. The OP also isn't likely to want to s/, /,/g; since that would change text regardless of whether it's in an html tag or outside.
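
Both points in the comment above can be checked with a short script. This is a sketch under simple assumptions: the sample strings are made up for illustration, and the tag pattern <[^>]*> assumes no ">" appears inside an attribute value.

```perl
use strict;
use warnings;

my $eight = 'a' . (' ' x 8) . 'b';

# Replacing each pair of spaces with one space halves the run: 8 -> 4.
(my $pairs = $eight) =~ s/  / /g;

# Replacing each run of spaces with one space collapses it fully: 8 -> 1.
(my $runs = $eight) =~ s/ +/ /g;

# tr with an empty replacement list counts characters without changing them.
printf "pairs: %d spaces, runs: %d spaces\n",
    ($pairs =~ tr/ //), ($runs =~ tr/ //);

# Limiting the comma cleanup to tags: match each <...> span and edit
# only inside it, leaving the text between tags alone.
my $html = '<font face="arial, helvetica">one, two</font>';
$html =~ s{(<[^>]*>)}{ (my $tag = $1) =~ s/,\s+/,/g; $tag }ge;
print "$html\n";
```

The comma inside the face attribute loses its trailing space, while "one, two" in the body text is left as written.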
Re: making html smaller
by merlyn (Sage) on Jul 26, 2000 at 02:31 UTC
    ok, i want to make this:
    $_ = <font face="arial, helvetica, verdana">
    into this:
    $_ = <font face=arial,helvetica,verdana>
    You want to make your HTML illegal? I don't understand. You need to quote arguments that aren't alphanumerics. Please don't break your HTML.

    -- Randal L. Schwartz, Perl hacker

Re: making html smaller
by turnstep (Parson) on Jul 26, 2000 at 02:23 UTC

    You might want to look into using CSS if your main concern is the space taken by a lot of FONT tags and other things. Some good links can be found at: