RE: making html smaller
by johannz (Hermit) on Jul 26, 2000 at 01:04 UTC
|
Technically, the HTML spec says that attributes for HTML tags should be name-value pairs with the value quoted. Reducing the size of your generated HTML is admirable, but the way you're trying to do it will not gain you any significant savings.
String | Length |
<font face="arial, helvetica, verdana"> | 40 |
<font face=arial, helvetica, verdana> | 35 |
Savings | 12.5% |
When you consider that there are still the end tag and whatever was inside the font tag, your savings go down even further.
In summary, I would spend most of my effort on a clean design and layout of the webpages, and let the tags fall as they may.
|
Well, for the bigger picture: this is for a post-processing step that goes through and reduces the HTML as much as possible before it gets posted to the live website. It is not used in the maintainable version of the HTML. Also, 12.5% is significant when you think about all the font tags that are needed in tables across a huge website.
Re: making html smaller (HTML::Clean)
by ybiC (Prior) on Jul 26, 2000 at 01:03 UTC
|
A quick search of CPAN turned up the HTML::Clean module, which might do what you want.
Or it could be more work than it's worth, if your own regex already fits the bill.
cheers,
ybiC
Re: making html smaller
by Crulx (Monk) on Jul 26, 2000 at 02:12 UTC
|
As for the real value of the savings: if you have a lot of font
tags, the best way to deal with them is to define the fonts
you want as classes in a CSS stylesheet. Then you can just write
<p class="stdfont"> to set the font for a particular
block.
That would clean things up far better than using the
deprecated font tag.
A final note on the 12% savings quoted above: you would
only save that much if your entire webpage were made of
font tags. There are no savings in the text. So as a whole,
you save very little by "cleaning up" the font tags, and
you lose your HTML compliance to boot. It is a bad idea and
definitely the Wrong Thing to do.
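To put a rough number on that point, here is a minimal Perl sketch that compares the savings on the tag alone with the savings on a whole block. The filler text and the helper name percent_saved are made up for illustration, and the unquoted tag is used only for measuring, not as a recommendation.

```perl
use strict;
use warnings;

# Percentage of characters saved going from $before to $after.
sub percent_saved {
    my ($before, $after) = @_;
    return 0 unless length $before;
    return 100 * (length($before) - length($after)) / length($before);
}

my $tag_before = '<font face="arial, helvetica, verdana">';
my $tag_after  = '<font face=arial,helvetica,verdana>';

# Hypothetical page content: the text and end tag are identical in both versions.
my $text  = 'Some ordinary paragraph text that the rewrite never touches. ' x 5;
my $close = '</font>';

printf "tag alone:   %.1f%%\n", percent_saved($tag_before, $tag_after);
printf "whole block: %.1f%%\n",
    percent_saved($tag_before . $text . $close, $tag_after . $text . $close);
```

The more text sits between the tags, the closer the second figure gets to zero.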
---
Crulx
crulx@iaxs.net
|
By default, SGML requires that all attribute values be
delimited using either double quotation marks (ASCII
decimal 34) or single quotation marks (ASCII decimal 39).
Single quote marks can be included within the attribute
value when the value is delimited by double quote marks,
and vice versa. Authors may also use numeric character
references to represent double quotes (&#34;) and single
quotes (&#39;). For double quotes authors can also use the
character entity reference &quot;.
In certain cases, authors may specify the value of an
attribute without any quotation marks. The attribute value
may only contain letters (a-z and A-Z), digits (0-9),
hyphens (ASCII decimal 45), periods (ASCII decimal 46),
underscores (ASCII decimal 95), and colons (ASCII decimal
58). We recommend using quotation marks even when it is
possible to eliminate them.
So according to that, attribute values don't have to be surrounded by quotes. But commas aren't allowed in an unquoted attribute value either.
|
That is not how I read that....
By default, SGML requires that all attribute values be
delimited using either double quotation marks (ASCII
decimal 34) or single quotation marks (ASCII decimal 39).
Seems to indicate to me pretty clearly that quotes (either single
or double) are required.
In certain cases, authors may specify the value of an
attribute without any quotation marks. The attribute value
may only contain letters (a-z and A-Z), digits (0-9),
hyphens (ASCII decimal 45), periods (ASCII decimal 46),
underscores (ASCII decimal 95), and colons (ASCII decimal
58). We recommend using quotation marks even when it is
possible to eliminate them.
Now this says that you can omit the quotation marks (though it still
recommends that you keep them), but only in certain
situations. Note that the comma is not one
of the characters allowed in a non-quoted
string.
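If the quotes really must go, the rule quoted above can be turned into a check. The following is only a sketch: the sub names are invented for this example, it handles double-quoted values only, and it assumes no tag contains a literal < or > inside an attribute value (comments and script blocks would need separate care).

```perl
use strict;
use warnings;

# True if the value contains only the characters the quoted passage allows
# without quotes: letters, digits, hyphens, periods, underscores, colons.
sub can_unquote {
    my ($value) = @_;
    return $value =~ /\A[A-Za-z0-9._:-]+\z/ ? 1 : 0;
}

# Rebuild one name="value" pair, dropping the quotes only when that is safe.
sub quote_if_needed {
    my ($value) = @_;
    return can_unquote($value) ? "=$value" : qq{="$value"};
}

# Process the attributes of a single tag.
sub unquote_tag {
    my ($tag) = @_;
    $tag =~ s/=\s*"([^"]*)"/quote_if_needed($1)/ge;
    return $tag;
}

# Apply unquote_tag to every tag; text between tags is left untouched.
sub strip_safe_quotes {
    my ($html) = @_;
    $html =~ s/(<[^<>]+>)/unquote_tag($1)/ge;
    return $html;
}

print strip_safe_quotes('<td width="100"><font face="arial, helvetica, verdana">'), "\n";
```

With this check the width value loses its quotes, while the font list keeps them because it contains commas and spaces.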
Re: making html smaller
by Maclir (Curate) on Jul 26, 2000 at 03:34 UTC
|
Others have already said this, but, as a professional web site developer (and an aspiring Perl hacker): DON'T do things like removing quotes. Many people code sloppy HTML (OK, many HTML authoring tools, particularly the WYSIWYG ones, generate incorrect HTML syntax), and some browsers are tolerant enough to get by.
But as XML becomes more prevalent, people will need to ensure that their markup complies with the DTD. That also means including end tags, and not letting user agents guess where missing markup should be.
If you are concerned about the size of the page, use CSS to reduce repetitive markup. Now, what about all the other cruft your pages may have? Any animated gifs? Bloated background graphics? Are there lots of embedded tables to overly complicate the layout? These are places to save considerable file size.
Re: making html smaller
by fundflow (Chaplain) on Jul 26, 2000 at 00:49 UTC
|
It seems like you want something more general,
e.g.
s/, /,/g;
s/ +/ /g; (one or more spaces --> one space)
etc.
(corrected after comment below)
|
s/  / /g;
Would be better written as
s/ +/ /g; (or, to treat all whitespace the same: s/\s+/ /g;)
since eight spaces would be converted to four by the first version, while the second would convert them to a single space. The OP also isn't likely to want s/, /,/g; since that would change text regardless of whether it's inside an HTML tag or outside one.
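One way to keep those substitutions away from the page text is to apply them per tag. This is a rough sketch with made-up sub names; it assumes no tag has a literal < or > inside an attribute value, and note that it also rewrites whitespace and ", " inside quoted values such as alt text, which may not be what you want.

```perl
use strict;
use warnings;

# Tidy the inside of a single tag.
sub squeeze_one_tag {
    my ($tag) = @_;
    $tag =~ s/\s+/ /g;   # any run of whitespace becomes one space
    $tag =~ s/, /,/g;    # drop the space after a comma
    return $tag;
}

# Apply squeeze_one_tag to every tag; text between tags is left as it was.
sub squeeze_tags {
    my ($html) = @_;
    $html =~ s/(<[^<>]+>)/squeeze_one_tag($1)/ge;
    return $html;
}

print squeeze_tags(qq{<font  face="arial, helvetica,   verdana">Hello,   world</font>}), "\n";
```

The "Hello,   world" text keeps its spacing because it never matches the tag pattern.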
Re: making html smaller
by merlyn (Sage) on Jul 26, 2000 at 02:31 UTC
|
ok, i want to make this:
$_ = <font face="arial, helvetica, verdana">
into this:
$_ = <font face=arial,helvetica,verdana>
You want to make your HTML illegal? I don't understand.
You need to quote arguments that aren't alphanumerics. Please don't
break your HTML.
-- Randal L. Schwartz, Perl hacker
Re: making html smaller
by turnstep (Parson) on Jul 26, 2000 at 02:23 UTC
|
You might want to look into using CSS if your main concern is the space taken by a lot of FONT tags and other things.
Some good links can be found at: