Image Size in an HTML file

Stamp_Guy has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I'm looking into making a program that would open a batch of HTML files, find all the image tags, calculate the size of each image (in pixels) and insert the correct width="" and height="" values. I am using Image::Size to get the actual sizes, but here's where I'm stuck: how do I get the filename from the HTML? I could probably "roll my own", but I've been told that I should ask first before I do something like that. Has anyone else here done something of this sort? Can you guys give me any suggestions or ideas? They would be greatly appreciated. Thanks!

-Stamp_Guy
If winners never quit and quitters never win, who's the fool who came up with "quit while you're ahead"?

Replies are listed 'Best First'.
(jeffa) Re: Image Size in an HTML file by jeffa (Bishop) on Jun 02, 2001 at 19:32 UTC
I think the best way to parse the HTML files is with HTML::Parser. This snippet will extract the src attribute of any img tags that are found:

```
use strict;
use IO::File;
use HTML::Parser;   # version 3.15, by the way

# get the contents of the HTML file
my $fh   = IO::File->new('google.html') or die "Can't open google.html: $!";
my $html = do { local $/; <$fh> };

my $parser = HTML::Parser->new(api_version => 3);
$parser->handler(start => \&start, 'self,tagname,attr');
$parser->parse($html);
$parser->eof;   # signal end of document so any buffered text is flushed

# called for every start tag; only img tags are of interest
sub start {
    my ($parser, $tag, $attr) = @_;
    return unless $tag eq 'img';
    # insert code to process the image file
    print $attr->{src}, "\n";
}
```

From here you can add code to open the image file, and for the fun part, insert the new value in . . .

Jeff
R-R-R--R-R-R--R-R-R--R-R-R--R-R-R-- L-L--L-L--L-L--L-L--L-L--L-L--L-L--
Re: (jeffa) Re: Image Size in an HTML file by Stamp_Guy (Monk) on Jun 02, 2001 at 20:25 UTC
Hey Jeffa, thanks for the code snippet. I tried running it, though, and I got this error: "Can't locate object method "handler" via package "HTML::Parser" at parser.pl line 14 <GEN0> chunk 1". Any idea what's wrong? -Stamp_Guy
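One likely cause, offered as an assumption since the thread never settles it: handler() belongs to the version 3 API of HTML::Parser, so an older 2.x install would fail with exactly this kind of "Can't locate object method" message. A quick check of what is installed might look like this:

```perl
use strict;
use warnings;
use HTML::Parser;

# Report the installed version and whether the version 3 handler() method exists.
my $version = HTML::Parser->VERSION;
print "HTML::Parser version: $version\n";
print HTML::Parser->can('handler')
    ? "handler() is available\n"
    : "handler() is missing; upgrading to HTML::Parser 3.x is probably needed\n";
```

If handler() turns out to be missing, upgrading the module from CPAN is the simplest route to running jeffa's snippet unchanged.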
Re: Image Size in an HTML file by merlyn (Sage) on Jun 02, 2001 at 20:40 UTC
Already done as a column of mine. Who woulda thunk?! {grin} -- Randal L. Schwartz, Perl hacker
Re: Re: Image Size in an HTML file by ChemBoy (Priest) on Jun 03, 2001 at 06:02 UTC
You might also want to look at HTML::LinkExtor, which I don't think existed when merlyn wrote that column, but which is designed specifically around this kind of problem. Update: merlyn is, of course, right: LinkExtor won't give you the context for reinserting the tags (bad ChemBoy! No coffee!). However, the reason I pointed it out is that HTML::Filter is deprecated. If you're going to write your own, similar program, HTML::TokeParser or HTML::PullParser is a more appropriate solution. If God had meant us to fly, he would never have given us the railroads. --Michael Flanders
Re: Re: Re: Image Size in an HTML file by merlyn (Sage) on Jun 03, 2001 at 20:06 UTC
Well, HTML::LinkExtor is fine if you just want the links, but in a transformation like this, you also need all the non-link text as well. Unless you were just replacing the entire file with only a bunch of images. {grin} -- Randal L. Schwartz, Perl hacker
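ChemBoy's HTML::TokeParser suggestion, combined with merlyn's point that everything other than the img tags has to survive, can be sketched roughly as follows. This is a minimal sketch, not the code from merlyn's column: add_image_sizes and the $size_of callback are hypothetical names, and the demo uses a stub lookup table so it runs without any image files. In real use the callback would presumably wrap Image::Size, e.g. sub { (imgsize($_[0]))[0, 1] }.

```perl
use strict;
use warnings;
use HTML::TokeParser;
use HTML::Entities;   # encode_entities, to re-escape decoded attribute values

# Hypothetical helper: returns a copy of $html in which each <img> tag has
# width/height set from $size_of->($src). $size_of must return (width, height),
# or an empty list / undefs when the size is unknown.
sub add_image_sizes {
    my ($html, $size_of) = @_;
    my $p   = HTML::TokeParser->new(\$html) or die "Can't parse HTML";
    my $out = '';

    while (my $t = $p->get_token) {
        my $type = $t->[0];

        if ($type eq 'S' && $t->[1] eq 'img') {
            my ($attr, $attrseq) = @{$t}[2, 3];
            my ($w, $h) = defined $attr->{src} ? $size_of->($attr->{src}) : ();

            if (defined $w && defined $h) {
                # Add the attributes, or overwrite stale values already present.
                for my $pair ([width => $w], [height => $h]) {
                    my ($name, $value) = @$pair;
                    push @$attrseq, $name unless exists $attr->{$name};
                    $attr->{$name} = $value;
                }
                # Rebuild the tag in its original attribute order.
                # Simplification: a trailing "/" (XHTML style) is dropped.
                my $tag = '<img';
                for my $name (grep { $_ ne '/' } @$attrseq) {
                    my $value = encode_entities($attr->{$name}, '<>&"');
                    $tag .= qq{ $name="$value"};
                }
                $out .= "$tag>";
                next;
            }
        }

        # Everything else passes through untouched: text tokens keep their
        # source text at index 1, all other token types keep it last.
        $out .= $type eq 'T' ? $t->[1] : $t->[-1];
    }
    return $out;
}

# Demo with a stub size table (an assumption made purely for illustration).
my %fake_sizes = ('logo.gif' => [120, 40]);
my $stub = sub { my $s = $fake_sizes{ $_[0] }; return $s ? @$s : () };

print add_image_sizes('<p>Hi <img src="logo.gif" alt="Logo"> there</p>', $stub), "\n";
```

Images whose size cannot be determined are left exactly as they appeared in the source, so a missing file does not damage the page.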
Re: Re: Image Size in an HTML file by stuffy (Monk) on Jun 03, 2001 at 03:57 UTC
I'm beginning to think that Vroom should place a "search merlyn's columns" box next to the "search cpan" box. That way we could find the answers to our questions more easily and quickly. :~) Stuffy That's my story, and I'm sticking to it, unless I'm wrong in which case I will probably change it ;~)