
Everything is possible. Never give up. That said, this isn't that easy: you might well think that the following is just too convoluted to bother with, and that using some external agent (such as Internet Explorer driven by Perl) to do the task might be easier. Anyway, if you're doing it without external aid, there are two distinct steps to the task:

1. Parse all the CSS that pertains to the page, following <link>ed stylesheets, @import rules, etc., as you've described, to find all the rules that refer to a background property
2. Find which of these rules have a selector that matches some part of the HTML document the CSS is being applied to
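For the @import part of step one, a minimal sketch of pulling the import targets out of a stylesheet string could look like this. It's regex-based rather than a real CSS parser, `find_imports` is a hypothetical helper name, and any media list after the URL is simply ignored:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): return the targets of
# the @import rules in a string of CSS, in document order. Handles
#   @import url("a.css");   @import url(a.css);   @import "a.css";
sub find_imports {
    my $css = shift;

    # drop /* ... */ comments so commented-out imports are skipped
    $css =~ s{/\*.*?\*/}{}gs;

    my @targets;
    while ( $css =~ m{
        \@import \s+
        (?:
            url\( \s* (['"]?) (.*?) \1 \s* \)   # url(...) form
          | (['"]) (.*?) \3                     # quoted-string form
        )
    }gix ) {
        push @targets, defined $2 ? $2 : $4;
    }
    return @targets;
}

my $sheet = q{
    @import url("base.css");
    /* @import "old.css"; */
    @import 'print.css' print;
    @import url(theme.css);
};
print "$_\n" for find_imports($sheet);    # base.css, print.css, theme.css
```

Each target is relative to the stylesheet that imports it, not to the page, so you'd want to resolve it (for example with the URI module's `new_abs`) before fetching it and repeating the process on its contents.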

For step one, as already suggested, you might use the CSS module. This lets you gather lots of bits of CSS into a single ruleset by calling the read_string method repeatedly. You can then look through the aggregate ruleset to find the rules that set either the background-image or the background property (the latter shorthand notation can include a background-image specification).

use strict;
use warnings;

use CSS;

my $css = CSS->new();

$css->read_string('div#foo p.bar { background-image : url(/foo/bar.gif) }');
# $css->read_string('table { border: 1px solid #FF0000 }');
# etc... 

my %bg_selectors;

# for some reason, CSS doesn't supply accessor methods...
foreach my $rule (  @{ $css->{'styles'} } ) {
    foreach my $prop (  @{ $rule->{'properties'} } ) {
        if ( $prop->{'property'} =~ /^background(?:-image)?$/ ) {
            foreach my $selector ( @{ $rule->{'selectors'} } ) {
                $bg_selectors{$selector->{'name'}} =
                    $prop->{'simple_value'};
            }

        }
    }
}

You should then have a hash keyed on CSS selectors whose values are the corresponding background property values. For step 2, you need to find out whether the HTML document contains any elements to which each rule applies. One way to do this is to parse the document into a tree, then test it with XPath expressions generated from the selectors.

I'm working with XML::XPath, which means that your source document needs to be well-formed XHTML. If it's not, there are a few ways to get there, such as running it through HTML Tidy (tidy) with the appropriate options (e.g. -asxhtml) to convert the document, or possibly using the experimental XML output methods (such as as_XML) on a parse tree generated by HTML::TreeBuilder.

To test the document for the existence of the elements, you'll need to convert the CSS selectors into XPath expressions. Here's a very limited example, which only deals with type (tag), descendant (containment), class and ID selectors. It's also not much tested:


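# Returns an XPath expression for a simple CSS selector, or undef if
# the selector uses syntax this sketch doesn't understand.
sub selector_to_xpath {
    my $selector = shift;
    my $xpath = '';

    # split ' ' splits on runs of whitespace (descendant combinators)
    foreach my $token ( split ' ', $selector ) {
        # each token: optional tag, then optional #id or .class
        my ( $tag, $id, $class ) =
            $token =~ /^ ([\w-]+)? (?: \#([\w-]+) | \.([\w-]+) )? $/x
            or return undef;    # e.g. ">", "a:hover", "p.a.b"

        $xpath .= '//' . ( defined $tag ? $tag : '*' );
        if ( defined $id ) {
            $xpath .= "[\@id='$id']";
        }
        if ( defined $class ) {
            # match one class within a space-separated class attribute
            $xpath .= "[contains(concat(' ', normalize-space(\@class), ' '),"
                    . " ' $class ')]";
        }
    }
    return length $xpath ? $xpath : undef;
}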
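As one example of extending the selector_to_xpath sub above, here's a variant that also handles compound selectors such as p.bar.baz and the child combinator > (which becomes a single / step instead of //). The name `selector_to_xpath_child` is hypothetical; it uses the usual XPath 1.0 contains(concat(...)) idiom to match one class in a space-separated class attribute, and it still returns undef for attribute selectors, pseudo-classes, the universal selector and the + and ~ combinators:

```perl
use strict;
use warnings;

# Hypothetical variant of selector_to_xpath: also handles compound
# selectors (tag#id.class.class) and the child combinator ">".
# Returns undef for anything else (":hover", "[href]", "*", "+", "~").
sub selector_to_xpath_child {
    my $selector = shift;

    # make ">" a token of its own, so "ul>li" and "ul > li" both work
    ( my $s = $selector ) =~ s/>/ > /g;

    my $xpath = '';
    my $axis  = '//';    # "//" = descendant step, "/" = child step
    foreach my $token ( split ' ', $s ) {
        if ( $token eq '>' ) {
            # a ">" needs a step before it and must not be doubled
            return undef if $xpath eq '' or $axis eq '/';
            $axis = '/';
            next;
        }

        # optional tag name followed by any number of #id / .class parts
        my ( $tag, $rest ) = $token =~ /^([\w-]*)((?:[#.][\w-]+)*)$/
            or return undef;

        my $step = length $tag ? $tag : '*';
        while ( $rest =~ /([#.])([\w-]+)/g ) {
            my ( $kind, $name ) = ( $1, $2 );
            $step .= $kind eq '#'
                ? "[\@id='$name']"
                : "[contains(concat(' ', normalize-space(\@class), ' '), ' $name ')]";
        }
        $xpath .= $axis . $step;
        $axis = '//';
    }

    # reject empty selectors and a trailing ">"
    return undef if $xpath eq '' or $axis eq '/';
    return $xpath;
}

for my $sel ( 'div#foo p.bar', 'ul > li', 'p.bar.baz', 'a:hover' ) {
    my $xpath = selector_to_xpath_child($sel);
    printf "%-15s => %s\n", $sel, defined $xpath ? $xpath : '(unsupported)';
}
```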

Now, rolling it all together:

use strict;
use warnings;

use CSS;
use XML::XPath;
use Data::Dumper;

# Returns an XPath expression for a simple CSS selector, or undef if
# the selector uses syntax this doesn't understand.
sub selector_to_xpath {
    my $selector = shift;
    my $xpath = '';

    # doesn't deal with much of the CSS spec ...
    # split ' ' splits on runs of whitespace (descendant combinators)
    foreach my $token ( split ' ', $selector ) {
        # each token: optional tag, then optional #id or .class
        my ( $tag, $id, $class ) =
            $token =~ /^ ([\w-]+)? (?: \#([\w-]+) | \.([\w-]+) )? $/x
            or return undef;    # e.g. ">", "a:hover", "p.a.b"

        $xpath .= '//' . ( defined $tag ? $tag : '*' );
        if ( defined $id ) {
            $xpath .= "[\@id='$id']";
        }
        if ( defined $class ) {
            # match one class within a space-separated class attribute
            $xpath .= "[contains(concat(' ', normalize-space(\@class), ' '),"
                    . " ' $class ')]";
        }
    }
    return length $xpath ? $xpath : undef;
}

my $css = CSS->new();
# this rule matches an element in our doc
$css->read_string('div#foo p.bar { background-image : url(/foo/bar.gif) }');
# this doesn't match an element in our doc
$css->read_string('div#foo p.qux { background-image : url(/foo/qux.gif) }');
# nor does this
$css->read_string('div#baz p.bar { background-image : url(/foo/baz.gif) }');
# but this does
$css->read_string('div { background-image : url(/foo/div.gif) }');

# gather up all rules talking about backgrounds
# (a later rule with the same selector overwrites an earlier one)
my %bg_rules;
foreach my $rule (  @{ $css->{'styles'} } ) {
    foreach my $prop (  @{ $rule->{'properties'} } ) {
        if ( $prop->{'property'} =~ /^background(?:-image)?$/i ) {
            foreach my $selector ( @{ $rule->{'selectors'} } ) {
                $bg_rules{$selector->{'name'}} =
                    $prop->{'simple_value'};
            }
        }
    }
}

# parse the XML for XPath-ery. Note: if the document declares the
# XHTML namespace (xmlns="http://www.w3.org/1999/xhtml"), unprefixed
# paths like //div won't match until that namespace is dealt with.
my $xml;
{
    local $/;
    $xml = XML::XPath->new(ioref => \*DATA);
}

# go through our list of CSS rules seeing which ones apply
# (sorted, so the output order is repeatable)
my @used_images;
foreach my $sel ( sort keys %bg_rules ) {
    my $xpath = selector_to_xpath($sel);
    unless ( defined $xpath ) {
        warn "skipping unsupported selector '$sel'\n";
        next;
    }
    push(@used_images, $bg_rules{$sel}) if $xml->exists($xpath);
}

# let's see what we got ...
warn Dumper \@used_images;

__END__
<html>
<head>
</head>
<body>

<div id="foo">
<div>
<p class="bar">This one</p>
</div>
</div>

<div id="qux">
<p class="bar">Not me</p>
</div>
</body>
</html>

Obviously, there's still a bit of work to be done to retrieve the image URLs from the CSS property values, and also LOTS of work to implement as much of the CSS selector spec as you need, but hopefully it might get you started. Or dissuade you from the whole idea ;)
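For the first of those remaining jobs, a minimal sketch of pulling the URLs out of a background or background-image value might look like this. `urls_from_value` is a hypothetical helper, and its regex only handles the url(...) notation (quoted or unquoted), which is enough for values such as #fff url('/img/a.gif') no-repeat:

```perl
use strict;
use warnings;

# Hypothetical helper: return every url(...) target in a CSS
# background / background-image value, in order. Values such as
# "none" or a bare colour give an empty list.
sub urls_from_value {
    my $value = shift;
    my @urls;
    while ( $value =~ /url\(\s*(['"]?)(.*?)\1\s*\)/gi ) {
        push @urls, $2;
    }
    return @urls;
}

# de-duplicate while keeping first-seen order, since several rules
# may well point at the same image
my %seen;
my @images = grep { !$seen{$_}++ }
    map { urls_from_value($_) }
    ( q{#fff url('/img/a.gif') no-repeat}, 'none', 'url(/img/a.gif)', 'url("b.png")' );
print "$_\n" for @images;    # /img/a.gif, b.png
```

In the combined script above you'd feed it the values collected in @used_images. The URLs are relative to the stylesheet they came from, not to the page, so they'd still need resolving against the stylesheet's own URL before you could fetch or measure them.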

Cheers
ViceRaid

In reply to Re: Testing Page Size with HTML 4/CSS by ViceRaid
in thread Testing Page Size with HTML 4/CSS by Cody Pendant
