package URI::SearchTerms;

use warnings;
use strict;

=head1 NAME

URI::SearchTerms - Collect search terms from the search URLs of common search engines

=head1 SYNOPSIS

    use URI::SearchTerms;

    my $search_url = "http://www.google.com/search?q=foo+bar+baz";
    my @terms = URI::SearchTerms->terms($search_url);

    print join(":", @terms), "\n";

=head1 DESCRIPTION

An early version of this module was written to parse web server log files
(specifically the referer field) and discover which search terms visitors
used to find the site. The idea later grew into this module.

Besides parsing referers in log files, it can be used dynamically in CGI or
mod_perl scripts to detect whether a user arrived from a search engine
results page and, if so, which search terms they used.

Currently the module supports Google, Yahoo, MSN, and AOL search URLs. If
you would like to suggest another search engine to support, please email me
with either a few example URLs or, less preferably, a place to get my own.
Patches are even better. : )

=head1 METHODS

=head2 URI::SearchTerms::terms($url), URI::SearchTerms->terms($url)

This takes one argument: the URL to parse. It returns a list of the search
terms, which in most cases contains only one element.

C<terms> may be called in the class-method style or the fully qualified
function style.

=cut

# Try to require CGI::Simple first. If that fails, try to
# load CGI.pm. If all that fails, die with an error. I
# don't use URI::QueryParam because it doesn't handle all
# cases as it should.
my $CGI = 'CGI::Simple';
eval { require CGI::Simple; };
eval { require CGI; $CGI = 'CGI'; } if $@;
if ($@) {
    die "The CGI::Simple or CGI modules must be installed "
      . "for URI::SearchTerms to work!";
}

require URI;

# Map each supported engine to a host pattern and the query parameters
# that carry its search terms.
# NOTE: the host patterns here are assumed stand-ins that match on the
# engine's domain name; adjust them to the hosts you need to recognise.
my %pats = (
    google => { pat => qr/(?:^|\.)google\./i, keys => ['q', 'as_q'] },
    yahoo  => { pat => qr/(?:^|\.)yahoo\./i,  keys => ['p'] },
    msn    => { pat => qr/(?:^|\.)msn\./i,    keys => ['q'] },
    aol    => { pat => qr/(?:^|\.)aol\./i,    keys => ['query'] },
);

sub terms {
    # Accept both URI::SearchTerms->terms($url) and
    # URI::SearchTerms::terms($url).
    my $url = $_[1] ? $_[1] : $_[0];
    my @terms;

    my $uri = URI->new($url);

    # Scheme-less or non-server URIs have no host method, so bail out
    # instead of dying on them.
    return @terms unless $uri->can('host') && $uri->can('query');

    my $host  = $uri->host;
    my $query = $uri->query;
    return @terms unless defined $host && defined $query;

    for my $engine (keys %pats) {
        next unless $host =~ $pats{$engine}{pat};

        my $q = $CGI->new($query);
        for my $key (@{ $pats{$engine}{keys} }) {
            push @terms, $q->param($key);
        }
    }

    return @terms;
}

=head1 REQUIREMENTS

Currently, this module uses L<URI> and L<CGI::Simple> (or, if that isn't
available, L<CGI>) to parse the query strings from the URLs and extract the
appropriate params.

=head1 BUGS

This module desperately needs more test cases. There are probably a number
of valid URLs for Yahoo, MSN, or AOL that don't work (although I think I've
covered Google pretty well). If you find one, please email me the URL.

=head1 LICENSE

This module is free software, and may be distributed under the same terms
as Perl itself.

=head1 AUTHOR

Copyright (C) 2003, Thomas R. Sibley

=cut

1;
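
=head1 EXAMPLES

The log-file use case mentioned in the description might look roughly like
the sketch below. It assumes access-log lines in Apache Combined Log Format
arriving on STDIN. C<terms_from_log_line> is a hypothetical helper written
for this example and is not part of the module.

    use strict;
    use warnings;
    use URI::SearchTerms;

    # Hypothetical helper: pull the referer out of one Combined Log Format
    # line and return the search terms it carries (an empty list if none).
    sub terms_from_log_line {
        my ($line) = @_;

        # Assumption: the referer is the first quoted field after the
        # three-digit status code and the response size.
        my ($referer) = $line =~ /" \d{3} \S+ "([^"]*)"/
            or return;

        # Skip "-" (no referer sent) and anything that is not an
        # absolute http or https URL.
        return unless $referer =~ m{^https?://}i;

        return URI::SearchTerms->terms($referer);
    }

    while (my $line = <STDIN>) {
        my @terms = terms_from_log_line($line);
        print join(' | ', @terms), "\n" if @terms;
    }

Filtering on the scheme before calling C<terms> keeps malformed or empty
referers away from the URL parser, which matters when processing untrusted
log data in bulk.

=cut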