URI::SearchTerms - Collect search terms from the search URLs of common search engines

See the POD in the code below for more information. My questions are:

Any questions, comments, suggestions, et cetera are extremely welcome.

package URI::SearchTerms;

use warnings;
use strict;

=head1 NAME

URI::SearchTerms - Collect search terms from the search URLs of common search engines

=head1 SYNOPSIS

    use URI::SearchTerms;

    my $search_url = "http://www.google.com/search?q=foo+bar+baz";
    my @terms = URI::SearchTerms->terms($search_url);

    print join(":", @terms), "\n";

=head1 DESCRIPTION

An early version of this was written to parse web server log files
(specifically the referer field) to discover which search terms users
entered to find the site. The idea later grew into this module.

Besides parsing referers in log files, it can be used dynamically in
CGI or mod_perl scripts to detect whether users are arriving from search
engine results and, if so, which search terms they used.

Currently the module supports Google, Yahoo, MSN, and AOL search URLs.
If you would like to suggest another search engine to support, please
email me (C<tsibley@cpan.org>) with a few example URLs or, less
preferably, a place to find my own. Patches are even better. : )

=head1 METHODS

=head2 URI::SearchTerms::terms($url), URI::SearchTerms->terms($url)

This takes one argument: the URL to parse. It returns a list of the
search terms, which in most cases contains only one element.

C<terms()> may be called either as a class method or as a fully
qualified function.

=cut

# Try to require CGI::Simple first. If that fails, try to
# load CGI.pm. If both fail, die with an error. I
# don't use URI::QueryParam because it doesn't handle all
# cases as it should.
my $CGI = 'CGI::Simple';
eval { require CGI::Simple; };
eval { require CGI; $CGI = 'CGI'; } if $@;
if ($@) {
    die "The CGI::Simple or CGI modules must be installed "
      . "for URI::SearchTerms to work!";
}

require URI;

my %pats = (
    google => { pat => qr<google\.>, keys => ['q', 'as_q'], },
    yahoo  => { pat => qr<yahoo\.>,  keys => ['p'], },
    msn    => { pat => qr<msn\.>,    keys => ['q'], },
    aol    => { pat => qr<aol\.>,    keys => ['query'], },
);

sub terms {
    # Called as a class method, the URL is the second argument;
    # called as a plain function, it is the first.
    my $url = @_ > 1 ? $_[1] : $_[0];
    my @terms;

    my $uri = URI->new($url);

    # Relative or non-server URLs (e.g. mailto:) have no host() method.
    return unless $uri->can('host');
    my $host  = $uri->host;
    my $query = $uri->query;

    # Without a query string, CGI->new may instead parse the
    # parameters of the current request, so bail out early.
    return unless defined $host and defined $query;

    for my $engine (keys %pats) {
        next unless $host =~ $pats{$engine}{pat};
        my $q = $CGI->new($query);
        for my $key (@{ $pats{$engine}{keys} }) {
            push @terms, $q->param($key);
        }
    }

    return @terms;
}

=head1 REQUIREMENTS

Currently, this module uses L<URI> and L<CGI::Simple> (or, if that
isn't available, L<CGI>) to parse the query strings from the URLs and
extract the appropriate parameters.

=head1 BUGS

This module desperately needs more test cases. There are probably many
valid Yahoo, MSN, or AOL URLs that don't work (although I think I've
covered Google pretty well). If you find one, please email me the URL at
C<tsibley@cpan.org>.

=head1 LICENSE

This module is free software, and may be distributed under the same
terms as Perl itself.

=head1 AUTHOR

Copyright (C) 2003, Thomas R. Sibley C<tsibley@cpan.org>

=cut

1;
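The host-pattern-plus-query-key lookup that terms() performs, combined with the log-file use case from the DESCRIPTION, can be sketched with core Perl only. This is a hypothetical illustration, not the module's code: search_terms, url_decode, and referer_from_log_line are invented names; the query string is decoded by hand instead of with URI.pm and CGI::Simple/CGI, so edge cases those modules handle may be missed; and the log helper assumes Apache's "combined" log format, where the referer is the second-to-last quoted field.

```perl
use strict;
use warnings;

# Same engine table as the module: a host pattern plus the query keys
# that carry the search terms for that engine.
my %engines = (
    google => { pat => qr/google\./, keys => [ 'q', 'as_q' ] },
    yahoo  => { pat => qr/yahoo\./,  keys => ['p'] },
    msn    => { pat => qr/msn\./,    keys => ['q'] },
    aol    => { pat => qr/aol\./,    keys => ['query'] },
);

# Hypothetical helper: decode one application/x-www-form-urlencoded
# component ('+' means space, %XX is a hex-encoded byte).
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $s;
}

# Hypothetical stand-in for terms(): returns the search terms in $url,
# or an empty list if the host is unknown or there is no query string.
sub search_terms {
    my ($url) = @_;

    # Crude host/query split; assumes an absolute http(s) URL.
    my ($host, $query) = $url =~ m{^https?://([^/?#:]+)[^?#]*\?([^#]*)}i
        or return;

    # Collect every value for each key, since keys may repeat.
    my %params;
    for my $pair (split /[&;]/, $query) {
        next unless length $pair;
        my ($k, $v) = split /=/, $pair, 2;
        push @{ $params{ url_decode($k) } }, url_decode(defined $v ? $v : '');
    }

    my @terms;
    for my $engine (sort keys %engines) {
        next unless lc($host) =~ $engines{$engine}{pat};
        push @terms, @{ $params{$_} || [] } for @{ $engines{$engine}{keys} };
    }
    return @terms;
}

# Hypothetical helper: pull the referer out of an Apache "combined"
# log line (the second-to-last double-quoted field).
sub referer_from_log_line {
    my ($line) = @_;
    my @quoted = $line =~ /"((?:[^"\\]|\\.)*)"/g;
    return @quoted >= 3 ? $quoted[-2] : undef;
}
```

For a combined-format line whose referer is a Google search, `search_terms(referer_from_log_line($line))` returns the decoded `q` value; lines whose referer is `-` simply yield an empty list, because the URL pattern does not match.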

Updated the POD per Corion's suggestion, and changed the code to use URI.pm to extract the query string.


In reply to RFC: URI::SearchTerms by The Mad Hatter
