dobson187 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perlmonks, I am having trouble automating form submission on an NCBI primer blast website (http://www.ncbi.nlm.nih.gov/tools/primer-blast/index.cgi?LINK_LOC=BlastHome). I am trying to use the WWW::Mechanize module to submit information to the form, and then retrieve the information which is produced. The problem I am having is that after you submit your form, there is a waiting page that you sit at while your job is processing. I can't seem to figure out how to make WWW::Mechanize wait through this page and then scrape the content from the final page. With the following code I am only getting the information from the waiting page:

#! /usr/bin/perl -w
# NCBI_Primer_Blast - automate primer validation v. 0.001
use strict;
use WWW::Mechanize;
use HTTP::Cookies;

my $cj   = HTTP::Cookies->new();
my $mech = WWW::Mechanize->new( timeout => '300', cookie_jar => $cj );
$mech->agent(
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1'
);

my $url = "http://www.ncbi.nlm.nih.gov/tools/primer-blast/index.cgi?LINK_LOC=BlastHome";

my $accession            = "NM_199168.3";
my $left_primer          = "GTGGTCGTGCTGGTCCTC";
my $right_primer         = "AGATGCTTGACGTTGGCTCT";
my $product_min          = "70";
my $product_max          = "150";
my $max_number_to_return = "5";
my $min_tm               = "57";
my $opt_tm               = "60";
my $max_tm               = "63";
my $max_tm_diff          = "3";
my $span_intron          = "on";
my $min_intron           = "1000";
my $max_intron           = "100000000";
my $search_specific      = "on";
my $organism             = "Homo sapiens";
my $primer_specificity   = "refseq_mrna";
my $min_primer_size      = "18";
my $max_primer_size      = "22";
my $opt_primer_size      = "20";
my $primer_min_gc        = "20";
my $primer_max_gc        = "80";
my $gc_clamp             = "0";
my $max_poly_x           = "5";
my $max_self_comp        = "8";
my $max_3prime_comp      = "3";
my $monovalent_cations   = "50";
my $divalent_cations     = "0";
my $dNTPs                = "0";
my $salt_formula         = "0";
my $tm_method            = "0";
my $annealing_oligo_conc = "50";

my $result = $mech->get($url);
die "Opening NCBI Primer Blast Failed....are you connected to the internet??\n"
    unless $result->is_success;
print "Opened NCBI PrimerBLAST website...\n";

$mech->submit_form(
    form_name => 'searchForm',
    fields    => {
        INPUT_SEQUENCE              => $accession,
        PRIMER_LEFT_INPUT           => $left_primer,
        PRIMER_RIGHT_INPUT          => $right_primer,
        PRIMER_PRODUCT_MIN          => $product_min,
        PRIMER_PRODUCT_MAX          => $product_max,
        PRIMER_NUM_RETURN           => $max_number_to_return,
        PRIMER_MIN_TM               => $min_tm,
        PRIMER_OPT_TM               => $opt_tm,
        PRIMER_MAX_TM               => $max_tm,
        PRIMER_MAX_DIFF_TM          => $max_tm_diff,
        SPAN_INTRON                 => $span_intron,
        MIN_INTRON_SIZE             => $min_intron,
        MAX_INTRON_SIZE             => $max_intron,
        SEARCH_SPECIFIC_PRIMER      => $search_specific,
        ORGANISM                    => $organism,
        PRIMER_SPECIFICITY_DATABASE => $primer_specificity,
        PRIMER_MIN_SIZE             => $min_primer_size,
        PRIMER_OPT_SIZE             => $opt_primer_size,
        PRIMER_MAX_SIZE             => $max_primer_size,
        PRIMER_MIN_GC               => $primer_min_gc,
        PRIMER_MAX_GC               => $primer_max_gc,
        GC_CLAMP                    => $gc_clamp,
        POLYX                       => $max_poly_x,
        SELF_ANY                    => $max_self_comp,
        SELF_END                    => $max_3prime_comp,
        MONO_CATIONS                => $monovalent_cations,
        DIVA_CATIONS                => $divalent_cations,
        CON_DNTPS                   => $dNTPs,
        SALT_FORMULAR               => $salt_formula,
        TM_METHOD                   => $tm_method
    }
);
die "Submission failed!!\n" unless $mech->success;

$mech->follow_link( tag => 'meta' );
print $mech->content( format => 'text' );

I have also taken the advice I have seen on many of the other nodes here and used Firebug to watch what my browser requests from the website. My problem is that I don't understand these requests well enough to work out how to modify my code so that it sends the right ones. Here is the request Firebug shows on the waiting page:

POST /tools/primer-blast/primertool.cgi HTTP/1.1
Host: www.ncbi.nlm.nih.gov
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Referer: http://www.ncbi.nlm.nih.gov/tools/primer-blast/
Cookie: ncbi_sid=8A1A2C9BE1302E51_0026SID

And the request sent once the waiting page has finished because my job is done:

GET /tools/primer-blast/primertool.cgi?ctg_time=1310519679&job_key=JSID_01_116688_130.14.24.201_9000 HTTP/1.1
Host: www.ncbi.nlm.nih.gov
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: ncbi_sid=8A1A2C9BE1302E51_0026SID

I am pretty stuck here, and I come to you for any help you can offer. Thanks! Jason

Replies are listed 'Best First'.
Re: WWW::Mechanize Problem
by Anonymous Monk on Jul 13, 2011 at 16:47 UTC

    wait through this page and then scrape the content from the final page

    Loop

    do {
        $mech->follow_link( tag => 'meta' );
    } until ThisIsThePageIWant( $mech );
    print $mech->content( format => 'text' );
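The loop above will spin forever if the job never finishes, so it may be worth bounding it and pausing between requests. Here is a minimal sketch of that idea. The helper name `poll_until`, its `max_tries` and `delay` options, and the fake page list are all made up for illustration; none of them come from WWW::Mechanize. The demo runs offline so the control flow can be checked without hitting NCBI.

```perl
use strict;
use warnings;

# Hypothetical helper: repeat $step until $done returns true or we run out of tries.
#   $step : coderef that advances one page,
#           e.g. sub { $mech->follow_link( tag => 'meta' ) }
#   $done : coderef returning true once the results page has been reached
#   %opt  : max_tries (default 30), delay in seconds between tries (default 10)
# Returns the number of tries used, or 0 if it gave up.
sub poll_until {
    my ( $step, $done, %opt ) = @_;
    my $max_tries = $opt{max_tries} // 30;
    my $delay     = $opt{delay}     // 10;

    for my $try ( 1 .. $max_tries ) {
        $step->();
        return $try if $done->();
        sleep $delay if $delay;    # be polite to the server between polls
    }
    return 0;
}

# Offline demonstration: a fake "server" that finishes on the third request.
my @pages = ( 'waiting', 'waiting', 'results' );
my $current;
my $tries = poll_until(
    sub { $current = shift @pages },
    sub { $current eq 'results' },
    max_tries => 5,
    delay     => 0,
);
print "finished after $tries tries\n";
```

With a real `$mech`, the first coderef would follow the meta link and the second would play the role of `ThisIsThePageIWant`. What actually distinguishes the results page from the waiting page has to be worked out by looking at the content of both.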

     

    My problem is that I really don't understand these requests enough to determine how I should modify my code to make it send the right requests

    Read Ovid's CGI Course

    $ perl -MWWW::Mechanize -le "print WWW::Mechanize->new->get( shift )->request->as_string" http://example.com
    GET http://www.iana.org/domains/example/
    Accept-Encoding: gzip
    User-Agent: WWW-Mechanize/1.68

    Do you see the connection? GET is GET
    GET /tools/primer-blast/primertool.cgi?ctg_time=1310519679&job_key=JSID_01_116688_130.14.24.201_9000 HTTP/1.1
    The key issue is extracting the whole URL or just job_key and ctg_time from content (meta or wherever)
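To make the extraction step concrete, here is a rough sketch that pulls the redirect URL out of a meta refresh tag and splits its query string into parameters. The HTML snippet is invented for the example (the real waiting page may be laid out differently, or may redirect with JavaScript instead), and `meta_refresh_url` and `query_params` are hypothetical helper names. Only core Perl is used, and query values are not URL-decoded.

```perl
use strict;
use warnings;

# Return the target URL of the first <meta http-equiv="refresh"> tag, or undef.
# Assumes http-equiv appears before content and that content is quoted.
sub meta_refresh_url {
    my ($html) = @_;
    if ( $html =~ m{<meta\b[^>]*http-equiv\s*=\s*["']?refresh["']?[^>]*content\s*=\s*["']\s*\d+\s*;\s*url\s*=\s*([^"'>]+)}i ) {
        my $url = $1;
        $url =~ s/&amp;/&/g;    # undo HTML escaping of the query separator
        $url =~ s/\s+$//;       # drop trailing whitespace
        return $url;
    }
    return;
}

# Split a URL's query string into a hashref of name => value (no URL-decoding).
sub query_params {
    my ($url) = @_;
    my %param;
    return \%param unless $url =~ /\?([^#]*)/;
    my $query = $1;
    for my $pair ( split /[&;]/, $query ) {
        my ( $name, $value ) = split /=/, $pair, 2;
        $param{$name} = $value // '';
    }
    return \%param;
}

# Hypothetical waiting page, modelled on the GET request shown in the question.
my $html = <<'HTML';
<html><head>
<meta http-equiv="Refresh" content="2; URL=primertool.cgi?ctg_time=1310519679&amp;job_key=JSID_01_116688_130.14.24.201_9000">
</head><body>Your job is running...</body></html>
HTML

my $url    = meta_refresh_url($html);
my $params = query_params($url);
print "next URL: $url\n";
print "job_key:  $params->{job_key}\n";
print "ctg_time: $params->{ctg_time}\n";
```

In a real script the HTML would come from `$mech->content`, and the extracted URL could be handed to `$mech->get`, which resolves relative URLs against the current page.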

    SEE ALSO
    TAIR::Blast - A module to gather automated BLAST result from TAIR (http://www.arabidopsis.org/Blast/index.jsp)
    Bio::Tools::Run::RemoteBlast - Object for remote execution of the NCBI Blast via HTTP
    site:bioperl.org LWP blast

      Thanks Anon. Looks like I have a lot of reading to do. I will see if this works. Also, I do see that there are a lot of utilities to run BLAST, but not PrimerBLAST, which is a bit different than regular BLAST.

        Solved it with WWW::Mechanize::Plugin::FollowMetaRedirect!