Hello Perlmonks, I am having trouble automating form submission on the NCBI Primer-BLAST website (http://www.ncbi.nlm.nih.gov/tools/primer-blast/index.cgi?LINK_LOC=BlastHome). I am trying to use the WWW::Mechanize module to submit information to the form and then retrieve the results it produces. The problem is that after you submit the form, there is a waiting page that you sit at while your job is processing. I can't figure out how to make WWW::Mechanize wait through this page and then scrape the content from the final page. With the following code I only get the content of the waiting page:
    #! /usr/bin/perl -w
    # NCBI_Primer_Blast - automate primer validation v. 0.001
    use strict;
    use WWW::Mechanize;
    use HTTP::Cookies;

    my $cj   = HTTP::Cookies->new();
    my $mech = WWW::Mechanize->new( timeout => '300', cookie_jar => $cj );
    $mech->agent(
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1'
    );

    my $url = "http://www.ncbi.nlm.nih.gov/tools/primer-blast/index.cgi?LINK_LOC=BlastHome";

    my $accession            = "NM_199168.3";
    my $left_primer          = "GTGGTCGTGCTGGTCCTC";
    my $right_primer         = "AGATGCTTGACGTTGGCTCT";
    my $product_min          = "70";
    my $product_max          = "150";
    my $max_number_to_return = "5";
    my $min_tm               = "57";
    my $opt_tm               = "60";
    my $max_tm               = "63";
    my $max_tm_diff          = "3";
    my $span_intron          = "on";
    my $min_intron           = "1000";
    my $max_intron           = "100000000";
    my $search_specific      = "on";
    my $organism             = "Homo sapiens";
    my $primer_specificity   = "refseq_mrna";
    my $min_primer_size      = "18";
    my $max_primer_size      = "22";
    my $opt_primer_size      = "20";
    my $primer_min_gc        = "20";
    my $primer_max_gc        = "80";
    my $gc_clamp             = "0";
    my $max_poly_x           = "5";
    my $max_self_comp        = "8";
    my $max_3prime_comp      = "3";
    my $monovalent_cations   = "50";
    my $divalent_cations     = "0";
    my $dNTPs                = "0";
    my $salt_formula         = "0";
    my $tm_method            = "0";
    my $annealing_oligo_conc = "50";

    my $result = $mech->get($url);
    die "Opening NCBI Primer Blast Failed....are you connected to the internet??\n"
        unless $result->is_success;
    print "Opened NCBI PrimerBLAST website...\n";

    $mech->submit_form(
        form_name => 'searchForm',
        fields    => {
            INPUT_SEQUENCE              => $accession,
            PRIMER_LEFT_INPUT           => $left_primer,
            PRIMER_RIGHT_INPUT          => $right_primer,
            PRIMER_PRODUCT_MIN          => $product_min,
            PRIMER_PRODUCT_MAX          => $product_max,
            PRIMER_NUM_RETURN           => $max_number_to_return,
            PRIMER_MIN_TM               => $min_tm,
            PRIMER_OPT_TM               => $opt_tm,
            PRIMER_MAX_TM               => $max_tm,
            PRIMER_MAX_DIFF_TM          => $max_tm_diff,
            SPAN_INTRON                 => $span_intron,
            MIN_INTRON_SIZE             => $min_intron,
            MAX_INTRON_SIZE             => $max_intron,
            SEARCH_SPECIFIC_PRIMER      => $search_specific,
            ORGANISM                    => $organism,
            PRIMER_SPECIFICITY_DATABASE => $primer_specificity,
            PRIMER_MIN_SIZE             => $min_primer_size,
            PRIMER_OPT_SIZE             => $opt_primer_size,
            PRIMER_MAX_SIZE             => $max_primer_size,
            PRIMER_MIN_GC               => $primer_min_gc,
            PRIMER_MAX_GC               => $primer_max_gc,
            GC_CLAMP                    => $gc_clamp,
            POLYX                       => $max_poly_x,
            SELF_ANY                    => $max_self_comp,
            SELF_END                    => $max_3prime_comp,
            MONO_CATIONS                => $monovalent_cations,
            DIVA_CATIONS                => $divalent_cations,
            CON_DNTPS                   => $dNTPs,
            SALT_FORMULAR               => $salt_formula,
            TM_METHOD                   => $tm_method
        }
    );
    die "Submission failed!!\n" unless $mech->success;

    $mech->follow_link( tag => 'meta' );
    print $mech->content( format => 'text' );
I have also taken the advice I have seen on many of the other nodes here and watched what my browser requests from the website using Firebug. My problem is that I don't understand these requests well enough to work out how to modify my code so that it sends the right ones. Here is the request Firebug shows for the waiting page:
    POST /tools/primer-blast/primertool.cgi HTTP/1.1
    Host: www.ncbi.nlm.nih.gov
    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: en-us,en;q=0.5
    Accept-Encoding: gzip, deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Keep-Alive: 115
    Connection: keep-alive
    Referer: http://www.ncbi.nlm.nih.gov/tools/primer-blast/
    Cookie: ncbi_sid=8A1A2C9BE1302E51_0026SID
And the request sent once the waiting page has finished because my job is done:
    GET /tools/primer-blast/primertool.cgi?ctg_time=1310519679&job_key=JSID_01_116688_130.14.24.201_9000 HTTP/1.1
    Host: www.ncbi.nlm.nih.gov
    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: en-us,en;q=0.5
    Accept-Encoding: gzip, deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Keep-Alive: 115
    Connection: keep-alive
    Cookie: ncbi_sid=8A1A2C9BE1302E51_0026SID
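Reading the two requests together, the results page looks like a plain GET to primertool.cgi carrying a job_key, issued some time after the initial POST. What follows is only a rough sketch of the polling loop that this suggests. It assumes, without confirmation from the site, that the waiting page announces its next URL through a `<meta http-equiv="refresh">` tag; if the page refreshes through JavaScript instead, the URL would have to be extracted some other way. The helper names `refresh_url` and `wait_for_results` and the `delay` and `max_tries` options are made up for illustration and are not part of WWW::Mechanize.

```perl
use strict;
use warnings;

# Return the target URL of a <meta http-equiv="refresh" content="N; URL=...">
# tag, or nothing if the page carries no such tag.
# (Hypothetical helper; assumes the waiting page uses a meta refresh.)
sub refresh_url {
    my ($html) = @_;
    for my $tag ( $html =~ /(<meta\b[^>]*>)/gi ) {
        next unless $tag =~ /http-equiv\s*=\s*["']?refresh/i;
        return $1
            if $tag =~ /content\s*=\s*["']?\s*\d+\s*;\s*url\s*=\s*["']?([^"'>\s]+)/i;
    }
    return;
}

# Re-request the page until it stops asking for a refresh, then return
# the final content. $mech is a WWW::Mechanize object on which
# submit_form() has already been called.
sub wait_for_results {
    my ( $mech, %opt ) = @_;
    my $delay     = $opt{delay}     // 10;    # seconds between polls
    my $max_tries = $opt{max_tries} // 60;    # give up after this many polls

    for ( 1 .. $max_tries ) {
        my $next = refresh_url( $mech->content )
            or return $mech->content;         # no refresh tag: assume job is done
        sleep $delay;
        $mech->get($next);                    # relative URLs resolve against the current page
    }
    die "Job still not finished after $max_tries polls\n";
}
```

In the script above this would stand in for the `follow_link` call, for example `print wait_for_results( $mech, delay => 15 );`. The delay between polls is there to avoid hammering the server while the job runs.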
I am pretty stuck here, and I come to you for any help you can offer. Thanks! Jason
In reply to WWW::Mechanize Problem by dobson187