I'm trying to automate a download from a website. The link is http://www.pdb.org/pdb/explore/explore.do?structureId=2CU3

Here, "2CU3" can be replaced by any structure ID. In my code the variable for this is $input, and all the structure IDs are stored in a text file named 'data.txt'.

After that, I want to get a link from that web page. The link URL is http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=FASTA&compression=NO&structureId=2CU3.
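The ID-to-URL step described above can be sketched in plain Perl, without Mechanize. This assumes data.txt holds one structure ID per line; `read_ids` and `fasta_url` are hypothetical helper names, not part of any library. It also trims surrounding whitespace, because `chomp` alone leaves a trailing "\r" in `$input` if data.txt was saved with Windows (CRLF) line endings, and that "\r" would end up inside the URL.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Base of the FASTA download link quoted in the post; the structure ID is appended.
my $FASTA_BASE =
    'http://www.rcsb.org/pdb/download/downloadFile.do'
  . '?fileFormat=FASTA&compression=NO&structureId=';

# Hypothetical helper: read one structure ID per line from $path,
# trimming surrounding whitespace (including a stray "\r") and skipping blank lines.
sub read_ids {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @ids;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;
        push @ids, $line if length $line;
    }
    close $fh;
    return @ids;
}

# Hypothetical helper: build the FASTA link for one structure ID.
sub fasta_url {
    my ($id) = @_;
    return $FASTA_BASE . $id;
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    print fasta_url($_), "\n" for read_ids($ARGV[0]);
}
```

Run as `perl ids.pl data.txt` (the script name is arbitrary); each printed URL is the kind of link the Mechanize loop is trying to fetch.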

My problem is that when I download this link, the file is just junk if I open it in TextEdit. If I open it in TextWrangler, the content is fine. Any idea what is causing this and how to fix it?

My code is as follows:
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use Data::Dumper;

open(my $fh, '<', 'data.txt') or die "Cannot open data.txt: $!";
while (my $input = <$fh>) {
    chomp $input;

    # download the PDB HTML page
    my $url  = "http://www.rcsb.org/pdb/explore.do?structureId=$input";
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get($url);

    # write a Data::Dumper dump of the whole Mechanize object to an output file (.html)
    my $file = "$input.html";
    print "$file\n";
    open(my $out, '>', $file) or die "Cannot write $file: $!";
    print $out Dumper($mech);
    close($out);

    # download the link (FASTA sequence); \Q...\E matches the text literally
    my $linkname = "fileFormat=FASTA&compression=NO&structureId=$input";
    my @links = $mech->find_all_links( url_regex => qr/\Q$linkname\E/ );
    for my $link (@links) {
        my $url = $link->url_abs;
        my $filename = $url;
        $filename =~ s[^.+/][];    # keep only the part after the last slash
        print "Fetching $url";
        $mech->get( $url, ':content_file' => $filename );
        print " ", -s $filename, " bytes\n";
    }
}
close($fh);
--------------------------------------------------------------------

Thanks for the replies. This isn't a TextEdit question: I can't work with the data I get because it's junk. By junk I mean that instead of text, I get (*^%&&^(*&^(* sort of stuff. It puzzles me that TextWrangler can display this data properly.
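One possible explanation, offered as an assumption and not something established in this post: the saved file may be gzip-compressed. WWW::Mechanize can ask the server for gzip-encoded responses, and `:content_file` writes the body to disk as received, so an editor that decompresses gzip on open could show clean FASTA while TextEdit shows binary junk. The sketch below checks for the two gzip magic bytes (0x1f 0x8b) and, if asked, decompresses the file with the core IO::Uncompress::Gunzip module. `looks_gzipped` and `gunzip_file` are hypothetical helper names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Every gzip stream starts with the two magic bytes 0x1f 0x8b.
sub looks_gzipped {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $got = read($fh, my $magic, 2);
    close $fh;
    return defined $got && $got == 2 && $magic eq "\x1f\x8b";
}

# Hypothetical helper: decompress the gzip file $in into the plain file $out.
sub gunzip_file {
    my ($in, $out) = @_;
    gunzip $in => $out or die "gunzip failed: $GunzipError";
    return;
}

# Usage: perl check.pl downloaded_file [plain_output]
if (@ARGV) {
    my ($file, $plain) = @ARGV;
    if (looks_gzipped($file)) {
        print "$file is gzip-compressed\n";
        gunzip_file($file, $plain) if defined $plain;
    }
    else {
        print "$file does not start with the gzip magic bytes\n";
    }
}
```

If the check does report gzip, another option worth trying (untested against this server) is to request an uncompressed response up front, for example with `$mech->add_header('Accept-Encoding' => 'identity')` before the `get`.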

I'll try all your suggestions today. Thanks again!

In reply to using WWW::Mechanize to download a link, opened fine in TextWrangler but as junk in TextEdit by nurulnad
