Sj03rd has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm trying to crawl webpages that consist of nothing but simple text. However, when a page is saved to the hard drive it becomes unreadable, turning into scrambled text (such as í½isÛH²úùøW 4ísä etc.). Does anybody know what goes wrong? Thanks!

use WWW::Mechanize;

my $mech = WWW::Mechanize->new( autocheck => 0 );

# @arraydata is populated elsewhere in the full script
@arraydatad = split( /\//, $arraydata[4] );

$filename  = 'test';
$filecrawl = "http://www.sec.gov/Archives/edgar/data/1000045/0001144204-06-005708.txt";

$mech->get( $filecrawl, ':content_file' => $filename );

Re: Scrambled text in downloaded webpages
by Corion (Patriarch) on Aug 01, 2012 at 10:53 UTC

    Most likely, the response is compressed (its Content-Encoding says gzip or similar). You will need to look at ->decoded_content instead of storing the raw output directly to a file.
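    A minimal sketch of this approach, assuming the URL and file name from the original post: WWW::Mechanize exposes the last HTTP::Response via ->response, and that object's decoded_content undoes the compression.

        use WWW::Mechanize;

        my $mech = WWW::Mechanize->new( autocheck => 0 );
        $mech->get('http://www.sec.gov/Archives/edgar/data/1000045/0001144204-06-005708.txt');

        # decoded_content reverses the gzip/deflate encoding and the charset encoding
        my $text = $mech->response->decoded_content;

        open my $fh, '>:encoding(UTF-8)', 'test' or die "Cannot write: $!";
        print {$fh} $text;
        close $fh;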

      Mechanize doesn't have decoded_content, only content, which DWIMs (does what you mean).

      Thank you Corion! One way to do what you suggest is described here, and it solved the problem for me: http://stackoverflow.com/questions/1285305/how-can-i-accept-gzip-compressed-content-using-lwpuseragent/1285328#1285328
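      The linked answer boils down to advertising the encodings LWP can decode and then reading the decoded body. Adapted to WWW::Mechanize, a rough sketch (using the URL from the original post):

          use WWW::Mechanize;
          use HTTP::Message;

          my $mech = WWW::Mechanize->new( autocheck => 0 );

          # Tell the server which compressed encodings LWP knows how to decode
          $mech->add_header( 'Accept-Encoding' => HTTP::Message::decodable() );

          $mech->get('http://www.sec.gov/Archives/edgar/data/1000045/0001144204-06-005708.txt');
          print $mech->response->decoded_content;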

Re: Scrambled text in downloaded webpages
by daxim (Curate) on Aug 01, 2012 at 11:36 UTC
    You are using the wrong method. The get method, inherited from LWP, saves the resource body unchanged.

    You want save_content, which dumps the content after decoding.
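
    A minimal sketch, again assuming the URL and file name from the original post:

        use WWW::Mechanize;

        my $mech = WWW::Mechanize->new( autocheck => 0 );
        $mech->get('http://www.sec.gov/Archives/edgar/data/1000045/0001144204-06-005708.txt');

        # save_content writes the decoded document to disk, not the raw body
        $mech->save_content('test');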

      Thank you daxim, you're right: see my reply in the thread above.