Hi. I'm a student studying programming. I have experience programming in C and Perl, and I know most of the syntax and concepts of both languages. However, I don't have any experience building my own project, and I have only basic knowledge of networks.

So I started my first project using WWW::Mechanize. My goal is to get a list of the titles and URLs of a bulletin board's posts for a given period. To start with, I tried to get the HTML from this site: http://hiphople.com/kboard (it is a Korean site).

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->get( "http://hiphople.com/kboard" );
print $mech->content();

But the output is ���w�Ʊ0��tN��ӆR#�... WWW::Mechanize uses UTF-8 by default, and the target site's HTML header says it uses UTF-8 too, so it is not an encoding problem. I then found that the target site uses gzip (I saw it in the HTTP response header).

To solve this, I first tried WWW::Mechanize::GZip, but its documentation says "If the webserver does not support gzip-compression, no decompression will be made." I guess the web server at http://hiphople.com/kboard does not support gzip compression, because it doesn't work.

So I tried to decompress it myself, without relying on the web server. The code below is my attempt.

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use IO::Uncompress::Gunzip qw(gunzip);

my $mech = WWW::Mechanize->new( autocheck => 1 );
my $response = $mech->get( "http://hiphople.com/kboard" );
my $output = "file1.txt";
gunzip $response => $output;

But the result is: IO::Uncompress::Gunzip::gunzip: illegal input parameter. I guess it's because $response is not in .gz format due to Mechanize. That is all I can guess. I don't know what to do.
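
For reference, here is a minimal sketch of the kinds of input that gunzip does accept, which may explain the "illegal input parameter" message: a filename, a filehandle, or a reference to a scalar holding the compressed bytes, but not a response object. The sample string and the variable names are made up for illustration, and the data is compressed locally so that the sketch runs without network access. It is not a tested fix for the site above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip   $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Stand-in for the page body (hypothetical sample data). A real script
# would take the raw bytes of the HTTP response body instead, assuming
# those bytes really are gzip data.
my $html = "<html><body>hello</body></html>";

# Compress in memory so the example needs no network access.
my $compressed;
gzip( \$html => \$compressed )
    or die "gzip failed: $GzipError\n";

# Pass a reference to the scalar that holds the compressed bytes.
# The output side is also a scalar reference here; a filename such as
# "file1.txt" would work as well.
my $decompressed;
gunzip( \$compressed => \$decompressed )
    or die "gunzip failed: $GunzipError\n";

print $decompressed, "\n";
```

In the attempt above, the first argument to gunzip is the object returned by get(), not the compressed bytes themselves, so a scalar reference to the raw body would be the closest equivalent to this sketch.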

So this is what I have encountered while trying to get a simple HTML file from the site I want. Now I need some help from other people. Getting an HTML file is the first step of my project, and it was hard to achieve. Can anyone help me?

P.S. My English is not that great, so I'm afraid this was difficult for you to read. Sorry for that.


In reply to Problem while using WWW::Mechanize module for getting html by yujong_lee
