Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I've written a subroutine that takes a word or phrase and looks it up in an online dictionary ... dictionary.com ... using LWP::UserAgent -> no problems, works great ...

I want to do the same thing at a different online dictionary but I keep getting a 501 error - method not implemented ... switching methods from get to post doesn't help ... is there a way to work around this ... ?

Re: method not supported
by davido (Cardinal) on Dec 28, 2005 at 03:54 UTC

    Before we try to get clever and diagnose the problem, we probably ought to see a minimal script that generates the error. It should be possible to boil this type of problem down to ten lines or less. If the problem goes away, you'll be half-way toward isolating the issue on your own. If it doesn't go away, you'll have a great working example for us to look at for you.


    Dave

      What URL are you using for the "other" dictionary with LWP::UserAgent?
      the hardest line to type correctly is: stty erase ^H

      Here's the code ...


      my $agent = new LWP::UserAgent;
      my $define = new HTTP::Request;
      $define->method('get');
      $define->url($url);
      my $website = $agent->request($define);
      my $file = $website->content;

      from there $file gets fed into a parser ... but to test dictionaries I just print $file."\n";

      www.dictionary.com works fine ... $url="http://dictionary.reference.com/search?q=".$word;

      gives the html that can be parsed ...

      but www.naver.com doesn't ... $url="http://dic.naver.com/search.naver?mode=all&query=".$word;

      where $word="happy"; gives me this html ...


      <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
      <HTML><HEAD>
      <TITLE>501 Method Not Implemented</TITLE>
      </HEAD><BODY>
      <H1>Method Not Implemented</H1>
      get to /search.naver not supported.<P>
      Invalid method in request get /search.naver?mode=all&amp;query=happy HTTP/1.1<P>
      </BODY></HTML>

      I tried 'post' and read through the CPAN docs for the module but can't grab webpages from the site ... which is annoying since the XML tags at naver make parsing a lot easier ...

        Well, I don't see the problem; even after testing and tinkering with it, I couldn't get your code to work. I did try assigning an agent name to make it look like the request was coming from Firefox instead of LWP::UserAgent, just in case the server is blocking robots.

        I did re-write the code, to more closely match the synopsis given in the docs for LWP::UserAgent, and that seemed to do the trick:

        use strict;
        use LWP::UserAgent;

        my $word = "happy";
        my $url = "http://dic.naver.com/search.naver?mode=all&query=" . $word;

        my $agent = new LWP::UserAgent;
        $agent->agent('Firefox/1.5');

        my $response = $agent->get( $url );
        if( $response->is_success ) {
            print $response->content();
        } else {
            die $response->status_line;
        }

        See if you can adapt that to your needs, because it seemed to work fine for me.
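
        One plausible explanation for why this version succeeds where the original did not: HTTP method names are case-sensitive, and the 501 page quoted earlier shows the server received a lowercase "get". The sketch below makes no network calls; it only builds two request objects and prints the method each one carries. It assumes that HTTP::Request stores the method string exactly as given and that $agent->get() builds its request the same way HTTP::Request::Common's GET does. The hard-coded URL is just the one from this thread with $word filled in.

        ```perl
        use strict;
        use warnings;
        use HTTP::Request;
        use HTTP::Request::Common qw(GET);

        # URL from the thread, with "happy" substituted for $word
        my $url = 'http://dic.naver.com/search.naver?mode=all&query=happy';

        # Original style: the method string is taken as typed, lowercase
        my $lower = HTTP::Request->new( get => $url );

        # Helper style: the GET function supplies an uppercase method
        my $upper = GET $url;

        # Print the method each request object would put on the request line
        print $lower->method, "\n";
        print $upper->method, "\n";
        ```

        If that reading is right, changing 'get' to 'GET' in the original code may be enough on its own, though that has not been tested against the naver server here.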


        Dave

Re: method not supported
by marto (Cardinal) on Dec 28, 2005 at 09:23 UTC
    Hi Anonymous Monk,

    Did you check out the terms of use for dictionary.com?
    The section 'Rights in Site Content and the Site' reads to me like they don't want people parsing and republishing their content.

    Update: Further to rinceWind's post, I am not assuming that the OP wants to republish or modify the content; I posted this in case the OP had not read the Ts&Cs of the site they were using.

    Martin

      What gives you the impression that the OP wants to republish the content?

      There are plenty of details of the copyright terms on the page you have quoted. Provided you keep it verbatim and unmodified, with any attributions, and provided you only keep it for your own personal use, and are not profiting from it, there is nothing against HTML-scraping this site.

      marto, please stop spreading FUD about copyright. It's bad enough with the likes of Micro$oft and the media barons trying to pull the wool over everyone's eyes.

      --

      Oh Lord, won’t you burn me a Knoppix CD ?
      My friends all rate Windows, I must disagree.
      Your powers of persuasion will set them all free,
      So oh Lord, won’t you burn me a Knoppix CD ?
      (Misquoting Janis Joplin)

        What gives you the impression that marto has a certain impression about what the OP intends to do with the content?

        Before we all engage in group "mind reading" (under "Jumping to conclusions"), let's realize that we just don't know what the OP is doing with the content unless he tells us, but that the fact he's using 'bot-like code could mean he's automating many downloads, which might be a violation of the site's Terms of Use.

        Similarly, we just don't know what marto thinks the OP is doing unless marto tells us... But just as we might want to give the OP the benefit of the doubt and assume he's not doing anything unethical, let's give marto the same courtesy and not assume he's "spreading FUD".

        Just as we have a general expectation that robots will obey PerlMonks' wishes with regard to spidering the site, dictionary.com probably has a reasonable expectation that their Terms of Use will be honored. marto was simply pointing out that before one turns a robot loose on a site, one should find out whether such a thing is permissible or might land one in hot legal water. In doing so, marto showed he has both the OP's and dictionary.com's best interests at heart.

        IMHO, it sounds to me like your real problem is with "the likes of Micro$oft and the media barons"... Perhaps educating people about the fallacies behind the arguments of same could provide a more productive outlet for your (quite understandable) frustrations.

        planetscape