in reply to [SOLVED] Scraping from ArcIMS Map

Tomcat7194:

You might not need to analyze the JavaScript. Try just capturing the AJAX requests to the server and looking them over. Often the method for talking to the server is "relatively obvious" after you've looked over a few requests and compared them.

...roboticus

Re^2: Scraping from ArcIMS Map
by Tomcat7194 (Novice) on Apr 05, 2009 at 21:47 UTC
    Ok, I used Firebug to record the requests going to the server and what comes back.

    Here is the stuff from Firebug:
    Headers

    Post

    Response

    So I wrote this code to try and do the same kind of request:
    #!/usr/bin/perl
    use LWP::UserAgent;

    my $ua = new LWP::UserAgent;

    my $string = <<END;
    <?xml version="1.0" encoding="UTF-8" ?><ARCXML version="1.1">
    <REQUEST>
    <GET_FEATURES outputmode="xml" envelope="true" geometry="false" featurelimit="1000" beginrecord="1">
    <LAYER id="6" /><SPATIALQUERY subfields="CCNO CRIME_DESC FROM_DATE PREM_DESC LOCATION POST DISTRICT"
    where="(FROM_DATE &gt;&#061; {ts '2009-01-01 00:00:00'}) AND (FROM_DATE &lt;&#061; {ts '2009-01-06 23 :59:59'})">
    "><SPATIALFILTER relation="area_intersection" ><POLYGON>
    <RING>
    <POINT x="1427074.01374657" y="596810.020728368" />
    <POINT x="1429830.95443664" y="596810.020728368" />
    <POINT x="1429830.95443664" y="592945.463874963" />
    <POINT x="1427074.01374657" y="592945.463874963" />
    </RING>
    </POLYGON>
    </SPATIALFILTER></SPATIALQUERY></GET_FEATURES></REQUEST></ARCXML>
    END

    my $server = <<SERVER;
    http://maps.baltimorepolice.org/servlet/com.esri.esrimap.Esrimap?ServiceName=BaltimorePolice&ClientVersion=3.1&Form=True&Encode=False
    SERVER

    my $response = $ua->post("$server", {
        ArcXMLRequest      => "$string",
        FormCharset        => 'ISO-8859-1',
        BgColor            => '#000000',
        JavaScriptFunction => 'parent.MapFrame.processXML',
        FooterFile         => '',
        HeaderFile         => '',
        RedirectURL        => ''
    });

    if ($response->is_success) {
        print $response->content;  # or whatever
    }
    else {
        die $response->status_line;
    }


    However, instead of getting the response like in Firebug (which is exactly what I want), I'm getting this:
    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
    <HTML><HEAD><TITLE>Default Form</TITLE><!-- Title must match jsForm.htm's title -->
    <SCRIPT TYPE="text/javascript" LANGUAGE="JavaScript">
    function passXML() {
    var XMLResponse='<?xml version="1.0" encoding="UTF-8"?><ARCXML version="1.1"><RESPONSE><ERROR machine="BPDMAP1" processid="2008" threadid="2748">Not a correct ArcXML request.</ERROR></RESPONSE></ARCXML>';
    parent.MapFrame.processXML(XMLResponse);
    }</SCRIPT></HEAD>
    <BODY BGCOLOR="#000000" onload="passXML()">
    <FORM ACTION="" METHOD="POST" name="theForm">
    <!--- <input type="Hidden" name="Form" value="True"> --->
    <INPUT TYPE="Hidden" NAME="ArcXMLRequest" VALUE="">
    <INPUT TYPE="Hidden" NAME="JavaScriptFunction" VALUE="parent.MapFrame.processXML">
    <INPUT TYPE="Hidden" NAME="BgColor" VALUE="#000000">
    <INPUT TYPE="Hidden" NAME="FormCharset" VALUE="ISO-8859-1">
    <INPUT TYPE="Hidden" NAME="RedirectURL" VALUE="">
    <INPUT TYPE="Hidden" NAME="HeaderFile" VALUE="">
    <INPUT TYPE="Hidden" NAME="FooterFile" VALUE="">
    </FORM></BODY></HTML>


    Any ideas?
      Tomcat7194:

      Did you compare the request byte-by-byte to verify that it's identical? I'd try that first. (I'm not familiar with Firebug, so I don't know how to view the results as a hex dump. I normally use something like Ethereal to capture the traffic...)
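
      A byte-by-byte comparison like this can be sketched in plain Perl. This is only an illustration: first_diff_offset and hex_window are hypothetical helper names, and the two sample strings are made up to stand in for the captured request body and the one the script generates.

```perl
use strict;
use warnings;

# Return the offset of the first byte at which two strings differ,
# or -1 if they are byte-for-byte identical.
sub first_diff_offset {
    my ($captured, $generated) = @_;
    my ($len_c, $len_g) = (length $captured, length $generated);
    my $min = $len_c < $len_g ? $len_c : $len_g;
    for my $i (0 .. $min - 1) {
        return $i if substr($captured, $i, 1) ne substr($generated, $i, 1);
    }
    # One string is a prefix of the other, or they are equal.
    return $len_c == $len_g ? -1 : $min;
}

# Render up to 2 * $width bytes around an offset as hex, so that
# invisible differences (newlines, stray spaces) become visible.
sub hex_window {
    my ($bytes, $offset, $width) = @_;
    my $start = $offset > $width ? $offset - $width : 0;
    my $chunk = substr($bytes, $start, 2 * $width);
    return join ' ', map { sprintf '%02x', ord($_) } split //, $chunk;
}

# Hypothetical sample data, not the real Firebug capture.
my $captured  = qq{<LAYER id="6" /><SPATIALQUERY>};
my $generated = qq{<LAYER id="6" />\n<SPATIALQUERY>};

my $at = first_diff_offset($captured, $generated);
if ($at < 0) {
    print "requests are identical\n";
}
else {
    printf "first difference at byte %d\n", $at;
    printf "captured : %s\n", hex_window($captured,  $at, 4);
    printf "generated: %s\n", hex_window($generated, $at, 4);
}
```

      Both strings should be raw bytes as they go over the wire (for example, the POST body copied from the capture tool and the body the script is about to send), not decoded character strings.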

      If the request is *identical* and you're not getting the correct result, there might be some session traffic (cookie or some such?) that needs to be handled. I'm not terribly experienced with HTTP traffic, so some other monk will have to chime in on that, if it's relevant.

      ...roboticus
        Turns out that this wasn't a matter of cookies or headers (I tried adding both to the request, and it made no difference). The code for sending the request was perfectly fine; it just turns out that trying to send the XML formatted as UTF-8 was messing things up.

        When I removed that from the XML, everything went through just fine, and the server spit out the right data.
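
        For anyone hitting the same error, here is a minimal sketch of one way to read that fix. It assumes that "removing that" means dropping the encoding="UTF-8" attribute from the XML declaration; the post does not say whether only the attribute or the whole declaration went. strip_xml_encoding is a hypothetical helper name.

```perl
use strict;
use warnings;

# Remove the encoding="..." pseudo-attribute from a leading XML
# declaration, leaving the rest of the document untouched.
# Assumption: this is what "removing UTF-8 from the XML" amounts to.
sub strip_xml_encoding {
    my ($xml) = @_;
    $xml =~ s/\A(\s*<\?xml\b[^?]*?)\s+encoding\s*=\s*(["'])[^"']*\2/$1/;
    return $xml;
}

my $request = qq{<?xml version="1.0" encoding="UTF-8" ?><ARCXML version="1.1"></ARCXML>};
print strip_xml_encoding($request), "\n";
# prints: <?xml version="1.0" ?><ARCXML version="1.1"></ARCXML>
```

        A document with no XML declaration, or with no encoding attribute in it, passes through unchanged.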

        Tom