Tomcat7194 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm a researcher trying to get data out of an ArcIMS map.

The Baltimore Police Department posts geo-coded crime data to a web tool that uses ArcIMS. It's AJAX-based, so it seems like it must be pulling XML data from somewhere. I'm thinking there has to be a way to scrape this data, but I'm not too up to speed on JavaScript, so I'm having trouble figuring out which JS file is the significant one.

The tool is located at: http://maps.baltimorepolice.org/bpdmaps/police_querytype.asp?cmd=neighborhood&PD=EASTERN

Does anyone have experience scraping from ArcIMS?

Thanks! Tom

Replies are listed 'Best First'.
Re: Scraping from ArcIMS Map
by roboticus (Chancellor) on Apr 05, 2009 at 18:24 UTC
    Tomcat7194:

    You might not need to analyze the JavaScript. Try just capturing the AJAX requests to the server and looking them over. Often the method for talking to the server is "relatively obvious" after you've looked over a few requests and compared them.

    ...roboticus
      OK, I used Firebug to record the requests going to the server and what comes back.

      Here is the stuff from Firebug:
      Headers

      Post

      Response

      So I wrote this code to try and do the same kind of request:
      #!/usr/bin/perl
      use LWP::UserAgent;

      my $ua = new LWP::UserAgent;

      my $string = <<END;
      <?xml version="1.0" encoding="UTF-8" ?><ARCXML version="1.1">
      <REQUEST>
      <GET_FEATURES outputmode="xml" envelope="true" geometry="false" featurelimit="1000" beginrecord="1">
      <LAYER id="6" /><SPATIALQUERY subfields="CCNO CRIME_DESC FROM_DATE PREM_DESC LOCATION POST DISTRICT"
      where="(FROM_DATE &gt;&#061; {ts '2009-01-01 00:00:00'}) AND (FROM_DATE &lt;&#061; {ts '2009-01-06 23 :59:59'})">
      "><SPATIALFILTER relation="area_intersection" ><POLYGON>
      <RING>
      <POINT x="1427074.01374657" y="596810.020728368" />
      <POINT x="1429830.95443664" y="596810.020728368" />
      <POINT x="1429830.95443664" y="592945.463874963" />
      <POINT x="1427074.01374657" y="592945.463874963" />
      </RING>
      </POLYGON>
      </SPATIALFILTER></SPATIALQUERY></GET_FEATURES></REQUEST></ARCXML>
      END

      my $server = <<SERVER;
      http://maps.baltimorepolice.org/servlet/com.esri.esrimap.Esrimap?ServiceName=BaltimorePolice&ClientVersion=3.1&Form=True&Encode=False
      SERVER

      my $response = $ua->post("$server", {
          ArcXMLRequest      => "$string",
          FormCharset        => 'ISO-8859-1',
          BgColor            => '#000000',
          JavaScriptFunction => 'parent.MapFrame.processXML',
          FooterFile         => '',
          HeaderFile         => '',
          RedirectURL        => ''
      });

      if ($response->is_success) {
          print $response->content;  # or whatever
      }
      else {
          die $response->status_line;
      }


      However, instead of getting the response shown in Firebug (which is exactly what I want), I'm getting this:
      <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
      <HTML><HEAD><TITLE>Default Form</TITLE><!-- Title must match jsForm.htm's title -->
      <SCRIPT TYPE="text/javascript" LANGUAGE="JavaScript">
      function passXML() {
      var XMLResponse='<?xml version="1.0" encoding="UTF-8"?><ARCXML version="1.1"><RESPONSE><ERROR machine="BPDMAP1" processid="2008" threadid="2748">Not a correct ArcXML request.</ERROR></RESPONSE></ARCXML>';
      parent.MapFrame.processXML(XMLResponse);
      }</SCRIPT></HEAD>
      <BODY BGCOLOR="#000000" onload="passXML()">
      <FORM ACTION="" METHOD="POST" name="theForm">
      <!--- <input type="Hidden" name="Form" value="True"> --->
      <INPUT TYPE="Hidden" NAME="ArcXMLRequest" VALUE="">
      <INPUT TYPE="Hidden" NAME="JavaScriptFunction" VALUE="parent.MapFrame.processXML">
      <INPUT TYPE="Hidden" NAME="BgColor" VALUE="#000000">
      <INPUT TYPE="Hidden" NAME="FormCharset" VALUE="ISO-8859-1">
      <INPUT TYPE="Hidden" NAME="RedirectURL" VALUE="">
      <INPUT TYPE="Hidden" NAME="HeaderFile" VALUE="">
      <INPUT TYPE="Hidden" NAME="FooterFile" VALUE="">
      </FORM></BODY></HTML>


      Any ideas?
        Tomcat7194:

        Did you compare the request byte-by-byte to verify that it's identical? I'd try that first. (I'm not familiar with Firebug, so I don't know how to view the results as a hex dump. I normally use something like Ethereal to capture the traffic...)
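
        A byte-by-byte comparison like that can be roughed out in core Perl. This is only a sketch: the helper names first_difference and hex_bytes are made up for illustration, and the two sample strings are hypothetical stand-ins for a request body saved from a capture tool and the body the script produces. The trailing newline in the second sample is there just to show how a single stray byte would surface.

```perl
use strict;
use warnings;

# Return the offset of the first differing byte, or -1 if the strings match.
sub first_difference {
    my ($left, $right) = @_;
    my $shorter = length($left) < length($right) ? length($left) : length($right);
    for my $i (0 .. $shorter - 1) {
        return $i if substr($left, $i, 1) ne substr($right, $i, 1);
    }
    # Same prefix: identical if the lengths agree, otherwise one string
    # simply continues past the end of the other.
    return length($left) == length($right) ? -1 : $shorter;
}

# Render a string as space-separated two-digit hex bytes.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord } split //, $bytes;
}

# Hypothetical sample data, not taken from the real traffic.
my $captured = "ArcXMLRequest=%3CARCXML%3E";
my $sent     = "ArcXMLRequest=%3CARCXML%3E\n";

my $offset = first_difference($captured, $sent);
if ($offset < 0) {
    print "identical\n";
}
else {
    printf "first difference at byte %d\n", $offset;
    printf "captured: %s\n", hex_bytes(substr($captured, $offset, 8));
    printf "sent:     %s\n", hex_bytes(substr($sent, $offset, 8));
}
```

        Printing a few bytes on either side of the reported offset as hex makes invisible differences (newlines, spaces, encoding changes) easier to spot than eyeballing the raw text.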

        If the request is *identical* and you're not getting the correct result, there might be some session traffic (cookie or some such?) that needs to be handled. I'm not terribly experienced with HTTP traffic, so some other monk will have to chime in on that, if it's relevant.
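
        If session cookies do turn out to matter, one possible approach with LWP is to give the user agent a cookie jar and to print the request before sending it, so the exact bytes can be checked against the capture. The following is a sketch under assumptions, not a confirmed fix: the ArcXML here is a shortened placeholder rather than the real query, only two of the form fields are shown, and whether this server sets any cookies at all is unknown.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(POST);

# cookie_jar => {} gives the agent an in-memory HTTP::Cookies jar, so any
# cookie set by an earlier response is sent back on later requests.
my $ua = LWP::UserAgent->new( cookie_jar => {} );

# Servlet URL as given in the thread; the ArcXML is a placeholder only.
my $url = 'http://maps.baltimorepolice.org/servlet/com.esri.esrimap.Esrimap'
        . '?ServiceName=BaltimorePolice&ClientVersion=3.1&Form=True&Encode=False';
my $arcxml = '<ARCXML version="1.1"><REQUEST></REQUEST></ARCXML>';

# Build the POST without sending it, so the encoded form can be inspected.
my $request = POST( $url, [
    ArcXMLRequest => $arcxml,
    FormCharset   => 'ISO-8859-1',
] );

# Show the request line, headers, and url-encoded body as LWP built them.
print $request->as_string;

# To send it for real, fetch the map page first so the jar can pick up
# any cookies, then submit the prepared request:
#   $ua->get('http://maps.baltimorepolice.org/bpdmaps/police_querytype.asp?cmd=neighborhood&PD=EASTERN');
#   my $response = $ua->request($request);
```

        Building the request separately from sending it keeps the comparison step offline: the printed output can be diffed against the captured POST before any traffic goes to the server.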

        ...roboticus