n8g has asked for the wisdom of the Perl Monks concerning the following question:

I have a question about using request headers to communicate with a website directly. I am trying to scrape Chase's online banking to download my account balance. I have successfully written a similar script for Bank of America, so I am familiar with some of the hurdles associated with secure logins and JavaScript-obfuscated forms, but I can't seem to figure this one out. In my BOA attempt I was able to grab forms and set the values using Mechanize. That approach does not seem to work for Chase. I have used the Firefox Tamper Data plugin to take a look at the header information, and at this point I think the best approach may be to duplicate the headers exactly and forget about the forms. Assuming that is the case, is the correct (or a correct) approach to use Mechanize's add_header followed by a post? Alternatively, if anyone has already solved this problem, I would certainly appreciate seeing the solution.

Replies are listed 'Best First'.
Re: Trying to download account balance
by Gangabass (Vicar) on Sep 16, 2007 at 08:36 UTC

    Here is my approach (using Firefox):

    1. Disable JavaScript
    2. Enable recording in the LiveHTTPHeaders extension
    3. Try to log in

    If everything is OK, then there is a way to log in from a script. Try to repeat the HTTP headers exactly (and don't forget about cookies!).

    Also, you usually have to get the login page first (to pick up its cookies) and only after that send the login information.
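
    A minimal sketch of that sequence with WWW::Mechanize, assuming the login still works with JavaScript disabled. The URLs, form field names, header values, and the BANK_USER / BANK_PASS environment variables are hypothetical placeholders, not the bank's real ones; the actual values have to come from your own recorded session.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical placeholders: substitute the values from your own recording.
my $LOGIN_PAGE = 'https://bank.example/login';
my $LOGIN_POST = 'https://bank.example/auth';

# Create a browser object that sends the same headers the real browser did.
sub new_browser {
    my $mech = WWW::Mechanize->new(
        autocheck => 1,    # die on any HTTP error instead of failing silently
        agent     => 'Mozilla/5.0 (X11; Linux) Firefox',    # copy your browser's string
    );
    # add_header applies to every later request made by this object.
    $mech->add_header('Accept-Language' => 'en-us,en;q=0.5');
    return $mech;
}

sub login {
    my ($mech, $user, $pass) = @_;
    # 1. Fetch the login page first so the cookie jar picks up session cookies.
    $mech->get($LOGIN_PAGE);
    # 2. Post the credentials directly; the field names here are assumptions.
    $mech->post($LOGIN_POST, { userid => $user, password => $pass });
    return $mech->content;
}

# Only touch the network when credentials are supplied in the environment.
if (defined $ENV{BANK_USER} && defined $ENV{BANK_PASS}) {
    print login(new_browser(), $ENV{BANK_USER}, $ENV{BANK_PASS});
}
```

    WWW::Mechanize keeps an in-memory cookie jar by default, so cookies set by the first GET are sent along with the POST without any extra code.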

Re: Trying to download account balance
by Anonymous Monk on Sep 16, 2007 at 05:00 UTC

      After a bit of consideration I have removed this from my site. It was not my intention to assist anyone in hacking a bank site, and to be honest I am a bit skeptical that this would be a practical approach. After all, without a valid username and password you would need to perform some sort of brute-force attack, which would certainly be detected and dealt with long before you were able to get in.

      Maybe I am missing something. I am not a security expert. If I am able to adequately confirm my belief I will replace the script. In the meantime better safe than sorry.

      Update: After further consideration and a little research I am comfortable that this script is legitimately useful information and have put it back up.

Re: Trying to download account balance
by BrowserUk (Patriarch) on Sep 16, 2007 at 05:21 UTC

    If you are a legitimate customer of Chase Manhattan Bank (most recently, and more properly known as JPMorgan Chase & Co.), then you would know that there is no legitimate reason for wanting to defeat their elaborate and specific procedures to prevent automated access to their user account information.

    More importantly, if you were a legitimate customer of that enterprise, you would not want to bypass their security procedures. Much less have a mechanism for doing so advertised to every would-be HaXoR (read: organised criminal) that might visit this site.

    If you were a legitimate customer, and if you had made even the most cursory of enquiries of that bank as to what facilities were available for the secure, automated access to your account parameters, then you would not be asking such transparently nefarious and obviously stupid questions on an open-to-all forum.

    Of course, knowing this place pretty well, the same people who will deny help, or even pointers, to those they perceive (rightly or wrongly) as looking for help with "homework", will probably expend untold effort to provide assistance for those looking to bypass the security protocols of a banking organisation... so long as it isn't their bank.

    After all, organised criminals would never pretend to be naive enough to actually ask for help in breaking the security protocols of the second largest bank in the USA, whilst simultaneously admitting that they had already done so for the largest bank in the USA.

    Holy [the worst expletive you can think of!] This is a screwed-up world.

    And the most obvious indicator here of just how screwed-up it is, is the reaction that a post--perceived by one or two self-righteously moralistic monks as "homework"--will receive. They will pontificate about the morality of assisting teenagers with their homework, and yet those same self-righteous monks will happily and repeatedly post code to automate the download of copyrighted images, defeat CAPTCHA protocols, and carry out any number of other procedures of dubious legality.

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      I am sorry that my intentions have been so entirely misunderstood. I am in fact a legitimate customer of both banks and don't understand how being so would preclude me wanting to automate the activity of checking my account balance.

      Believe it or not, while I did think that this question might raise a few eyebrows, I did not believe that automating access to a site and account I already have access to would be construed automatically as nefarious.

      My purpose here is to create a "Financial Portal" for my family to help us manage our finances. Being able to see all of our account balances on a single page, without individually logging in to each account, seemed to me to be a legitimate use of these sites. If you are a heavy debit card user and have ever gone below your balance, you know how quickly overdraft fees on a few small purchases can add up.

      You are correct about the world being a bit screwed up. We are currently living in a paranoid environment where anything that can be misconstrued as a criminal or terrorist act will be. I certainly can't wait for that to end.

      Having said that, I hope that I have not offended the entire community with this request. I had hoped that this would be a good resource for learning Perl. In retrospect this was probably not the greatest of first questions to ask. Maybe I am a bit too naive. In any case I don't think that I will change your mind, as you obviously have an axe to grind, but I did want to respond to allow others to make up their own minds.

      BrowserUk, I'm always happy to see your posts here. You clearly have a good grasp of Perl techniques and how to apply them and you're one of the users here that I've learned a lot from, but, IMO, you jumped the gun on this one.

      Now, I am not a Chase customer and I don't use online access to my bank, but I could immediately see multiple legitimate uses for someone who does access their account information online to want to automate that process. Aside from the OP's stated reason of aggregating multiple accounts into a single "family financial portal" page, two other legitimate uses which immediately come to mind would be to periodically monitor (via cron or some Windows equivalent) your balance(s) and send yourself email if they drop below some threshold, or to download a transaction history for import into whatever home financial program one might use (e.g., QuickBooks or the like).
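
      As a rough illustration of the threshold check, here is a small sketch that assumes the balance has already been fetched as text containing a dollar amount. The function names and the message format are made up for the example, and the actual mail delivery is left to cron or a mail command.

```perl
use strict;
use warnings;

# Pull the first dollar amount out of scraped text; undef if none is found.
sub parse_balance {
    my ($text) = @_;
    return unless $text =~ /(-?)\$\s*([\d,]+(?:\.\d{2})?)/;
    my ($sign, $amount) = ($1, $2);
    $amount =~ tr/,//d;    # strip thousands separators
    return $sign ? -$amount : $amount + 0;
}

# Return a warning message when the balance is under the threshold, else undef.
sub low_balance_alert {
    my ($text, $threshold) = @_;
    my $balance = parse_balance($text);
    return 'Could not find a balance' unless defined $balance;
    return undef if $balance >= $threshold;
    return sprintf('Balance $%.2f is below $%.2f', $balance, $threshold);
}

my $alert = low_balance_alert('Available balance: $42.10', 100);
print "$alert\n" if defined $alert;
```

      Run from cron, any output from a script like this is typically mailed to the job's owner, which covers the "send yourself email" part without extra code.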

      Granted, a lot of sites (financial and otherwise) don't really like people scraping their sites, so that's probably not the best way to go about it if there are other options, but there aren't always other options[1] and, even if there are, scraping tends to be the most obvious way to get information from a site, since it's essentially taking the way the user interacts with a site and duplicating it in code.

      The OP said nothing to indicate that he intended to try to obtain new login credentials for the site, whether by brute-force or other means. I see nothing there to indicate that he wants to do anything other than to make automated use of login credentials which he already uses manually, which rather strongly implies that he already has said credentials. (Even if they're not legitimate, he can already use them and, therefore, already do as much damage as he likes through their (mis)use.) If you see anything specific which gives you the impression of nefarious intent ("specific" as opposed to "it involves financial information, therefore it must be for fraudulent purposes"), please point it out, because I'm not seeing it.

      Like the OP, I agree with you that the world is pretty messed up at present. I also agree with him - and your signature - that one of the major problems is the existing "oppressive environment of political correctness and risk aversion" in which generalized paranoia causes many to heavily question others' motives and jump to negative conclusions on the slightest provocation.

      [1] cf. my recent post "Net::Google replacement?" and the apparent lack of a legitimate alternative to scraping Google at this time

      I am also your undying (some say undead) fan, but I love automated access to my secure accounts. Until Google put the kibosh on one of my scripts by doing their login via JS, I could check my whole AdSense history with stats, trends, trailing averages, and other spreadsheet-ish stuff from the command line on a whim. Now I have to log in, sometimes twice because they won't allow a Gmail account to have access in some places and other places they require it, so I get cookie tag-teamed by them (really, really bad UI for Google and I can't believe it's still like that a year out), then either click around 4 or 5 times or download a CSV and run it through a tool or an app. Lame.

      Everyone big should provide an API into their stuff. Customers are going to want it eventually and with an API the company gets to control it instead of spawning a dozen hacks to get at it.

      Where in the OP's post did he say he wanted to bypass security procedures? All I'm getting from his question is that he wants to use an alternate interface. That is, instead of using a web browser (over the accepted transport protocol HTTP, one hopes S) he'd like to access it via a Perl script (over the same protocol). Since when is that not a legitimate desire? Any security measure that does not rely on the security of information shared only by the bank and the customer (password/TAN), but instead on obscurity of the way this information is verified, is bound to be inherently broken. If this were essential to the "security" of access, exposing this flaw could only be good for the customers, and thereby the bank itself.

      Thinking about it, you must either not know very much about web security or have some vested interest in keeping the workings of Chase online banking as obscure as possible. Which is it?

      And that last paragraph pontificating about others' pontifications is just ridiculous. Tone it down BUKky, you're losing it.

        Most browser-based banking sites will require JavaScript, and will often go through several layers of redirects. One of the goals of these mechanisms is to try to ensure that no significant information gets left lying around in browser caches and/or local proxies.

        Another goal is to try to ensure that credentials are input by a human being. This is an attempt to prevent root kits and other nasties from being able to log in automatically. The single greatest point of weakness of the entire banking system is customers' homes. Most banks do everything they can to prevent automated access to their systems, other than via those mechanisms they put in place.

        For the record, I have *no* associations or relationships with Chase. And I do not use internet banking systems. The only online banking systems I consider secure are those that use dedicated dialup. Paranoid? Make up your own mind.

        As for me "losing it": is it really such a stretch of your imagination that the breadth and depth of the skill levels in this place, combined with the freely given nature of that expertise, have not gone unnoticed by those on the web who would put that expertise to less than legitimate use?

