
You're trying to parse something that you're generating from a CGI? Then you control what gets generated in the first place, so why use HTML (which is difficult to parse)? Generate an alternate output that can be parsed more easily (or used directly by whatever it is you're trying to do).

This is exactly what SOAP, WDDX, XML, and all those other acronyms are for. (They do add some overhead, but you can be sure your data gets across cleanly.) Here's another simple way to pass data out of your CGI:

use Data::Dumper;
print "Content-type: text/plain\n\n", Dumper($my_data);  # $my_data: whatever structure your CGI already builds
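On the receiving end, that text can be turned back into a Perl structure. A minimal sketch, assuming the producer sets `$Data::Dumper::Terse = 1` so the output is a bare expression instead of `$VAR1 = ...;`, and using the core Safe module so the consumer doesn't blindly `eval` whatever the server sends. `to_wire` and `from_wire` are hypothetical names for illustration only.

```perl
use strict;
use warnings;
use Data::Dumper;
use Safe;

# Producer side (the CGI): serialize a structure as a bare Perl expression.
sub to_wire {
    my ($data) = @_;
    local $Data::Dumper::Terse    = 1;   # omit the "$VAR1 = " prefix
    local $Data::Dumper::Indent   = 1;   # compact but still readable layout
    local $Data::Dumper::Sortkeys = 1;   # stable hash key order between runs
    return Dumper($data);
}

# Consumer side: rebuild the structure inside a Safe compartment, which
# restricts the operations the evaluated text is allowed to perform.
sub from_wire {
    my ($text) = @_;
    my $compartment = Safe->new;
    # Parentheses force expression context, so "{ ... }" is read as an
    # anonymous hash rather than a bare block.
    my $data = $compartment->reval("($text)");
    die "could not parse Dumper output: $@" if $@;
    return $data;
}

my $my_data = { name => 'widget', sizes => [ 1, 2, 3 ] };
my $copy    = from_wire( to_wire($my_data) );
print "$copy->{name} has ", scalar @{ $copy->{sizes} }, " sizes\n";
# prints "widget has 3 sizes"
```

Both modules ship with Perl itself, so nothing extra has to be installed on either side.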

CGIs don't have to generate HTML. XML can be your friend. So can plain text, used properly (tab-delimited, CSV, etc.).
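For the plain-text route, here is a minimal sketch of tab-delimited output and a matching reader, using only core Perl (so nothing for users to install). It assumes field values never need to carry tabs or newlines; here they are flattened to spaces on output. `write_tsv` and `read_tsv` are hypothetical helper names.

```perl
use strict;
use warnings;

# Producer: one record per line, fields separated by tabs.
# A tab or newline inside a value would break the format, so flatten them.
sub write_tsv {
    my ( $fh, $header, @rows ) = @_;
    for my $row ( $header, @rows ) {
        my @clean = map {
            my $v = defined $_ ? $_ : '';
            $v =~ s/[\t\r\n]+/ /g;
            $v;
        } @$row;
        print {$fh} join( "\t", @clean ), "\n";
    }
}

# Consumer: split each line back into fields, keyed by the header line.
sub read_tsv {
    my ($text) = @_;
    my @lines  = split /\n/, $text;
    my @header = split /\t/, shift(@lines), -1;
    my @records;
    for my $line (@lines) {
        my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
        my %rec;
        @rec{@header} = @fields;
        push @records, \%rec;
    }
    return \@records;
}

# Write to an in-memory string so the round trip runs without a web server.
my $out = '';
open my $fh, '>', \$out or die "cannot open in-memory handle: $!";
write_tsv( $fh, [qw(id name)], [ 1, 'alpha' ], [ 2, "two\twords" ] );
close $fh;

my $records = read_tsv($out);
print "$_->{id}: $_->{name}\n" for @$records;
```

In the CGI itself you would print the `Content-type: text/plain` header first and pass `\*STDOUT` as the handle.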

Re^4: regexp text parsing issue.
by Anonymous Monk on Mar 19, 2005 at 02:57 UTC
    Can't do that. The application I am developing for is already mature, and a rewrite is not possible. Doesn't anyone have an idea of how to actually do this, instead of suggesting a workaround?

    I can't change the format, and I can't require users to install third-party modules. That is what I am dealing with.

      To be honest, if this were a problem I faced in my job, I would be looking at the workarounds. I'm serious. There's a reason so many people here are suggesting workarounds: straightforward HTML parsing is so painful that none of us would want to write it. Even if you have to grab HTML::Parser and bundle it with your app, that's going to be much easier and more reliable than regexps... it'll be worth it.

        OK, you've sold me on HTML::Parser, but how would I accomplish this with that module? I have read the docs, and it seems unable to do what I am trying to accomplish.

      You're being blinded by your goal. You need to think about how to get around your problems, not pick an end goal and focus only on working toward it. If you do, you'll miss the other paths that come up along the way.

      Try this as a possible workaround:

      1. Set up a CGI at a new URI.
      2. The new CGI proxies the connection to the old CGI, but reformats the output into something that is easy to parse.

      That way, users don't need to install HTML::Parser (it only has to exist on the server running the proxy), and you haven't rewritten the original CGI.
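A minimal sketch of such a proxy, assuming the old CGI's output is an HTML table whose cells hold the data you need. The URL `http://localhost/cgi-bin/old.cgi` is a placeholder and `html_to_tsv` is a hypothetical helper. It uses HTTP::Tiny (core since Perl 5.14) to fetch the page and HTML::Parser, installed on the server only, to read it, then emits one tab-separated line per table row.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;     # core module since Perl 5.14
use HTML::Parser;   # CPAN; only needed on the server running this proxy

# Placeholder address of the existing, unmodified CGI.
my $OLD_CGI = 'http://localhost/cgi-bin/old.cgi';

# Turn each <tr> into one tab-separated line of its cell text.
sub html_to_tsv {
    my ($html) = @_;
    my ( @rows, $row, $cell );

    # Close the current cell/row; also used when HTML omits </td> or </tr>.
    my $flush_cell = sub {
        return unless defined $cell;
        ( my $clean = $cell ) =~ s/\s+/ /g;   # collapse whitespace, including tabs
        $clean =~ s/^\s+|\s+$//g;
        push @$row, $clean if $row;
        undef $cell;
    };
    my $flush_row = sub {
        $flush_cell->();
        push @rows, $row if $row && @$row;
        undef $row;
    };

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [ sub {
            my ($tag) = @_;                    # tag names arrive lowercased
            if ( $tag eq 'tr' ) { $flush_row->(); $row = [] }
            elsif ( $tag eq 'td' || $tag eq 'th' ) {
                $flush_cell->();
                $cell = '' if $row;
            }
        }, 'tagname' ],
        text_h => [ sub { $cell .= $_[0] if defined $cell }, 'dtext' ],   # entities decoded
        end_h  => [ sub {
            my ($tag) = @_;
            if    ( $tag eq 'td' || $tag eq 'th' )    { $flush_cell->() }
            elsif ( $tag eq 'tr' || $tag eq 'table' ) { $flush_row->() }
        }, 'tagname' ],
    );
    $p->parse($html);
    $p->eof;
    $flush_row->();

    return join '', map { join( "\t", @$_ ) . "\n" } @rows;
}

# Act as a CGI only when run directly, so html_to_tsv can be reused or tested.
unless (caller) {
    my $res = HTTP::Tiny->new->get($OLD_CGI);
    if ( $res->{success} ) {
        print "Content-type: text/plain\n\n", html_to_tsv( $res->{content} );
    }
    else {
        print "Status: 502 Bad Gateway\nContent-type: text/plain\n\n",
              "upstream CGI failed: $res->{status} $res->{reason}\n";
    }
}
```

The client then requests the new URL and only has to split on newlines and tabs, with no HTML parsing on the user's machine. Which tags to collect depends entirely on what the old CGI actually emits, so treat the table handling as a starting point.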