in reply to Module hierarchy help

Creating a separate module for each source of your data seems overkill ... surely you could identify the common tasks and iterate across your sources, configuring a Robot::Web object differently for each source?

In my experience, tying your code that tightly to variables outside your control is a maintenance nightmare ... better (I think) to write a general solution and configure it as necessary.


No good deed goes unpunished. -- (attributed to) Oscar Wilde

Re^2: Module hierarchy help
by 2ge (Scribe) on Jul 27, 2006 at 13:51 UTC
    Thanks. Yes, I know this may be overkill. But I really don't know how to specify in a config tasks as different as parsing or downloading a website: every site has a different structure, so I can't use a single "config" for that. I have to write a separate subroutine for each one.

    Putting 50 download_XXX() and parse_XXX() subroutines in one file... hm, I really don't know if that is a good idea. Any other suggestions? :)

      I guess without knowing a bit more about the kind of data you seek to retrieve and how you intend to parse it, I can't give you very good advice.

      But it seems to me that you want to repeatedly execute a small handful of fairly basic tasks:

      1. Connect to a website
      2. Find some information
      3. Extract that information
      4. Format the extracted information
      5. Save to a database

      It seems to me that all of those tasks could be methods of your Robot::Base. Admittedly, you might have to subclass in some cases, but the website URL, the regular expression(s) or other identifying tag names for locating the proper information, the formatting rules, and the SQL statement handles could all be part of your configuration, so that you execute the same code for each website with different configuration parameters.
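
      To make that concrete, here is a rough sketch of the "same code, different configuration" idea. It is written in Python only for brevity; the same shape maps onto a Perl hash of hashes fed to a Robot::Base object. Everything named here (SITES, run_site, fake_fetch, the URLs, the patterns, the items table) is made up for illustration, and the canned pages stand in for real HTTP requests so the example runs offline.

```python
import re
import sqlite3

# Hypothetical per-site settings: every URL and pattern here is invented.
SITES = {
    "site_a": {
        "url": "http://example.com/a",
        "pattern": r'<td class="price">([\d.]+)</td>',
        "format": float,        # formatting rule applied to each match
    },
    "site_b": {
        "url": "http://example.com/b",
        "pattern": r"<li>\s*(\w+)\s*</li>",
        "format": str.upper,
    },
}

INSERT_SQL = "INSERT INTO items (source, value) VALUES (?, ?)"

# Canned pages stand in for real HTTP responses (an assumption of this sketch).
PAGES = {
    "http://example.com/a": '<tr><td class="price">9.50</td></tr>'
                            '<tr><td class="price">12</td></tr>',
    "http://example.com/b": "<ul><li> apple </li><li>pear</li></ul>",
}


def fake_fetch(url):
    """Return the canned page for a URL instead of hitting the network."""
    return PAGES[url]


def run_site(name, cfg, fetch, db):
    """Connect, find, extract, format, and save, driven only by cfg."""
    html = fetch(cfg["url"])                          # 1. connect
    found = re.findall(cfg["pattern"], html)          # 2-3. find and extract
    rows = [(name, cfg["format"](m)) for m in found]  # 4. format
    db.executemany(INSERT_SQL, rows)                  # 5. save
    return len(rows)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (source TEXT, value)")

# The same function handles every source; only the configuration differs.
counts = {name: run_site(name, cfg, fake_fetch, db) for name, cfg in SITES.items()}
```

      Adding a source then means adding one entry to SITES rather than writing another module, at least for sites simple enough to be described this way.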

      Heck, the best way might be to try it out for a half-dozen of the sites, and see if you can abstract the common code and come up with a solution that is general enough -- if not, then you may end up writing distinct code for each of the 50 data sources. If you end up doing it that way, though, I would hate to be the poor guy who comes along after you and has to maintain it. :)
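
      Where a site's navigation really does differ, one option is to keep the shared steps in the base class and override only the step that varies. The sketch below (again Python for brevity, as a stand-in for Robot::Base and one subclass) assumes a hypothetical PagedRobot that follows "next" links; the class names, config keys, URLs, and canned pages are all invented for illustration.

```python
import re

# Canned pages stand in for real HTTP responses (an assumption of this sketch).
PAGES = {
    "http://example.com/one": "<b>gamma</b>",
    "http://example.com/list?p=1":
        '<b>alpha</b><a rel="next" href="http://example.com/list?p=2">next</a>',
    "http://example.com/list?p=2": "<b>beta</b>",
}


def fake_fetch(url):
    """Return the canned page for a URL instead of hitting the network."""
    return PAGES[url]


class RobotBase:
    """Shared workflow; subclasses override only the navigation step."""

    def __init__(self, cfg, fetch):
        self.cfg = cfg
        self.fetch = fetch

    def pages(self):
        # Default navigation: a single page at the configured URL.
        yield self.fetch(self.cfg["url"])

    def extract(self, html):
        return re.findall(self.cfg["pattern"], html)

    def run(self):
        items = []
        for html in self.pages():
            items.extend(self.extract(html))
        return items


class PagedRobot(RobotBase):
    """Follows 'next' links until a page has none."""

    def pages(self):
        url = self.cfg["url"]
        while url:
            html = self.fetch(url)
            yield html
            match = re.search(self.cfg["next_pattern"], html)
            url = match.group(1) if match else None


single = RobotBase(
    {"url": "http://example.com/one", "pattern": r"<b>(\w+)</b>"},
    fake_fetch,
).run()

paged = PagedRobot(
    {
        "url": "http://example.com/list?p=1",
        "pattern": r"<b>(\w+)</b>",
        "next_pattern": r'rel="next" href="([^"]+)"',
    },
    fake_fetch,
).run()
```

      With this split you would need one subclass per style of navigation (single page, paged catalog, query form) rather than one per website, with the per-site details staying in the configuration.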

      Fifty subclasses is beyond my limit of how granular I am willing to get ... what do you do if someone asks you to go to 100 websites, or 1000? When you find yourself writing so much code for essentially the same tasks, you have to ask yourself, "Isn't there some general solution that would save me from so much duplicate effort?". I usually ask myself that question if I have more than five or six subclasses at a particular level.



        Thanks for the long reply,

        Yes, the ideal situation would be to have all that information in one config. But I really can't imagine such a config. Let me explain: we have 50 different websites, and I will compare two of them.

        The first website is something like a catalog: I have to click on category links, then on sub-category links; there may or may not be paging; then I click through to the detail view, which may span several pages, parse all the info I need, and save it to the DB.

        On the second website I have to simulate queries, so it is totally different from the first: I get a list, click on an item in that list, get another list, click again, and so on...

        So creating a general config like this and putting all those routines in the base class seem nearly the same to me. The problem is that no two sites share the same navigation, and the data is stored differently on every page. I'd like to see what such a config would look like; I have never seen one. I know this would be ideal, but... could you show me such a config, please?