2ge has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I am quite new to modules. I have written some, but nothing big, and I am not good at OO. Now I have a new task: fetching, parsing and inserting into a db some information from the internet. Let's say we have 50 sources, and I'd like to have all routines for one source in one module. So the modules should look like this:

Robot::Base
  - reading config file into hash (?)
  - connecting to db (?)
  - db statements
  - global subroutines
  - reading all Robot::Web::*

Robot::Web::Something
  - "Something" is the name of the web
  - all routines regarding "Something"

Robot.pl pseudo code should look like this:

    ...
    my $cfg = GetCfg();
    for my $web (@{ $cfg->{webs} }) {
        Process($web);
    }
    ...

What do you think about this scheme? Please suggest any better options for achieving this. Thanks.

Replies are listed 'Best First'.
Re: Module hierarchy help
by ptum (Priest) on Jul 27, 2006 at 13:30 UTC

    Creating a separate module for each source of your data seems overkill ... surely you could identify the common tasks and iterate across your sources, configuring a Robot::Web object differently for each source?

    In my experience, tying your code that tightly to variables outside your control is a maintenance nightmare ... better (I think) to write a general solution and configure it as necessary.


    No good deed goes unpunished. -- (attributed to) Oscar Wilde
      Thanks. Yes, I know this may be overkill. But I really don't know how to specify in a cfg tasks as different as parsing or downloading some web: every web has a different structure, so I can't use some "config" for that. I have to code a separate subroutine for each one.

      Putting 50 download_XXX() and parse_XXX() subs in one file... hm, I really don't know if that is a good idea. Any other help? :)

        I guess without knowing a bit more about the kind of data you seek to retrieve and how you intend to parse it, I can't give you very good advice.

        But it seems to me as though you are wanting to repeatedly execute a small handful of fairly basic tasks:

        1. Connect to a website
        2. Find some information
        3. Extract that information
        4. Format the extracted information
        5. Save to a database

        It seems to me that all of those methods could be part of your Robot::Base. Admittedly, you might have to subclass in some cases, but it would seem to me that the website URL, the regular expression(s) or other identifying tag names for locating the proper information, the formatting rules and the SQL statement handles could all be part of your configuration, so that you are executing the same code for each website with different configuration parameters.
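
The configuration-driven idea described above could look roughly like the sketch below. It is only an illustration: the site names, URLs, patterns and field names are invented, and the page bodies are canned strings so that it runs without network access. The point is that `extract` is the same code for every site and only the configuration differs.

```perl
use strict;
use warnings;

# Hypothetical per-site configuration; every name, URL and pattern is made up.
my %sites = (
    something => {
        url     => 'http://example.com/something',
        pattern => qr{<h1>(.*?)</h1>\s*<span class="price">([\d.]+)</span>}s,
        fields  => [qw(title price)],
    },
    other => {
        url     => 'http://example.org/other',
        pattern => qr{Title:\s*(.+?)\s*\|\s*Price:\s*([\d.]+)},
        fields  => [qw(title price)],
    },
);

# Generic extractor: identical code for every site, driven by its config.
# Returns a hash ref of field => value, or nothing if the pattern fails.
sub extract {
    my ($site, $content) = @_;
    my @values = $content =~ $site->{pattern}
        or return;
    my %record;
    @record{ @{ $site->{fields} } } = @values;
    return \%record;
}

# Stand-in page bodies; a real robot would download $site->{url} instead.
my %sample = (
    something => '<h1>Widget</h1> <span class="price">9.99</span>',
    other     => 'Title: Gadget | Price: 12.50',
);

for my $name (sort keys %sites) {
    my $rec = extract($sites{$name}, $sample{$name}) or next;
    print "$name: $rec->{title} costs $rec->{price}\n";
}
```

Whether a single regular expression per site is enough depends entirely on the pages involved; sites that need real HTML parsing would not fit this table and would be candidates for their own code.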

        Heck, the best way might be to try it out for a half-dozen of the sites, and see if you can abstract the common code and come up with a solution that is general enough -- if not, then you may end up writing distinct code for each of the 50 data sources. If you end up doing it that way, though, I would hate to be the poor guy who comes along after you and has to maintain it. :)

        Fifty subclasses is beyond my limit of how granular I am willing to get ... what do you do if someone asks you to go to 100 websites, or 1000? When you find yourself writing so much code for essentially the same tasks, you have to ask yourself, "Isn't there some general solution that would save me from so much duplicate effort?". I usually ask myself that question if I have more than five or six subclasses at a particular level.
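
One way to keep the number of subclasses small is to put default versions of every step in Robot::Base and override only the step that a particular site really needs. The following is a minimal sketch under that assumption; the method names (`fetch`, `parse`, `store`, `run`) and the `sample` argument are hypothetical, and `store` just prints instead of touching a database.

```perl
use strict;
use warnings;

package Robot::Base;

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# Default steps shared by every site. fetch() returns canned text so the
# sketch runs offline; a real robot would download a URL here.
sub fetch { my ($self) = @_; return $self->{sample} }

sub parse { my ($self, $content) = @_; return [ split /\n/, $content ] }

# Stand-in for the database insert: prints each row, returns the row count.
sub store {
    my ($self, $rows) = @_;
    print ref($self), ": $_\n" for @$rows;
    return scalar @$rows;
}

# The pipeline itself lives in one place only.
sub run {
    my ($self) = @_;
    return $self->store( $self->parse( $self->fetch ) );
}

package Robot::Web::Something;
use parent -norequire, 'Robot::Base';

# Only this site needs special parsing, so only parse() is overridden.
sub parse {
    my ($self, $content) = @_;
    return [ $content =~ m{<li>(.*?)</li>}g ];
}

package main;

my @robots = (
    Robot::Base->new( sample => "alpha\nbeta" ),
    Robot::Web::Something->new( sample => '<li>one</li><li>two</li><li>three</li>' ),
);
$_->run for @robots;
```

Sites that fit the defaults need no class of their own, so the count of subclasses tracks the number of genuinely unusual sites rather than the number of sources.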


Re: Module hierarchy help
by duc (Beadle) on Jul 27, 2006 at 13:01 UTC

    So, if I get it right, you will have 50 files under the directory Robot/Web (the Robot::Web namespace). Ok.

    I think it is a good idea to separate routines according to what they do. So maybe you could try to find the similarities between the routines for different sources and pass them arguments about the source. I think if a routine is specific to one application it should not be in a module. Just be sure you don't work for nothing ;).

    To give an example, I have made some modules myself and one of them is EditFiles.pm. It contains routines to modify a file or find information in that file, and I am using this module in a number of scripts.

    I think you have a good start.

      Thanks for the answer, duc.

      Yes, I am planning to have 50 files under the Robot/Web directory. Each webpage is so different that I _have_ to use a different routine for downloading and a different one for parsing, and of course I don't want to have all that in one file/place. So I decided to do this.

      When some routine _could_ be used for more than one web (inserting into the db, for example), I will put it in Base. What I don't know, for example, is whether I should load all those modules in robot.pl, or whether robot.pl should load only Base and Base should load all Robot::Web::*. I think the second option is better, but I have never done this and don't know how to do it :(
        For that I think the second option is better. You would just have to put something like use Robot::Web::Something, but I agree that 50 modules doing almost the same work in different ways looks like a nightmare. I don't know how to help you with that, though, and I understand what you mean.
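
As a rough sketch of the second option (robot.pl uses only Base, and Base loads whichever Robot::Web::* module is needed), the loading can be done at run time with `require`. The function names `load_web` and `process` are assumptions for illustration, and Robot::Web::Something is inlined here only so the example runs as a single file; normally it would live in Robot/Web/Something.pm somewhere in @INC.

```perl
use strict;
use warnings;

package Robot::Base;

# Load the handler class for one web on demand and return its name.
sub load_web {
    my ($name) = @_;
    die "bad web name '$name'\n" unless $name =~ /\A\w+\z/;
    my $class = "Robot::Web::$name";
    unless ( $class->can('process') ) {
        # Robot::Web::Something maps to the file Robot/Web/Something.pm
        ( my $file = "$class.pm" ) =~ s{::}{/}g;
        require $file;
    }
    return $class;
}

# Dispatch one web to its own module.
sub process {
    my ($web) = @_;
    return load_web($web)->process;
}

# Normally in its own file; inlined so the sketch is self-contained.
package Robot::Web::Something;

sub process { return "processed Something" }

package main;

my $cfg = { webs => ['Something'] };    # stand-in for GetCfg()
for my $web ( @{ $cfg->{webs} } ) {
    print Robot::Base::process($web), "\n";
}
```

With this arrangement, adding a new web means adding one module file and one entry in the config; robot.pl itself does not change.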