mascip has asked for the wisdom of the Perl Monks concerning the following question:
Hi all,
this message is fairly long, just read the bold text if you want to get a rough idea of what it's about.
I'm fairly new to programming and i've read a lot in the last 9 months. It's good, i've got lots of theoretical ideas. But now that i'm starting to use them, i realize that it's not so simple to make design choices. As an indication, the last book that i read (and found very inspiring) is "Growing Object-Oriented Software Guided by Tests", about TDD (Test Driven Development); those of you who have read it will know what my background is.
I'm going to present a very simple program that i'm trying to write now, and ask questions about its design. I have no question about the implementation (which CPAN modules to use, etc); but i'm wondering which structure this project should have (in terms of Objects and Roles, and their relationships).
It is quite a small project, so i could just fit it all in one big dirty script. But i've decided to make it a design exercise, in order to assimilate and experience these ideas that i've read about.
I would really enjoy discussing this with other people, as i feel quite discombobulated right now : there's so many ways of doing the same thing!
With this program, i want to
- find directories with spreadsheets of interest
- read all spreadsheets from a directory simultaneously
- calculate SOME_STUFF($num_line) on each line
- analyze and display the results
As I told you, it's a fairly simple program. Maybe that's why there's many possibilities for design. I'm trying to make it as cleanly and elegantly as possible. Mostly following ideas from this TDD book i read.
A bit more information :
- i have maybe 1000 or 2000 such spreadsheets, so i wouldn't do it by hand.
- for each line, to calculate SOME_STUFF($num_line) i need information from all of the spreadsheets in one directory, and from the lines around it: $num_line-2, $num_line-1 and $num_line+1 (in all spreadsheets from the same directory).
- The lines in each spreadsheet correspond : they are data at one point in time, and the time data is the same for all spreadsheets within a directory (obviously, i will test this).
- the spreadsheets are fairly big (1MB), so i COULD read all of the spreadsheets FIRST, and then process the results. Easy design solution. But that would take up lots of memory, and thus probably be slow. Maybe it's not that much memory in fact ??
But well, anyway, i would like to try and process the data "on the go" (while reading it) if possible, as it represents a kind of "design challenge".
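To make the "on the go" idea concrete, here is a minimal sketch of reading several spreadsheets in lockstep while keeping only a small window of lines in memory. It assumes the files are plain CSV with no quoted commas (a naive split stands in for a real CSV parser) and that the window runs from $num_line-2 to $num_line+1. The names make_row_iterator and process_in_lockstep are hypothetical, invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: wraps a filehandle in an iterator that returns one
# row (array ref) per call, or undef at end of file. The naive split on
# commas stands in for a real CSV parser.
sub make_row_iterator {
    my ($fh) = @_;
    return sub {
        my $line = <$fh>;
        return undef unless defined $line;
        chomp $line;
        return [ split /,/, $line ];
    };
}

# Reads all handles in lockstep and calls $callback->($n, \%window) for each
# line number $n. %window maps the offsets -2, -1, 0, 1 to a "slice": an
# array ref holding that line's row from every spreadsheet. Offsets that
# fall outside the file are simply absent from the window.
sub process_in_lockstep {
    my ($handles, $callback) = @_;
    return unless @$handles;
    my @iterators = map { make_row_iterator($_) } @$handles;

    my %slice_for;    # line number => slice
    my $read = 0;     # number of slices read so far
    my $done = 0;     # number of lines already handed to the callback
    my $eof  = 0;

    until ( $eof && $done == $read ) {
        if ( !$eof ) {
            my @slice = map { $_->() } @iterators;
            if ( grep { !defined $_ } @slice ) { $eof = 1 }
            else                               { $slice_for{ $read++ } = \@slice }
        }

        # Line $done is ready once line $done+1 has been read, or at end of input.
        if ( $done < $read && ( $eof || $read > $done + 1 ) ) {
            my %window = map { ( $_ => $slice_for{ $done + $_ } ) }
                         grep { exists $slice_for{ $done + $_ } } -2 .. 1;
            $callback->( $done, \%window );
            delete $slice_for{ $done - 2 };    # never needed again
            $done++;
        }
    }
    return;
}

# Usage with in-memory "files" standing in for real spreadsheets.
my $temps  = "0,20\n1,21\n2,22\n";
my $speeds = "0,5\n1,6\n2,7\n";
open my $temps_fh,  '<', \$temps  or die "Cannot open: $!";
open my $speeds_fh, '<', \$speeds or die "Cannot open: $!";

process_in_lockstep(
    [ $temps_fh, $speeds_fh ],
    sub {
        my ( $n, $window ) = @_;
        my @offsets = sort { $a <=> $b } keys %$window;
        print "line $n sees offsets: @offsets\n";
    },
);
```

With this shape, at most four slices are held at any time, whatever the length of the files. Reading stops as soon as any one file runs out, on the assumption that all spreadsheets in a directory have the same number of lines.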
I started by implementing a very simple program, to which i will add features one by one (which people call "incremental programming").
The first feature i've implemented is to read information from a spreadsheet. Then i added a few more things: not reading the header lines, reading only certain lines and columns, changing the names of the fields (i personalize them).
I still haven't implemented any of the calculation stuff, but i already feel like it's time for some refactoring.
At the moment, i have a Main.pm object which does everything, and i want to make it more lightweight by creating objects or roles to take over some of its responsibilities (i would like to follow the "one responsibility per class" design principle).
I had two naive ideas on how to do this :
- create a Read::My::Spreadsheet::Files role, which would encapsulate all the sugar CSV reading methods
- create a My::Spreadsheet::Reader class, which would enable me to easily return the result from each spreadsheet one by one. But that wouldn't enable me to process the data "on the go".
I've done both in fact, just to try and play. But i still don't know how it's going to fit with what i'm doing next.
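As an illustration of the reader class idea, here is a minimal sketch of one possible shape for it: an iterator-style My::Spreadsheet::Reader that hands back one row per call instead of the whole spreadsheet. It is written as a plain blessed hash with core Perl only, and it assumes simple CSV without quoted commas. The method next_row and the constructor arguments fh, header_lines and columns are hypothetical names chosen for this example.

```perl
package My::Spreadsheet::Reader;
use strict;
use warnings;

# Arguments (all hypothetical):
#   fh           - an open filehandle on the spreadsheet
#   header_lines - number of leading lines to skip
#   columns      - hash ref mapping a personalized field name to a column index
sub new {
    my ( $class, %args ) = @_;
    my $self = bless {
        fh      => $args{fh},
        columns => $args{columns} || {},
    }, $class;

    # Skip the header lines once, up front.
    for ( 1 .. ( $args{header_lines} || 0 ) ) {
        my $discarded = readline( $self->{fh} );
    }
    return $self;
}

# Returns the next row as a hash ref keyed by the personalized field
# names, or undef when the spreadsheet is exhausted.
sub next_row {
    my ($self) = @_;
    my $line = readline( $self->{fh} );
    return unless defined $line;
    chomp $line;
    my @cells   = split /,/, $line;    # naive CSV split, for illustration only
    my $columns = $self->{columns};
    my %row     = map { ( $_ => $cells[ $columns->{$_} ] ) } keys %$columns;
    return \%row;
}

package main;

my $data = "Time,Temp,Pressure\n0,20.5,1013\n1,21.0,1012\n";
open my $fh, '<', \$data or die "Cannot open: $!";

my $reader = My::Spreadsheet::Reader->new(
    fh           => $fh,
    header_lines => 1,
    columns      => { time => 0, pressure => 2 },
);

while ( my $row = $reader->next_row ) {
    print "time=$row->{time} pressure=$row->{pressure}\n";
}
```

Because the caller pulls rows one at a time, a class shaped like this does not force reading a whole spreadsheet before any processing starts; whether that fits the rest of the design is a separate question.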
The next feature i want to implement is calculating SOME_STUFF() for one particular directory.
I'm thinking of creating a My::SOME_STUFF::Calculator object (or maybe a Role???) to do the calculations.
If i had a My::Spreadsheet::Reader class, i would first read and gather all the data for a directory in Main.pm, and then calculate everything.
But if i want to do it "on the go", i don't really know how to do it. Should My::SOME_STUFF::Calculator "do" the role Read::My::Spreadsheet::Files, and thus use sugar spreadsheet reading methods to make the job easier (and more elegant)? This would mean that i would have a My::SOME_STUFF::Calculator object, which takes "messages" from a role that "bridges" with the realm of spreadsheets. Right?
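One alternative to having the calculator "do" the reading role is to hand it a source of data and let it pull from that source. The sketch below is only an illustration of that option, not a recommendation drawn from the text above: My::SOME_STUFF::Calculator receives a code ref (the hypothetical source argument) that returns one slice of values per call and undef when finished. Since SOME_STUFF is not specified, a simple sum is used as a placeholder, and the window of neighbouring lines is left out for brevity.

```perl
package My::SOME_STUFF::Calculator;
use strict;
use warnings;

# source (hypothetical): a code ref returning the next slice of values
# (an array ref), or undef when there is no more data.
sub new {
    my ( $class, %args ) = @_;
    return bless { source => $args{source}, results => [] }, $class;
}

# Pulls slices from the source until it runs dry, computing as it goes.
# The calculator never learns where the slices come from.
sub run {
    my ($self) = @_;
    while ( defined( my $slice = $self->{source}->() ) ) {
        push @{ $self->{results} }, $self->_some_stuff($slice);
    }
    return $self->{results};
}

# Placeholder for the real SOME_STUFF calculation: sums the slice.
sub _some_stuff {
    my ( $self, $slice ) = @_;
    my $total = 0;
    $total += $_ for @$slice;
    return $total;
}

package main;

# Any code ref will do as a source; here, a list held in memory.
my @slices = ( [ 20, 5 ], [ 21, 6 ], [ 22, 7 ] );
my $calculator = My::SOME_STUFF::Calculator->new(
    source => sub { return shift @slices },
);

my $results = $calculator->run;
print "results: @$results\n";
```

In a test, the source can be a list in memory as above; in the real program it could be a closure over the spreadsheet-reading code, so the calculator stays free of any knowledge about spreadsheets.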
Later i will have to process several directories. I'm guessing that i will search for them in Main.pm (or calculate_some_stuff.pl), and then use a My::SOME_STUFF::Calculator object for each. And finally, for analyzing and displaying results, i could put the methods in a My::SOME_STUFF::Result::Display object.
It is a simple project, but it takes a while to explain.
Please, give me some feedback.
I guess i will learn by playing with the code and trying different things, but asking experienced people can help a lot too. Hopefully, different ideas will come up, and an interesting discussion on design could emerge, for more people to learn together.