Hi all, thank you for the interesting, helpful answers.
I should have said what this is about earlier, although it wouldn’t have helped with my problem or my learning process. It’s a program for processing data from experiments. I got the data from biologists who grow phytoplankton in a controlled environment (I have data from more than 100 experiments). The data come from different sensors, each of them recording one or more measurements, and I need to do fairly complex operations to determine parameters for mathematical models of population dynamics.
So first, I need to preprocess the data in order to get “aggregated information” for each line (each line corresponds to a given moment/timestamp). Then I will estimate parameters. But I’m only talking about the information aggregation here. Estimating parameters will come later, and I’m not sure I’ll do it with Perl. That’s another question.
Now, back to my design questions:
1. I don’t make the spreadsheets myself, but I could load all this data into a database and get help from MySQL queries. I hadn’t thought about this. What I don’t like about it is that
- I need to do one extra operation (organise and transfer my data to the database)
- I need to learn how to use MySQL again. It’s not complex, I know, but I haven’t used it in 6 years.
And I’m thinking that what MySQL can do, Perl can do. Am I wrong in thinking this?
2. I don’t use globs for finding files, but the File::Find::Rule CPAN module. Are globs better?
3. Thank you for the inspiring comments:
- Design is about learning to recognize the most simple and cohesive parts of a problem. When the parts are smaller, everything is simple to design.
- Write tests to... test your understanding of the problem. => That’s what I discovered recently by writing my first tests before coding. Thank you for the phrasing, it was good to read.
- First, write a quick and dirty proof of concept (well, after having written a test for the feature I’m implementing): this is the best way to learn about the problem. Only then refactor and redesign.
4. I agree with your various comments: my design problem will be solved when I drop it (nice phrasing, tye). I can do it all with a script. For each experiment, I could just read each of the spreadsheets, store the data in a hash, and then process it all. That IS making things easier, and thus better design.
Nonetheless, yesterday I wrote stuff down on paper and came up with a rough idea of how to process stuff “on the fly”. I’m going to write “dirty code” here and now (not compiled, and it probably won’t compile: I’m on holiday without my computer), copying directly from the paper to the forum. I’m not going into the details of the implementation though, just the rough idea.
The key for “reading on the go” was to create data Readers, and then pass them all to a method that reads their lines simultaneously.
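To make the simple version from point 4 concrete, here is a minimal sketch. It assumes each data file is a semicolon-separated text export with a header line and the timestamp in the first column; the real spreadsheets may well look different. The file names, column names, sample values and the helper read_file_into() are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for two sensor files of one experiment.
my %content_of = (
    'data_ph.csv'   => "time;ph\n0;7.9\n1;8.1\n2;8.3\n",
    'data_cell.csv' => "time;cells;biovolume\n0;100;12\n1;150;19\n2;230;27\n",
);

# Read one file and merge its measurements into %$data_at, keyed by timestamp.
sub read_file_into {
    my ( $fh, $data_at ) = @_;

    my $header = <$fh>;
    chomp $header;
    my ( undef, @names ) = split /;/, $header;    # skip the time column

    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line eq '';
        my ( $time, @values ) = split /;/, $line;
        @{ $data_at->{$time} }{@names} = @values;    # hash slice: name => value
    }
    return;
}

my %data_at;    # $data_at{$time}{$measurement} = $value
for my $file ( sort keys %content_of ) {
    # Open the string as an in-memory file; a real script would open $file.
    open my $fh, '<', \$content_of{$file} or die "Cannot open $file: $!";
    read_file_into( $fh, \%data_at );
    close $fh;
}

# Everything is in memory now, so each timestamp can be processed in order.
for my $time ( sort { $a <=> $b } keys %data_at ) {
    my $row = $data_at{$time};
    printf "t=%s ph=%s cells=%s\n", $time, $row->{ph}, $row->{cells};
}
```

Keying the hash by timestamp does the same job as a join on the time column in SQL, which is one way of seeing that plain Perl can cover this part without a database.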
The objects I will create are:
- Reader::Data, which reads data from a given file with a get_next_data() method
- Experiment, which will know the directory path for the experiment, and the paths for its data files too
- possibly some objects for representing the data, or data sets, with methods to do some calculations on them. But that’s another story.
And no Roles needed, as I just don’t need them now.
Here is what I would have done (but won’t, thanks to what you all said), for those who are interested:
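As a rough illustration of the Reader::Data idea, here is a minimal sketch. It assumes semicolon-separated lines with a header line; the constructor argument (fh) and the shape of the returned hash are guesses for the sake of the example, not a settled interface.

```perl
use strict;
use warnings;

package Reader::Data;

# Hypothetical constructor: takes an already opened filehandle
# and reads the header line to learn the column names.
sub new {
    my ( $class, %args ) = @_;
    my $fh = $args{fh} or die "Reader::Data needs a filehandle (fh)";

    my $header = <$fh>;
    die "Empty data file" unless defined $header;
    chomp $header;

    my $self = { fh => $fh, columns => [ split /;/, $header ] };
    return bless $self, $class;
}

# Return the next line as a hash ref (column name => value),
# or undef when the file is exhausted.
sub get_next_data {
    my $self = shift;
    my $fh   = $self->{fh};

    my $line = <$fh>;
    return unless defined $line;
    chomp $line;

    my %data;
    @data{ @{ $self->{columns} } } = split /;/, $line;
    return \%data;
}

package main;

# An in-memory file stands in for a real data file here.
my $content = "time;ph\n0;7.9\n1;8.1\n";
open my $fh, '<', \$content or die "Cannot open in-memory file: $!";

my $reader = Reader::Data->new( fh => $fh );
while ( my $data = $reader->get_next_data() ) {
    print "time $data->{time}: ph $data->{ph}\n";
}
```

Because the reader only ever holds one line, several of them can be advanced side by side without loading whole files into memory.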
# - - - process_bio_data_in_directory.pl
# Responsibility : process global results for all experiments.
use strict;
use warnings;

# Where to find the experiment directories and data files
my $DATA_DIR = 'C:/bio_data/';
my $experiment_dir_regex = qr/Exp/;
my $data_file_regex = qr/data_/;
### Comment : I lined up these three = signs, but the different font of the <code> didn't keep them aligned. That is very annoying.

# 1. Find all the experiment directories, and their related data files
my $list_experiments = find_experiments_and_their_data_files_in({
    dir                   => $DATA_DIR,
    with_experiment_regex => $experiment_dir_regex,
    with_data_file_regex  => $data_file_regex,
});
### Comment for the reader :
### an Experiment object will have a path, and a list of data_files.
### I feel that this class makes my code more readable, and I can use $data_file_regex here,
### and not have to think about it again.

# 2. Process the data for each experiment
my %global_results;
EXPERIMENT:
foreach my $experiment ( @{$list_experiments} ) {
    my $hash_aggr_infos = calculate_aggr_infos_for_experiment($experiment);
    process_global_results_with( $hash_aggr_infos, \%global_results );
}
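For the first step, here is a hedged sketch of what find_experiments_and_their_data_files_in() might look like. It returns plain hashes instead of Experiment objects and uses the core opendir/readdir functions rather than File::Find::Rule or glob. The directory layout (experiment directories directly under the data directory, data files directly inside them) and the helper list_entries() are assumptions, and the temporary directory only exists to make the example runnable.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: sorted names of everything inside a directory.
sub list_entries {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open directory $dir: $!";
    my @names = sort grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return @names;
}

sub find_experiments_and_their_data_files_in {
    my ($arg_ref) = @_;
    my @experiments;

    # Assumed layout: <dir>/<experiment directory>/<data file>
    for my $name ( list_entries( $arg_ref->{dir} ) ) {
        my $path = File::Spec->catdir( $arg_ref->{dir}, $name );
        next unless -d $path && $name =~ $arg_ref->{with_experiment_regex};

        my @data_files =
            grep { -f $_ }
            map  { File::Spec->catfile( $path, $_ ) }
            grep { $_ =~ $arg_ref->{with_data_file_regex} }
            list_entries($path);

        push @experiments, { path => $path, data_files => \@data_files };
    }
    return \@experiments;
}

# Build a small throwaway directory tree to try the function on.
my $tmp_data_dir = tempdir( CLEANUP => 1 );
for my $file ( 'Exp01/data_ph.csv', 'Exp01/notes.txt', 'Exp02/data_ph.csv', 'misc/data_x.csv' ) {
    my ( $dir, $name ) = split m{/}, $file;
    my $full_dir = File::Spec->catdir( $tmp_data_dir, $dir );
    mkdir $full_dir unless -d $full_dir;
    open my $fh, '>', File::Spec->catfile( $full_dir, $name )
        or die "Cannot create $file: $!";
    close $fh;
}

my $found_experiments = find_experiments_and_their_data_files_in({
    dir                   => $tmp_data_dir,
    with_experiment_regex => qr/Exp/,
    with_data_file_regex  => qr/data_/,
});

for my $experiment ( @{$found_experiments} ) {
    my $count = scalar @{ $experiment->{data_files} };
    print "$experiment->{path}: $count data file(s)\n";
}
```

The two regexes are only needed inside this one function, which is the point made in the comment above about not having to think about $data_file_regex again.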
Then, the “on the go” data processing is organized in calculate_aggr_infos_for_experiment().
# - - - calculate_aggr_infos.pm
package Calculate::AggrInfos;
# Responsibility : calculate the aggregate information for one experiment.
use strict;
use warnings;

sub calculate_aggr_infos_for_experiment {
    my $experiment = shift;

    # 1. Initialize Readers for each data file
    my $hash_data_reader_of_file = initialize_data_file_readers_for_experiment($experiment);

    # 2. Calculate aggregate information
    my $hash_aggr_infos = calculate_aggr_infos_with_readers( $hash_data_reader_of_file );

    return $hash_aggr_infos;
} # - - - end sub calculate_aggr_infos_for_experiment()
And next, the calculate_aggr_infos_with_readers() subroutine, which reads the files in parallel and processes them.
# in the same file and package as before
sub calculate_aggr_infos_with_readers {
    my $hash_data_reader_of_file = shift;
    my @data_files = keys %{$hash_data_reader_of_file};
    my $hash_aggr_infos;

    DATA:
    while ( my $hash_data_now = get_next_data_for_all_readers($hash_data_reader_of_file) ) {
        # check that the time value is the same for each set of data
        check_if_time_is_the_same_for_all_data($hash_data_reader_of_file);

        # calculate aggregated information
        $hash_aggr_infos = calculate_aggr_infos_from_data($hash_data_now);
        # it's a bit more complex, as I need a bit of "past data history"
        # to calculate the aggregated information
    } # end while (DATA)

    return $hash_aggr_infos;
} # - - - end sub calculate_aggr_infos_with_readers()
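The lockstep reading itself could be sketched as follows. This is only an illustration under assumptions: Reader::FromList is a hypothetical stand-in for Reader::Data that walks a list of records instead of a file, every record is assumed to carry a time key, and here the time check receives the current data rather than the readers.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Reader::Data: walks a list of records
# instead of reading a file, so the example runs on its own.
package Reader::FromList;

sub new {
    my ( $class, @records ) = @_;
    return bless { records => [@records] }, $class;
}

sub get_next_data {
    my $self = shift;
    return shift @{ $self->{records} };    # undef once the list is empty
}

package main;

# Ask every reader for its next record. Returns a hash ref
# (file name => record), or undef as soon as one reader is exhausted.
sub get_next_data_for_all_readers {
    my ($reader_of_file) = @_;
    my %data_now;
    for my $file ( sort keys %{$reader_of_file} ) {
        my $data = $reader_of_file->{$file}->get_next_data();
        return unless defined $data;
        $data_now{$file} = $data;
    }
    return \%data_now;
}

# Die unless all records read in this round carry the same time value.
sub check_if_time_is_the_same_for_all_data {
    my ($data_now) = @_;
    my %seen = map { $_->{time} => 1 } values %{$data_now};
    die "Time mismatch between files: @{[ sort keys %seen ]}\n"
        if scalar( keys %seen ) > 1;
    return 1;
}

my %reader_of_file = (
    'data_ph.csv' => Reader::FromList->new(
        { time => 0, ph => 7.9 }, { time => 1, ph => 8.1 },
    ),
    'data_cell.csv' => Reader::FromList->new(
        { time => 0, cells => 100 }, { time => 1, cells => 150 },
    ),
);

my @rows;
while ( my $data_now = get_next_data_for_all_readers( \%reader_of_file ) ) {
    check_if_time_is_the_same_for_all_data($data_now);
    push @rows, {
        time  => $data_now->{'data_ph.csv'}{time},
        ph    => $data_now->{'data_ph.csv'}{ph},
        cells => $data_now->{'data_cell.csv'}{cells},
    };
}

printf "t=%d ph=%.1f cells=%d\n", @{$_}{qw(time ph cells)} for @rows;
```

The loop stops as soon as any reader runs out of lines, so files of unequal length are silently truncated to the shortest one; whether that should be an error instead is a design decision left open here.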
That’s it! I won’t go into more detail.
Sorry for posting code that doesn’t work and must contain many mistakes. It must not be nice to read.
Any comments are very welcome if you had the courage to read all this, even just on code layout or the way I name my variables. I’d like to improve this for readability, too.