campus1plb has asked for the wisdom of the Perl Monks concerning the following question:
Dear all, I'm new to the site (and returning to Perl after quite a long break, but I was never a master of any sort).
I'm trying to plan out a program and I'd like to sanity-check my objectives, just to make sure I'm not (a) making more work than I need to, and (b) planning something stupid.
Task:
read a CSV file containing columns for
|Degree Subject|Entry requirement|
The entry requirement field contains strings such as:
"A minimum of 3 A Levels at ABB for First Year Entry or a minimum of AAB for Second Year Entry. Must include Mathematics and Physics at AB."
"For First Year Entry a minimum of 3 A Levels at BBB or 4 AS at AABB. For Second Year Entry a minimum of an A in the subject selected for Single Honours plus BB, or AB in the subjects selected for Joint Honours plus a further B."
"Three A Levels at ABB. AB required in Mathematics and Physics or a B in Design & Technology or a B in Engineering. If applicant presents with B in Physics, Design & Technology or Engineering, Mathematics must be A grade."
Program plan:
Read in the CSV file to an array/hash (more on this later)
Use regular expressions to determine which subjects are required for each degree subject, create a column specific to EACH subject, and mark whether it is present or not
Write this array/hash to a CSV file for output.
Example output:
|Degree Subject|Entry requirement* |Grades|Maths|Physics|Engineering|etc etc
|Chemical Eng |A minimum of 3 A Levels at XXX* for First Year Entry..|ABB |A/B |A/B | |
*(use s/// in the regex to mark which parts have been "detected", for manual checking)
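To make the plan concrete, here is a minimal sketch of the read-parse-write flow I have in mind, assuming Text::CSV is installed and the input columns match the layout above; the subject-detection rules are crude placeholders and the file names (requirements.csv, matrix.csv) are made up purely for illustration:

```perl
use strict;
use warnings;
use Text::CSV;

my @subject_cols = ('Maths', 'Physics', 'Engineering');   # extend as needed

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1, eol => "\n" });

open my $in,  '<:encoding(UTF-8)', 'requirements.csv' or die "requirements.csv: $!";
open my $out, '>:encoding(UTF-8)', 'matrix.csv'       or die "matrix.csv: $!";

my $header = $csv->getline($in);    # |Degree Subject|Entry requirement|
$csv->print($out, [ @$header, 'Grades', @subject_cols ]);

while (my $row = $csv->getline($in)) {
    my ($degree, $req) = @$row;

    # placeholder rules; the real regexes will be far more involved
    my ($grades) = $req =~ /\bat\s+([A-E]{2,4})\b/;
    my %found;
    $found{Maths}       = 'yes' if $req =~ /\bMath(?:s|ematics)\b/i;
    $found{Physics}     = 'yes' if $req =~ /\bPhysics\b/i;
    $found{Engineering} = 'yes' if $req =~ /\bEngineering\b/i;

    $csv->print($out, [ $degree, $req, $grades // '',
                        map { $found{$_} // '' } @subject_cols ]);
}
close $in;
close $out or die "matrix.csv: $!";
```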
Problems/Puzzles:
1/ Would it be more straightforward to use Text::CSV having created the full matrix of columns manually and then assign values to the relevant fields, or to check for entries in the dataset and then "create" columns at runtime? (See the first sketch after this list.)
2/ My gut feeling is to use an array (instead of a hash) for this, as by my nature (and as C was my first language many moons ago) it seems nice and orderly. Performance isn't a critical issue here. (The first sketch below also shows the row layout I'm considering.)
3/ The regex component is going to be quite complicated, and there are many variants of this field. I'm contemplating one of two strategies: a) use a first pass to pull out all unique entries and attempt a regex on them, using this as a reference key to then screen the remainder; or b) do the regex in one pass. N.B. there may be as many as 46,000 row entries, but in terms of unique entries it may be more like 5,000 (which is still loads, but perhaps easier to check over). (Strategy a) corresponds to the second sketch below.)
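For questions 1 and 2 above, the middle ground I'm leaning towards is sketched here, with toy data and placeholder regexes only: an array of rows keeps the original order, a hash per row keeps the fields named rather than positional, and the subject columns are collected at runtime so the header is only built once every row has been seen.

```perl
use strict;
use warnings;

# toy data standing in for the parsed CSV rows
my @input = (
    [ 'Chemical Eng',   'A minimum of 3 A Levels at ABB. Must include Mathematics and Physics at AB.' ],
    [ 'Product Design', 'Three A Levels at BBB including a B in Design & Technology.'                 ],
);

my (@rows, %subject_seen);
for my $rec (@input) {
    my ($degree, $req) = @$rec;
    my %found;
    $found{Maths}                 = 'yes' if $req =~ /\bMath(?:s|ematics)\b/i;
    $found{Physics}               = 'yes' if $req =~ /\bPhysics\b/i;
    $found{'Design & Technology'} = 'yes' if $req =~ /\bDesign\s*&\s*Technology\b/i;
    $subject_seen{$_}++ for keys %found;
    push @rows, { degree => $degree, req => $req, %found };
}

# columns "created" at runtime, only after all subjects have been discovered
my @subject_cols = sort keys %subject_seen;
print join('|', 'Degree Subject', 'Entry requirement', @subject_cols), "\n";
for my $row (@rows) {
    print join('|', $row->{degree}, $row->{req},
               map { $row->{$_} // '' } @subject_cols), "\n";
}
```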
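And for question 3, strategy a) more or less reduces to a cache keyed on the raw requirement string, so each of the ~5,000 unique entries is parsed once and the remaining rows reuse the result; a rough sketch (the parsing rules are just stand-ins):

```perl
use strict;
use warnings;

my %parsed;    # requirement string => hashref of extracted fields

sub parse_requirement {
    my ($req) = @_;
    return $parsed{$req} //= do {
        my %found;
        # stand-in rules; the real regex battery goes here
        ($found{grades}) = $req =~ /\bat\s+([A-E]{2,4})\b/;
        $found{Maths}   = 'yes' if $req =~ /\bMath(?:s|ematics)\b/i;
        $found{Physics} = 'yes' if $req =~ /\bPhysics\b/i;
        \%found;
    };
}

# duplicate requirement strings hit the cache instead of the regexes
my $r1 = parse_requirement('Three A Levels at ABB. AB required in Mathematics and Physics.');
my $r2 = parse_requirement('Three A Levels at ABB. AB required in Mathematics and Physics.');
print "parsed once, reused: grades=$r1->{grades}\n" if $r1 == $r2;
```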
Delighted for any guidance, even just pointers.
Best wishes, Phil
Replies are listed 'Best First'.

Re: CSV regex with hash/array program plan
by roboticus (Chancellor) on Nov 23, 2014 at 17:19 UTC
by campus1plb (Initiate) on Nov 23, 2014 at 18:55 UTC
by AnomalousMonk (Archbishop) on Nov 23, 2014 at 21:57 UTC
by Anonymous Monk on Nov 24, 2014 at 01:16 UTC

Re: CSV regex with hash/array program plan
by GrandFather (Saint) on Nov 24, 2014 at 01:35 UTC
by campus1plb (Initiate) on Nov 24, 2014 at 19:42 UTC
by GrandFather (Saint) on Nov 24, 2014 at 20:30 UTC

Re: CSV regex with hash/array program plan
by Anonymous Monk on Nov 23, 2014 at 16:01 UTC