in reply to breaking a text file into a data structure -- best way?
First, analyze the structure of your input and name its parts. This input consists of lines, and each line consists of three fields: a prefix, a separator and a text field. Consecutive lines with the same prefix form a block, and consecutive blocks form a prefix alphabet.
Second, answer this question: can the input be parsed line by line, or do you have to look around (at a certain point) to decide what is what? The former typically results in more efficient programs (but it is not possible for all types of input); the latter is generally easier to code, but requires holding more of your input in memory. (I chose the line-by-line approach, storing only the previous prefix in addition to the current line.)
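Then constrain yourself to go through your input line by line and ask yourself: what are the states (or state transitions) that determine what I should do? Here there are three: the current line continues the current block, starts a new block, or starts a new alphabet.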
How do these states map to relations between lines? Compare the prefix of the current line with that of the previous line.
What is the tool to express these relations between the lines? Alphabetical comparison. The mapping is: a prefix equal to the previous one continues the current block, a greater prefix starts a new block, and a smaller prefix starts a new alphabet.
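What should I do at each state transition? When a block continues, append the text to it; when a new block starts, add a new key to the current hash; when a new alphabet starts, push a new hash onto the array.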
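The prefix comparison can be tried out in isolation before writing the whole parser. This is only a minimal sketch: the helper name `transition` and the three labels it returns are mine, chosen for illustration, and it assumes that prefixes within one alphabet always appear in ascending order.

```perl
use strict;
use warnings;

# Hypothetical helper: name the transition between two consecutive prefixes.
sub transition {
    my ( $prev, $cur ) = @_;
    my $order = $cur cmp $prev;    # -1, 0 or 1 (string comparison)
    return $order > 0  ? 'new block'
         : $order == 0 ? 'same block'
         :               'new alphabet';
}

# Walk a sample sequence of prefixes, remembering only the previous one.
my $prev = '';
for my $prefix (qw( a b b c a )) {
    printf "%-2s %s\n", $prefix, transition( $prev, $prefix );
    $prev = $prefix;
}
```

Starting with an empty previous prefix means the very first line is reported as a new block, so no special case is needed for it.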
Now try to write it again, and if you get stuck, come back and look at this:
use strict;
use warnings;
use Data::Dump qw( pp );

# Array of alphabets; each alphabet is a hash of prefix => accumulated text.
my $ref         = [ {} ];
my $prev_prefix = '';

while (<DATA>) {
    # Limit of 2, so a '>' inside the text is not taken as another separator.
    my ( $prefix, $text ) = split /> ?/, $_, 2;
    if ( $prefix gt $prev_prefix ) {
        # Greater prefix: a new block in the current alphabet.
        $ref->[-1]{$prefix} = $text;
    }
    elsif ( $prefix eq $prev_prefix ) {
        # Same prefix: the current block continues.
        $ref->[-1]{$prefix} .= $text;
    }
    else {
        # Smaller prefix: a new alphabet starts.
        push @$ref, { $prefix => $text };
    }
    $prev_prefix = $prefix;
}

pp $ref;

__DATA__
a> some random text
b>
b> a few random
b> lines
b>
b> of more
b> random
b>
b> text
c> some more
c>
c> random
c> text
c>
a> some random text
b>
b> a few random
b> lines
b>
b> of more
b> random
b>
b> text
c> some more
c>
c> random
c> text
c>
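Once the structure is built, it can be walked like any array of hashes. The sketch below uses a small hand-written sample in the same shape (not the actual output of the program above), and the helper name `summarize` is hypothetical.

```perl
use strict;
use warnings;

# Hand-written sample in the shape the parser builds:
# one hash per alphabet, mapping prefix => accumulated text.
my $ref = [
    { a => "some random text\n", b => "\na few random\nlines\n" },
    { a => "some random text\n", c => "some more\n\nrandom\n" },
];

# Hypothetical helper: return one summary string per block.
sub summarize {
    my ($alphabets) = @_;
    my @out;
    for my $i ( 0 .. $#$alphabets ) {
        for my $prefix ( sort keys %{ $alphabets->[$i] } ) {
            # Count the lines of the block by counting its newlines.
            my $count = ( $alphabets->[$i]{$prefix} =~ tr/\n// );
            push @out,
                sprintf( "alphabet %d, block %s: %d line(s)", $i, $prefix, $count );
        }
    }
    return @out;
}

print "$_\n" for summarize($ref);
```

Because each alphabet is a hash, the blocks come back in no particular order unless the keys are sorted, as is done here.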
Of course this is only one approach, but clarifying the concepts and thinking methodically about the mechanical way to solve a problem has always helped me.
And in general: practice, and practice more. Read books, read other people's code (don't just glance over it; change it and understand it), read other people's problems and try to solve them without looking at the solutions others have posted.
Cheers