punkish has asked for the wisdom of the Perl Monks concerning the following question:
Update 0: My best buddy tells me this class of problems is handled with a "state machine." Googling for "Perl state machine" returns a bunch of hits that I am now digesting. In the meantime, I look to your help.
Update 1: It seems http://www.perl.com/pub/a/2004/09/23/fsms.html might have the answer for me.
I have a longish text file like the one below. The gutter annotations (a>, b>, c>) are not part of the text file. They are there only to show which part of a record each line belongs to.
```
a> some random text
   ----------------
b>
b> a few random
b> lines
b>
b> of more
b> random
b>
b> text
   ****************
c> some more
c>
c> random
c> text
c>
a> some random text
   ----------------
b>
b> a few random
b> lines
b>
b> of more
b> random
b>
b> text
   ****************
c> some more
c>
c> random
c> text
c>
```
I want to split the file into an array of hashes, like so:

```perl
@foo = (
    {
        a => 'some random text',
        b => ' a few random lines of more random text ',
        c => 'some more random text ',
    },
    {
        a => 'some random text',
        b => ' a few random lines of more random text ',
        c => 'some more random text ',
    },
    # .. and so on ..
);
```
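In other words, each hash holds one record: the text starting from a line that is followed by a dashed line ('----------------') up to, but not including, the next line that is followed by a dashed line. Within a record, the line before the dashes goes in a, the lines between the dashes and the starred line ('****************') go in b, and the rest goes in c.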
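Since Update 0 mentions state machines, here is a minimal sketch of that idea for the format above. It assumes separator lines made of four or more `-` or `*` characters (the sample uses 16; the real file may differ). The flags of the usual approach become one explicit `$state` variable ('b' after a dashed line, 'c' after a starred line), and each separator line is a transition. The one twist is that a record's a line can only be recognized after the dashed line following it has been read, so ordinary lines are buffered and the last buffered line is handed to the new record. `parse_lines` is a hypothetical name, and runs of whitespace are collapsed to single spaces, which for the sample reproduces the a, b and c strings shown above.

```perl
use strict;
use warnings;

# Parse records of the form
#   a-line
#   ----------------   (4+ dashes, assumed)
#   b-lines
#   ****************   (4+ stars, assumed)
#   c-lines
# into an array ref of { a => ..., b => ..., c => ... } hashes.
sub parse_lines {
    my ($fh) = @_;
    my @records;
    my @pending;   # lines read since the last separator
    my $state;     # undef before the first record, then 'b' or 'c'

    # Store the buffered lines in the current field, whitespace collapsed.
    my $flush = sub {
        return unless defined $state;    # lines before record 1 are dropped
        (my $text = join '', @pending) =~ s/\s+/ /g;
        $records[-1]{$state} = $text;
    };

    while (my $line = <$fh>) {
        if ($line =~ /^-{4,}$/) {
            # The line just before a dashed line starts a new record,
            # so take it back out of the previous record's buffer.
            my $title = pop @pending;
            $flush->();
            chomp $title if defined $title;
            push @records, { a => $title };
            $state   = 'b';
            @pending = ();
        }
        elsif ($line =~ /^\*{4,}$/) {
            $flush->();
            $state   = 'c';
            @pending = ();
        }
        else {
            push @pending, $line;    # ordinary line, newline included
        }
    }
    $flush->();    # the last record's c field ends at end of file
    return \@records;
}
```

Usage might look like `open my $fh, '<', 'input.txt' or die $!; my $records = parse_lines($fh);`, where input.txt is a placeholder. Because it reads one line at a time, the whole file never has to be held in memory.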
I have two questions. First, how do I do the above? I spent all of yesterday banging my head against a wall, so I come to you today. I have nothing to show you because everything I did was wrong. My approach was mostly to read from the beginning to the end, setting flags to track where one hash element began and where it ended, and so on. Which brings me to my second question.
What is the canonical design pattern for such a problem? I come across such problems all the time, and I always get bogged down trying to solve them. A pattern that is obvious to the eye becomes very difficult to program. Yesterday I had another such problem, which I managed to solve, if I may say so myself, rather innovatively. The text file looked like this:
```
bri red grn blu
  0   0   0   0
  1   0   0   0
  2   0   0   0
 ..
 99   0   0   0
100   0 255 255
101   0 250 255
102   0 246 255
 ..
```
The above had to be converted to this:

```
CLASS
  EXPRESSION ([pixel] >= 242 AND [pixel] <= 242)
  STYLE
    COLOR 200 72 127
  END
END
CLASS
  EXPRESSION ([pixel] >= 175 AND [pixel] <= 175)
  STYLE
    COLOR 191 236 0
  END
END
..
```
That is, group the brightness values by color triplet. After struggling with it for a while using the usual line-by-line, flag-as-you-go approach, I decided to turn the color triplets into hash keys. The problem was solved in a couple of lines, and elegantly. Here is the code for that:
```perl
my %lut;
while (my $line = <INFILE>) {
    # split ' ' splits on whitespace and ignores leading whitespace
    # and the trailing newline, so no chomp or s/^\s+// is needed
    my ($bri, @rgb) = split ' ', $line;
    # skip the "bri red grn blu" header and any blank or malformed rows
    next unless @rgb == 3 && $bri =~ /^\d+$/;
    # key the lut hash on the rgb triplet ("r g b"), collecting brightness values
    push @{ $lut{"@rgb"} }, $bri;
}
while (my ($rgb, $v) = each %lut) {
    # sort numerically to get min/max values (plain sort is lexical: "100" lt "99")
    my @v = sort { $a <=> $b } @$v;
    print OUTFILE "CLASS\n"
        . "  EXPRESSION ([pixel] >= $v[0] AND [pixel] <= $v[-1])\n"
        . "  STYLE\n"
        . "    COLOR $rgb\n"
        . "  END\n"
        . "END\n";
}
```
I was able to solve the above only because each color triplet had to appear once in the output, so the uniqueness of hash keys did the grouping for me; otherwise it would have been the usual slog. So, is there a generic approach to this? And is there a way to validate the output, that is, to make sure it is what I really want, given very long input text files?
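One way to validate output from long inputs is to check invariants that must hold regardless of file size, rather than eyeballing the result. For the color-table conversion, a natural invariant is that every brightness value falls inside exactly one CLASS range. This can fail when one color's brightness values are not contiguous, because the converter only emits the min and max for each color. A minimal sketch, assuming the rows have already been parsed into [brightness, r, g, b] array refs; `check_classes` is a hypothetical helper, not part of any library.

```perl
use strict;
use warnings;

# Hypothetical invariant check for the color-table conversion.
# $rows: array ref of [brightness, r, g, b]. Returns a list of problems,
# empty if no brightness value falls inside more than one CLASS range.
sub check_classes {
    my ($rows) = @_;

    # Group brightness values by color, as the converter does.
    my %lut;
    for my $row (@$rows) {
        my ($bri, @rgb) = @$row;
        push @{ $lut{"@rgb"} }, $bri;
    }

    # Rebuild the [min, max] range each CLASS would get.
    my %range;
    for my $rgb (keys %lut) {
        my @v = sort { $a <=> $b } @{ $lut{$rgb} };
        $range{$rgb} = [ $v[0], $v[-1] ];
    }

    # A row's own color always covers it, so any extra hit is an overlap.
    my @problems;
    for my $row (@$rows) {
        my $bri  = $row->[0];
        my @hits = grep { $range{$_}[0] <= $bri && $bri <= $range{$_}[1] }
                   sort keys %range;
        push @problems, "brightness $bri falls in " . scalar(@hits)
            . " classes: " . join('; ', @hits)
            if @hits > 1;
    }
    return @problems;
}
```

Feeding it the same rows as the converter and printing any problems gives a check that scales to long files. Other invariants, such as the number of records equaling the number of dashed lines in the first problem, can be checked the same way.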
Re: breaking a text file into a data structure -- best way?
by sierpinski (Chaplain) on Apr 09, 2010 at 14:52 UTC

Re: breaking a text file into a data structure -- best way?
by rubasov (Friar) on Apr 09, 2010 at 16:59 UTC
by punkish (Priest) on Apr 10, 2010 at 00:35 UTC
by rubasov (Friar) on Apr 10, 2010 at 03:42 UTC

Re: breaking a text file into a data structure -- best way?
by ikegami (Patriarch) on Apr 10, 2010 at 04:18 UTC

Re: breaking a text file into a data structure -- best way?
by repellent (Priest) on Apr 10, 2010 at 05:56 UTC

Re: breaking a text file into a data structure -- best way?
by rubasov (Friar) on Apr 10, 2010 at 15:51 UTC