heidi has asked for the wisdom of the Perl Monks concerning the following question:
How do I go about it?

hi: 65 abcdefghijklmnopqrst 85
bye: 12 bcdefghijklmnopqrstu 32
hi: 86 sagfsdgsgwsehbbdgops 106
bye: 33 afasdfdfafasaafadfad 53

I just want to store only the digits, as $hi_from=65 and $hi_to=106, $bye_from=12 and $bye_to=53.
Replies are listed 'Best First'.
Re: extracting from text
by bart (Canon) on Dec 10, 2007 at 11:24 UTC
Let's start with the first part...

Your idea to generate variable names out of data is a very bad one (IMnsHO). See Why it's stupid to 'use a variable as a variable name' for the reasons. I'd rather use a single hash to store everything: one key per name, each holding an array of 2-item hashes with "from" and "to" values.

Instead of the 2-item hashes with "from" and "to" values, you could use 2-item arrays, which are allegedly lighter on resources (memory). With constants for the indexes, the access code to dig into the data structure could look quite similar.

Now, how do you process the data and put it into the hash? It takes only a few lines of code to build the data structure described above from your data files.

Part 2 of your problem is merging ranges that are adjacent or that may overlap. For that, you'll have to loop through the collected values in the array for each name, see whether each range touches any other range in the array, and if so, merge them. You could do that with nested loops: for each item, loop through all the ranges already selected. However, I think this could prove bug-prone, and you may have to loop again after each merge to see whether the result can be merged even further.

Or you could use a module. Set::IntSpan::Fast looks like a good candidate. What's more, looking at its docs, it apparently uses internally the data structure I would have thought best for merging range sets. I don't know of an official name, but I'd call it a "toggle list". That would have been my 3rd suggestion. :)

The internal representation used is extremely simple: a set is represented as a list of integers. Integers in even numbered positions (0, 2, 4 etc) represent the start of a run of numbers while those in odd numbered positions represent the ends of runs. As an example the set (1, 3-7, 9, 11, 12) would be represented internally as (1, 2, 3, 8, 9, 10, 11, 13).

(Note: I was first introduced to this kind of representation by demerphq. Thanks for that. I understood he uses it to represent Unicode character classes in his updates to the perl5 regexp engine.)

Once you get your final IntSpan object, you could store it as is, as the value for a name in the hash; or you could convert it back to the representation described above.
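A minimal sketch of the approach bart describes, assuming the input holds one `name: from text to` record per line, as the question suggests. The names `%ranges`, `merge_ranges`, and the `FROM`/`TO` constants are made up for illustration, and bart's own code may well have looked different. It stores 2-item arrays under each name and merges ranges that touch or overlap by sorting on the start value and sweeping once. That is a plain sort-and-sweep, not the nested loops or Set::IntSpan::Fast mentioned in the post.

```perl
use strict;
use warnings;

# Hypothetical index constants, so $range->[FROM] reads like $range->{from}.
use constant { FROM => 0, TO => 1 };

# Sample records from the question, assumed to be one per line.
my @lines = (
    'hi: 65 abcdefghijklmnopqrst 85',
    'bye: 12 bcdefghijklmnopqrstu 32',
    'hi: 86 sagfsdgsgwsehbbdgops 106',
    'bye: 33 afasdfdfafasaafadfad 53',
);

# Part 1: collect every [from, to] pair under its name.
my %ranges;
for my $line (@lines) {
    next unless $line =~ /^(\w+):\s+(\d+)\s+\S+\s+(\d+)\s*$/;
    push @{ $ranges{$1} }, [ $2, $3 ];
}

# Part 2: merge ranges that overlap or are adjacent (e.g. 65-85 and 86-106).
sub merge_ranges {
    my @sorted = sort { $a->[FROM] <=> $b->[FROM] } @_;
    my @merged;
    for my $range (@sorted) {
        if ( @merged && $range->[FROM] <= $merged[-1][TO] + 1 ) {
            # Touches or overlaps the previous range: extend it if needed.
            $merged[-1][TO] = $range->[TO] if $range->[TO] > $merged[-1][TO];
        }
        else {
            push @merged, [@$range];    # copy, so the input stays untouched
        }
    }
    return @merged;
}

for my $name ( sort keys %ranges ) {
    my @merged = merge_ranges( @{ $ranges{$name} } );
    print "$name: ", join( ', ', map { "$_->[FROM]-$_->[TO]" } @merged ), "\n";
}
```

With the sample data this prints `bye: 12-53` and `hi: 65-106`. Because the ranges are sorted first, one pass is enough and no re-looping after a merge is needed.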
by oha (Friar) on Dec 10, 2007 at 13:44 UTC
Oha
Re: extracting from text
by johngg (Canon) on Dec 10, 2007 at 11:48 UTC
Here's the output (with Data::Dumper output included so you can see the data structure)
I hope this is useful. Cheers, JohnGG
Re: extracting from text
by misc (Friar) on Dec 10, 2007 at 11:28 UTC
It would be possible to do some cool constructs with map, hashes, ... However, if you have to change something later, something like this is much easier to understand and debug (although inelegant). It's also a matter of the needs you have.
HTH, Michael
Re: extracting from text
by mwah (Hermit) on Dec 10, 2007 at 10:44 UTC
I'd save an array of values under the hash key and sort afterwards (it doesn't look very elegant, but it works):
displays here:
Addendum: after reading the other posts, I think one could also beam the values into predefined my variables with an eval. I'm not sure whether (with error handling added) this would be that bad:
Regards mwa
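The "array of values under the hash key, sort afterwards" idea can be sketched roughly as follows. This is an illustration rather than mwah's actual code: the names `%values` and `%span` are invented here, and the input is assumed to be one record per line.

```perl
use strict;
use warnings;

# Sample data from the question, assumed to be one record per line.
my $text = <<'END';
hi: 65 abcdefghijklmnopqrst 85
bye: 12 bcdefghijklmnopqrstu 32
hi: 86 sagfsdgsgwsehbbdgops 106
bye: 33 afasdfdfafasaafadfad 53
END

# Push every number found on a line onto the array stored under that name.
my %values;
for my $line ( split /\n/, $text ) {
    my ( $name, $rest ) = $line =~ /^(\w+):\s*(.*)$/ or next;
    push @{ $values{$name} }, $rest =~ /\b(\d+)\b/g;
}

# Sort numerically; the first element is "from", the last one is "to".
my %span;
for my $name ( sort keys %values ) {
    my @sorted = sort { $a <=> $b } @{ $values{$name} };
    $span{$name} = { from => $sorted[0], to => $sorted[-1] };
    print "$name: from=$span{$name}{from} to=$span{$name}{to}\n";
}
```

Note that this keeps only the overall minimum and maximum per name, so any gap between two ranges is silently swallowed. That is fine for the sample data, where the ranges are contiguous.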
Re: extracting from text
by narainhere (Monk) on Dec 10, 2007 at 11:21 UTC
First I read the file and build a flat string from its contents. Then I apply a series of regexes to make sure we get what we want!! Sorry if you find the regexes difficult. Deciphering them will teach you a lot about regexes!!
The world is so big for any individual to conquer
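A rough sketch of the "flat string plus regexes" idea, not narainhere's actual code. It assumes each record looks like `name: from text to`, and it uses an in-memory filehandle in place of the real data file so the example runs on its own. The variable names (`$flat`, `%from`, `%to`) are invented for illustration.

```perl
use strict;
use warnings;

# Stand-in for the data file: an in-memory filehandle over the sample text.
my $data = "hi: 65 abcdefghijklmnopqrst 85\n"
         . "bye: 12 bcdefghijklmnopqrstu 32\n"
         . "hi: 86 sagfsdgsgwsehbbdgops 106\n"
         . "bye: 33 afasdfdfafasaafadfad 53\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# Slurp everything, then flatten newlines into spaces to get one long string.
my $flat = do { local $/; <$fh> };
close $fh;
$flat =~ tr/\n/ /;

# Walk the flat string one "name: from text to" record at a time,
# keeping the smallest "from" and the largest "to" seen for each name.
my ( %from, %to );
while ( $flat =~ /(\w+):\s+(\d+)\s+\S+\s+(\d+)/g ) {
    my ( $name, $lo, $hi ) = ( $1, $2, $3 );
    $from{$name} = $lo if !defined $from{$name} || $lo < $from{$name};
    $to{$name}   = $hi if !defined $to{$name}   || $hi > $to{$name};
}

print "$_: $from{$_} to $to{$_}\n" for sort keys %from;
```

A single global match in a `while` loop does the job here. Like the min/max approach above, it does not notice gaps between ranges.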
Re: extracting from text
by poolpi (Hermit) on Dec 10, 2007 at 11:46 UTC
output:
HTH PooLpi