Your problem, as I see it, is twofold:

How to parse the data and store it into a Perl data structure
Your data consists of ranges... how to merge adjacent ranges into a single range

Let's start with the first part... Your idea to generate variable names out of data is a very bad one (IMnsHO). See Why it's stupid to 'use a variable as a variable name' for the reasons.

I'd rather use a single hash to store everything, something like:

%presence = ( 'hi'  => [
                        { from => 65, to => 85 },
                        { from => 86, to => 106}
                       ],
              'bye' => [
                        { from => 12, to => 32 },
                        { from => 33, to => 53 }
                       ],
             );

Instead of two-item hashes with "from" and "to" keys, you could use two-item arrays, which allegedly use less memory:

%presence = ( 'hi'  => [
                        [65, 85],
                        [86, 106]
                       ],
              'bye' => [
                        [12, 32],
                        [33, 53]
                       ],
             );

With constants

use constant FROM => 0;
use constant TO => 1;

access code to dig into the data structure could look quite similar.
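To make that concrete, here is a minimal sketch of walking the array-based structure with the FROM and TO constants. The sample data is the same as above; the @lines array and the output format are just my choices for illustration.

```perl
use strict;
use warnings;

use constant FROM => 0;
use constant TO   => 1;

# Sample data in the array-based layout shown above.
my %presence = (
    hi  => [ [ 65, 85 ], [ 86, 106 ] ],
    bye => [ [ 12, 32 ], [ 33, 53 ] ],
);

# Collect one line per range; names are sorted so the output order is stable.
my @lines;
for my $name ( sort keys %presence ) {
    for my $range ( @{ $presence{$name} } ) {
        push @lines, sprintf '%s: %d-%d', $name, $range->[FROM], $range->[TO];
    }
}
print "$_\n" for @lines;
```

With the hash-based layout, the only change would be $range->{from} and $range->{to} in place of the indexed accesses.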

Now, how do you process the data and put it into the hash? Something like this:

my %presence;
while(<INPUT>) {
    my($name, $from, $to) = /^(\w+):\s+(\d+)\s+.*?\s+(\d+)$/ or next;
    push @{$presence{$name}}, { from => $from, to => $to };
    # or, with arrays:
    # push @{$presence{$name}}, [ $from, $to ];
}

Yes, that really is all it takes to build the data structure I showed above from your data files.

Part 2 of your problem is merging ranges that are adjacent or that overlap. For that, you'll have to loop through the collected ranges in the array for each name, check whether each one touches any other range in the array, and if so, merge them.

You could do that with nested loops: for each item, loop through all the ranges already selected. However, I think this could prove bug-prone, and you may have to loop again after each merge to see whether the result can be merged even further.
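One way around the repeated looping, which is a different technique from the nested loops described above, is to sort the ranges by their start first. After sorting, each range only needs to be compared with the last merged one. A minimal sketch, assuming integer, inclusive bounds with from <= to, and using the array-based layout; merge_ranges is a name I made up:

```perl
use strict;
use warnings;

# Hypothetical helper: merge overlapping or adjacent [from, to] pairs.
# Assumes integer, inclusive bounds with from <= to in every pair.
sub merge_ranges {
    my @sorted = sort { $a->[0] <=> $b->[0] or $a->[1] <=> $b->[1] } @_;
    my @merged;
    for my $range (@sorted) {
        my ( $from, $to ) = @$range;
        if ( @merged and $from <= $merged[-1][1] + 1 ) {
            # Touches or overlaps the last merged range: extend it if needed.
            $merged[-1][1] = $to if $to > $merged[-1][1];
        }
        else {
            # Store a copy, so the caller's data is left untouched.
            push @merged, [ $from, $to ];
        }
    }
    return @merged;
}

my @merged = merge_ranges( [ 86, 106 ], [ 65, 85 ], [ 200, 210 ], [ 70, 80 ] );
print join( ', ', map { join( '-', @$_ ) } @merged ), "\n";
# prints "65-106, 200-210"
```

To apply it per name, something like @{ $presence{$name} } = merge_ranges( @{ $presence{$name} } ) inside a loop over the keys should do.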

Or you could use a module. Set::IntSpan::Fast looks like a good candidate. What's more, judging by its docs, it apparently uses internally the very data structure I would have thought best for merging range sets. I don't know of an official name, but I'd call it a "toggle list". That would have been my 3rd suggestion. :)

The internal representation used is extremely simple: a set is represented as a list of integers. Integers in even numbered positions (0, 2, 4 etc) represent the start of a run of numbers while those in odd numbered positions represent the ends of runs. As an example the set (1, 3-7, 9, 11, 12) would be represented internally as (1, 2, 3, 8, 9, 10, 11, 13).
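To illustrate the idea, here is a rough sketch of a toggle list in plain Perl. These helpers are hypothetical and are not part of Set::IntSpan::Fast; they assume the ranges are already merged and sorted, with inclusive integer bounds, and that each run's end is stored as the first value after the run.

```perl
use strict;
use warnings;

# Hypothetical helpers, not the module's API.

# [from, to] pairs -> flat toggle list (run start, first value after the run, ...)
sub to_toggle_list {
    return map { my ( $from, $to ) = @$_; ( $from, $to + 1 ) } @_;
}

# Flat toggle list -> [from, to] pairs (assumes an even number of entries)
sub from_toggle_list {
    my @list = @_;
    my @ranges;
    while ( my ( $start, $end ) = splice @list, 0, 2 ) {
        push @ranges, [ $start, $end - 1 ];
    }
    return @ranges;
}

# Membership test: count the toggle points that are <= $n.
# An odd count means $n lies inside a run. This is a plain linear scan.
sub contains {
    my ( $n, @list ) = @_;
    my $toggles = grep { $_ <= $n } @list;
    return $toggles % 2;
}

my @toggle = to_toggle_list( [ 12, 53 ], [ 65, 106 ] );
print "@toggle\n";    # prints "12 54 65 107"
```

The appeal of the layout is that every number in the list marks a switch between "outside" and "inside", so a membership test only has to count switches.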

(Note: I was first introduced to this kind of representation by demerphq. Thanks for that. I understood he uses it to represent Unicode character classes in his updates to the perl5 regexp engine.)

Once you get your final IntSpan object, you could store it as is, as the value for a name in the hash; or you could convert it back to the representation I showed you above.

In reply to Re: extracting from text by bart
in thread extracting from text by heidi
