Re^2: Selecting the right data structure

As for the d(n) field, let's say this represents a date on each parent line. I want to sort all of the parent lines by date.

If lines are not immediately split, then perhaps the entire line could be used as a hash key?

Sorry for not explaining this better...

BTW, you have concatenated the children lines to the parent line -- this is not what I'm trying to do.

The children lines should be a sub-array of their parent.

Where do you want *them* to go today?


Replies are listed 'Best First'.

Re^3: Selecting the right data structure
by BrowserUk (Patriarch) on Mar 02, 2007 at 23:54 UTC

If lines are not immediately split, then perhaps the entire line could be used as a hash key?
If you were to use the whole string as the key to the hash, what would it buy you?
It would cause problems when building the data structure because, for the compound lines, you wouldn't have the entire key when you read the first line. You'd have to either read ahead, or delete and re-store compound lines under new keys each time you found an extension line.
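To make the re-keying problem concrete, here is a minimal sketch. The input lines, the %seen hash and the $rekeyed counter are made up for illustration; they are not from your data.

```perl
use strict;
use warnings;

# Hypothetical input: parent lines start in column 0, extension lines are indented.
my @lines = ( 'a3d b3f', '   p3 q3', '   s3 t3', 'a4l b4z' );

my %seen;           # full record text => number of lines that make it up
my $current;        # key of the record currently being built
my $rekeyed = 0;    # how often an entry had to be moved to a new key

for my $line (@lines) {
    if ( $line =~ /^\s/ ) {
        # An extension line changes the key, so the existing entry has to be
        # deleted and stored again under the longer string.
        my $count = delete $seen{$current};
        $current .= $line;
        $seen{$current} = $count + 1;
        $rekeyed++;
    }
    else {
        # A parent line starts a new record under its own text.
        $current = $line;
        $seen{$current} = 1;
    }
}

print "$_ => $seen{$_}\n" for sort keys %seen;
print "re-keyed $rekeyed times\n";
```

Every extension line costs a delete and a re-insert, and anything that remembered the old key is left pointing at nothing.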
As for the d(n) field, let's say this represents a date on each parent line. I want to sort all of the parent lines by date.
It would be possible to sort the data prior to splitting it, but if the fields are complex (like dates) then it's much easier to do the sort after the split.

Without making any attempt to be efficient, sorting by field n (a more usual term than your d(n) nomenclature) can be very simple. This sorts the data by the last character of the 4th field (an extra character I added to your sample data):

#! perl -slw
use strict;

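my @data;                       # one array ref per parent line
while( <DATA> ) {
    if( /^\s/ ) {
        # Indented (child) line: append its fields to the latest parent
        push @{ $data[ -1 ] }, split;
    }
    else {
        # Parent line: start a new record from its fields
        push @data, [ split ];
    }
}

# Order the records by the last character of the 4th field (index 3)
print "@$_" for sort{
    substr( $a->[ 3 ], -1 ) cmp substr( $b->[ 3 ], -1 )
}@data;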

__DATA__
a1q   b1w   c1e   d1r   e1t   f1y
a2u   b2i   c2o   d2p   e2a   f2s
a3d   b3f   c3g   d3h   e3j   f3k
   p3   q3   r3
   s3   t3   u3
a4l   b4z   c4x   d4c   e4v   f4b
a5n   b5m   c5q   d5w   e5e   f5r
   p5   q5   r5
   s5   t5   u5
a6t   b6y   c6u   d6i   e6o   f6p
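The script above appends the fields of each indented line to the parent's array. If the child lines should instead stay as separate sub-arrays under their parent, as the question asks, something like the following might do. It is only a sketch: the parse_nested name and the fields/children hash keys are my own choices, and it assumes the input never starts with an indented line.

```perl
use strict;
use warnings;

# Build one record per parent line: the parent's own fields plus a list of
# child rows, each child row being its own array ref.
sub parse_nested {
    my @lines = @_;
    my @data;
    for my $line (@lines) {
        if ( $line =~ /^\s/ ) {
            # Indented line: keep it as a separate row under the latest parent.
            push @{ $data[-1]{children} }, [ split ' ', $line ];
        }
        else {
            # Parent line: start a new record with no children yet.
            push @data, { fields => [ split ' ', $line ], children => [] };
        }
    }
    return @data;
}

my @data = parse_nested(
    'a3d   b3f   c3g   d3h   e3j   f3k',
    '   p3   q3   r3',
    '   s3   t3   u3',
    'a2u   b2i   c2o   d2p   e2a   f2s',
);

# Sort the parents by the last character of the 4th field, as before;
# the children travel with their parent.
for my $rec ( sort { substr( $a->{fields}[3], -1 ) cmp substr( $b->{fields}[3], -1 ) } @data ) {
    print "@{ $rec->{fields} }\n";
    print "    @$_\n" for @{ $rec->{children} };
}
```

Because the sort only looks at $rec->{fields}, the child rows never take part in the comparison.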

Sorting by a date field is slightly more complex, but not much. I'd give an example, but as you've given dummy data, I'd have to make up the dates and I've no idea what format your data is in.
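For what it's worth, here is roughly how that might look if the dates were in DD/MM/YYYY form in the 4th field. That format, the sample records and the date_key helper are all assumptions for the sake of illustration; your real data may well differ.

```perl
use strict;
use warnings;

# Turn an (assumed) DD/MM/YYYY date into a YYYYMMDD string, which sorts
# chronologically with a plain string comparison.
sub date_key {
    my ($date) = @_;
    my ( $d, $m, $y ) = $date =~ m{^(\d{1,2})/(\d{1,2})/(\d{4})$}
        or die "Unrecognised date: $date\n";
    return sprintf '%04d%02d%02d', $y, $m, $d;
}

# Hypothetical records, already split; index 3 holds the date.
my @data = (
    [ qw( a1 b1 c1 15/03/2007 ) ],
    [ qw( a2 b2 c2 02/11/2006 ) ],
    [ qw( a3 b3 c3 28/02/2007 ) ],
);

# Schwartzian transform: compute each key once, sort on it, then drop it.
my @sorted =
    map  { $_->[1] }
    sort { $a->[0] cmp $b->[0] }
    map  { [ date_key( $_->[3] ), $_ ] } @data;

print "@$_\n" for @sorted;
```

Computing the key once per record, rather than inside the sort block, avoids re-parsing the same date on every comparison.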

If you posted an example of your real data and explained what you are actually trying to achieve, rather than all this abstract stuff, you'd doubtless get much better answers.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
