- If lines are not immediately split, then perhaps the entire line could be used as a hash key?
If you were to use the whole string as the key to the hash, what would it buy you?
It would cause problems when building the data structure because, for compound lines, you wouldn't have the entire key when you read the first line. You'd either have to employ some readahead, or delete and re-store each compound line under a new key every time you found an extension line.
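To make that concrete, here is a rough sketch of the delete-and-re-store variant. The sub name, the hash layout and the sample lines are all invented for illustration; it assumes extension lines are the ones that start with whitespace.

```perl
use strict;
use warnings;

# Build a hash keyed on the whole (compound) line.
# Every extension line lengthens the key, so the entry has to be
# deleted and stored again under the new key.
sub build_by_whole_line {
    my( %byLine, $key );
    for my $line ( @_ ) {
        if( $line =~ /^\s/ and defined $key ) {
            my $fields = delete $byLine{ $key };      # drop the stale key
            push @$fields, split ' ', $line;
            $key = join ' ', $key, split ' ', $line;  # key grows with each extension
            $byLine{ $key } = $fields;                # re-store under the longer key
        }
        else {
            $key = join ' ', split ' ', $line;
            $byLine{ $key } = [ split ' ', $line ];
        }
    }
    return \%byLine;
}

# Hypothetical sample: one simple line, one parent with two extension lines.
my $h = build_by_whole_line(
    'a1 b1 c1',
    'a2 b2 c2',
    '    p2 q2',
    '    s2 t2',
);
print "$_\n" for sort keys %$h;
```

The key is only final once the last extension line has been seen, which is exactly the bookkeeping you avoid by storing the records in an array.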
- As for the d(n) field, let's say this represents a date on each parent line. I want to sort all of the parent lines by date.
It would be possible to sort the data prior to splitting it, but if the fields are complex (like dates) then it's much easier to do the sort after the split.
Without making any attempt to be efficient, sorting by field n (a more normal term for your d(n) nomenclature) can be very simple. This sorts the data by the (additional) last character of the 4th field:
#! perl -slw
use strict;

my @data;
while( <DATA> ) {
    if( /^\s/ ) {
        ## Extension line (leading whitespace): append its fields to the previous record
        push @{ $data[ -1 ] }, split;
    }
    else {
        ## Parent line: start a new record
        push @data, [ split ];
    }
}

## Sort the records by the last character of the 4th field
print "@$_" for sort{
    substr( $a->[ 3 ], -1 ) cmp substr( $b->[ 3 ], -1 )
} @data;

__DATA__
a1q b1w c1e d1r e1t f1y
a2u b2i c2o d2p e2a f2s
a3d b3f c3g d3h e3j f3k
    p3 q3 r3
    s3 t3 u3
a4l b4z c4x d4c e4v f4b
a5n b5m c5q d5w e5e f5r
    p5 q5 r5
    s5 t5 u5
a6t b6y c6u d6i e6o f6p
Sorting by a date field is slightly more complex, but not much. I'd give an example, but as you've given dummy data, I'd have to make up the dates and I've no idea what format your data is in.
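Purely as an illustration of the shape it might take, here is a sketch under invented assumptions: the date sits in the 4th field and is written DD/MM/YYYY. The format, the sub names (date_key, sort_by_date) and the sample records are all made up, so the regex would need adjusting for real data. It rearranges each date into YYYYMMDD so that a plain string comparison sorts chronologically.

```perl
use strict;
use warnings;

# ASSUMPTION: dates look like DD/MM/YYYY; adjust the regex for the real format.
# Convert a date to YYYYMMDD so that string order equals chronological order.
sub date_key {
    my( $d, $m, $y ) = $_[ 0 ] =~ m[^(\d\d?)/(\d\d?)/(\d{4})$]
        or die "Unrecognised date '$_[ 0 ]'";
    return sprintf '%04d%02d%02d', $y, $m, $d;
}

# Sort records (array refs) by the date held in field $n (0-based).
# Each date is converted once, then the records are sorted on that key.
sub sort_by_date {
    my( $n, @recs ) = @_;
    return map  { $_->[ 1 ] }
           sort { $a->[ 0 ] cmp $b->[ 0 ] }
           map  { [ date_key( $_->[ $n ] ), $_ ] } @recs;
}

# Made-up sample lines: parent lines plus one indented extension line.
my @lines = (
    'a1 b1 c1 25/12/2003 e1 f1',
    'a2 b2 c2 01/02/2004 e2 f2',
    '    p2 q2 r2',
    'a3 b3 c3 14/07/2003 e3 f3',
);

my @data;
for( @lines ) {
    if( /^\s/ ) { push @{ $data[ -1 ] }, split }   # extension line
    else        { push @data, [ split ] }          # parent line
}

print "@$_\n" for sort_by_date( 3, @data );
```

If the real dates are already in a year-first form such as YYYY-MM-DD, the conversion step is unnecessary and the field can be compared directly with cmp.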
If you posted an example of your real data and explained what you are actually trying to achieve, rather than all this abstract stuff, you'd doubtless get much better answers.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.