monkprentice has asked for the wisdom of the Perl Monks concerning the following question:

Hi, this time I don't need a problem solved so much as some advice about a more elegant solution. The scenario is: I read a file which contains lines of the following pattern:
```
2 [...]
[line1]
[line2]
[line3]
3 [...]
[line1]
1 [...]
[line1]
[line2]
2 [...]
[line1]
[line2]
```

The output should be a simple reordering by the leading number, keeping the [lineX] lines under their numbered indicator line:

```
1 [...]
[line1]
[line2]
2 [...]
[line1]
[line2]
[line3]
2 [...]
[line1]
[line2]
3 [...]
[line1]
```

(The ordering of the "2 ..." lines does not matter).

I currently do it by maintaining a hash structure which maps {number-indicator => [indicator-line, sub-lines, indicator-line, sub-lines, ...]} and then writing this hash structure back to the file.
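For reference, the hash-based approach described above could be sketched roughly as follows. This is a minimal sketch, not the code from the question: the sub names group_blocks and ordered_output are hypothetical, an indicator line is assumed to be any line starting with a digit, and lines before the first indicator line are simply dropped.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: collect blocks into a hash keyed by the indicator
# number. Each value is a list of block strings (indicator line plus the
# sub-lines that follow it), in the order they were read.
sub group_blocks {
    my ($fh) = @_;
    my %blocks;    # number-indicator => [ block, block, ... ]
    my $key;       # indicator number of the block currently being filled
    while ( my $line = <$fh> ) {
        if ( $line =~ /^(\d+)/ ) {
            $key = $1;
            push @{ $blocks{$key} }, $line;    # start a new block
        }
        elsif ( defined $key ) {
            $blocks{$key}[-1] .= $line;        # append a sub-line
        }
        # Lines before the first indicator line are dropped.
    }
    return \%blocks;
}

# Hypothetical helper: concatenate the blocks in ascending numeric order.
sub ordered_output {
    my ($blocks) = @_;
    return join '',
        map  { @{ $blocks->{$_} } }
        sort { $a <=> $b } keys %$blocks;
}

my $sample = "2 [...]\n[line1]\n[line2]\n[line3]\n"
           . "3 [...]\n[line1]\n"
           . "1 [...]\n[line1]\n[line2]\n"
           . "2 [...]\n[line1]\n[line2]\n";

open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
print ordered_output( group_blocks($in) );
close $in;
```

Sorting the keys with <=> rather than the default string comparison matters once the indicator numbers reach 10 or more.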

The key problem is how to read one block of those lines (an indicator line plus all following lines until the next indicator line or EOF). I do this in a loop, but it looks quite clumsy that way. Is there an elegant way to do this with grep, map or some other magic functions?

(I cannot even write an example of what I tried because I cannot think of anything good :D)

Consider this more of a puzzle.

Thanks!

Re: Looking for elegance
by johngg (Canon) on Jan 27, 2014 at 10:41 UTC

    If the file is not too large to read into memory, you can slurp the whole thing and then split it into "records" at points preceded by a line terminator and followed by a digit. You can then sort the records using a Schwartzian Transform.

```
$ perl -Mstrict -Mwarnings -E '
open my $inFH, q{<}, \ <<EOD or die $!;
2 [...]
[line1]
[line2]
[line3]
3 [...]
[line1]
1 [...]
[line1]
[line2]
2 [...]
[line1]
[line2]
EOD

my $input = do { local $/; <$inFH>; };
close $inFH or die $!;

print for
    map  { $_->[ 0 ] }
    sort { $a->[ 1 ] <=> $b->[ 1 ] }
    map  { [ $_, m{\A(\d+)} ] }
    split m{(?<=\n)(?=\d)}, $input;'
1 [...]
[line1]
[line2]
2 [...]
[line1]
[line2]
[line3]
2 [...]
[line1]
[line2]
3 [...]
[line1]
$
```
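    Since the question also mentions writing the result back to the file, here is a minimal sketch that wraps the same slurp/split/Schwartzian Transform idea in two subs. The names reorder_text and reorder_file are hypothetical, the file is overwritten in place (so keep a backup), and, as with the one-liner, it assumes that only indicator lines start with a digit.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Reorder blocks by their leading number: split the slurped text into
# records, then sort them with a Schwartzian Transform (read bottom-up).
# Assumes only indicator lines start with a digit.
sub reorder_text {
    my ($text) = @_;
    return join '',
        map  { $_->[0] }                     # 4. keep only the record text
        sort { $a->[1] <=> $b->[1] }         # 3. sort on the cached number
        map  { [ $_, ( /\A(\d+)/ )[0] ] }    # 2. pair each record with its number
        split /(?<=\n)(?=\d)/, $text;        # 1. a record starts at a digit after a newline
}

# Hypothetical wrapper: slurp a file, reorder it and overwrite it in place.
sub reorder_file {
    my ($path) = @_;
    my $text = do {
        open my $in, '<', $path or die "Cannot read $path: $!";
        local $/;                            # slurp mode: read the whole file
        <$in>;
    };
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} reorder_text($text);
    close $out or die "Cannot close $path: $!";
    return;
}

reorder_file($_) for @ARGV;
```

    Run it with the file names as arguments, e.g. perl reorder.pl data.txt (both names are made up for illustration).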

    I hope this is helpful.

    Cheers,

    JohnGG

Re: Looking for elegance
by hdb (Monsignor) on Jan 27, 2014 at 13:27 UTC

    I prefer a line-by-line approach that creates a new array each time a line starting with a number is encountered. I assume that the data fits into memory; otherwise sorting will be difficult:

```perl
use strict;
use warnings;

my @lines;
while(<DATA>){
    push @lines, [$1] if /^(\d+)/;
    push @{$lines[-1]}, $_;
}
print @$_[1..@$_-1] for sort { $a->[0] <=> $b->[0] } @lines;

__DATA__
2 [...]
[line1]
[line2]
[line3]
3 [...]
[line1]
1 [...]
[line1]
[line2]
2 [...]
[line1]
[line2]
```
    Update: @lines should really be called @blocks
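    One caveat with the loop above: if the file begins with a line that is not an indicator line, there is no block yet for push @{$lines[-1]} to append to, and the script dies. A minimal sketch of the same line-by-line idea that keeps such leading lines at the top of the output instead is below; the sub name reorder_lines and the preamble handling are assumptions, not part of the original reply.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Line-by-line grouping into array refs [ number, line, line, ... ].
# Assumption: lines seen before the first indicator line are kept, in
# order, at the top of the output rather than causing an error.
sub reorder_lines {
    my ($fh) = @_;
    my ( @preamble, @blocks );
    while ( my $line = <$fh> ) {
        if ( $line =~ /^(\d+)/ ) {
            push @blocks, [ $1, $line ];      # open a new block
        }
        elsif (@blocks) {
            push @{ $blocks[-1] }, $line;     # add to the current block
        }
        else {
            push @preamble, $line;            # nothing to attach to yet
        }
    }
    my @sorted = sort { $a->[0] <=> $b->[0] } @blocks;
    # Drop element 0 (the cached number) from each block when joining.
    return join '', @preamble, map { @{$_}[ 1 .. $#$_ ] } @sorted;
}

my $sample = "header\n2 [...]\n[line1]\n1 [...]\n[line1]\n";
open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
print reorder_lines($in);
close $in;
```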
Re: Looking for elegance
by kcott (Archbishop) on Jan 28, 2014 at 00:14 UTC

    G'day monkprentice,

    Here's another way to do it:

```perl
#!/usr/bin/env perl

use strict;
use warnings;

my @data;

map { /^\d+/ ? push(@data, $_) : ($data[-1] .= $_) } <DATA>;

print sort { ($a =~ /^(\d+)/)[0] <=> ($b =~ /^(\d+)/)[0] } @data;

__DATA__
2 [...]
[line1]
[line2]
[line3]
3 [...]
[line1]
1 [...]
[line1]
[line2]
2 [...]
[line1]
[line2]
```

    Output:

```
1 [...]
[line1]
[line2]
2 [...]
[line1]
[line2]
[line3]
2 [...]
[line1]
[line2]
3 [...]
[line1]
```

    As well as looking at elegance, you might want to consider comparing whatever solutions are suggested for efficiency (with the core Benchmark module).
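    As a starting point for such a comparison, here is a minimal Benchmark sketch. It compares extracting the numeric key inside the sort comparator (as in the code above, where the regex runs on every comparison) with a Schwartzian Transform that extracts each key once. The sub names are hypothetical and the test data is synthetic (1000 blocks with random numbers below 100), so timings on real files may differ.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 1000 blocks with random indicator numbers,
# each followed by two sub-lines.
srand 42;
my @records = map { int( rand 100 ) . " [...]\n[line1]\n[line2]\n" } 1 .. 1000;

# Key extracted inside the comparator: the regex runs twice per comparison.
sub sort_inline {
    return sort { ( $a =~ /^(\d+)/ )[0] <=> ( $b =~ /^(\d+)/ )[0] } @_;
}

# Schwartzian Transform: each key is extracted once per record.
sub sort_cached {
    return map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] }
           map  { [ ( /^(\d+)/ )[0], $_ ] } @_;
}

# Both variants must agree on the order of the numbers before timing them.
my @k1 = map { ( /^(\d+)/ )[0] } sort_inline(@records);
my @k2 = map { ( /^(\d+)/ )[0] } sort_cached(@records);
die "Results differ\n" unless "@k1" eq "@k2";

# Negative count: run each sub for at least 1 CPU second, then print a table.
cmpthese( -1, {
    inline => sub { my @r = sort_inline(@records) },
    cached => sub { my @r = sort_cached(@records) },
} );
```

    cmpthese prints a table of rates; which variant wins depends on the data size and Perl version, so treat the numbers as indicative only.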

    -- Ken