Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Re^3: Rosetta Code: Long List is Long (Updated Solutions)

by jwkrahn (Abbot)
on Dec 09, 2022 at 20:53 UTC ( [id://11148693]=note: print w/replies, xml ) Need Help??


in reply to Re^2: Rosetta Code: Long List is Long (Updated Solutions)
in thread Rosetta Code: Long List is Long

This is one of the reasons why the Larry-Rossler is so efficient.

I think that you are thinking of the Guttman-Rosler Transform?

my @sorted = sort { $b->{id} <=> $b->{id} || $a->{foo} cmp $b->{foo} } + @unsorted; # SLOW

Comparing $b->{id} to itself will not work very well.

Replies are listed 'Best First'.
Re^4: Rosetta Code: Long List is Long (Updated Solutions)
by eyepopslikeamosquito (Archbishop) on Dec 09, 2022 at 21:17 UTC

    I think that you are thinking of the Guttman-Rosler Transform?

    Agreed. BTW I believe the GRT is generally faster than the Schwartzian Transform (this node has more detail on sorting history in Perl).

    Though fond of the GRT, I couldn't make it work for this problem because of the unusual requirement to sort descending by count yet ascending by name (it would work nicely to sort both fields ascending via the classic pack "NA*" trick). If anyone can see a way to make GRT work for this problem, please let us know.

      Perhaps add a negative sign to the numbers before sorting and remove it after?

        Thanks for the tip!

        This is bizarre, but thanks to your tip, while just trying to get a GRT solution to work, I seem to have accidentally "out-marioroy-ed" marioroy. :) I would have thought this impossible, so I've probably overlooked something, but this version runs three seconds faster on my machine than mario's astonishing dualvar solution while using slightly less memory (2,657,968K v 2,824,184K).

        Here is the source code. This is just a first cut at this approach, so further improvements may be available.

        # llil2grt.pl. Try a GRT version. # Example run: perl llil2grt.pl tt1.txt tt2.txt tt3.txt >out.txt use strict; use warnings; use feature qw{say}; # -------------------------------------------------------------------- +-- # LLiL specification # ------------------ # A LLiL-format file is a text file. # Each line consists of a lowercase name a TAB character and a non-neg +ative integer count. # That is, each line must match : ^[a-z]+\t\d+$ # For example, reading the LLiL-format files, tt1.txt containing: # camel\t42 # pearl\t94 # dromedary\t69 # and tt2.txt containing: # camel\t8 # hello\t12345 # dromedary\t1 # returns this hashref: # $hash_ret{"camel"} = 50 # $hash_ret{"dromedary"} = 70 # $hash_ret{"hello"} = 12345 # $hash_ret{"pearl"} = 94 # That is, values are added for items with the same key. # # To get the required LLiL text, you must sort the returned hashref # descending by value and insert a TAB separator: # hello\t12345 # pearl\t94 # dromedary\t70 # camel\t50 # To make testing via diff easier, we further sort ascending by name # for lines with the same value. # -------------------------------------------------------------------- +-- # Function get_properties # Read a list of LLiL-format files # Return a reference to a hash of properties sub get_properties { my $files = shift; # in: reference to a list of LLiL-format fil +es my %hash_ret; # out: reference to a hash of properties for my $fname ( @{$files} ) { open( my $fh, '<', $fname ) or die "error: open '$fname': $!"; while (<$fh>) { chomp; my ($word, $count) = split /\t/; $hash_ret{$word} += $count; } close($fh) or die "error: close '$fname': $!"; } return \%hash_ret; } # ----------------- mainline ----------------------------------------- +-- @ARGV or die "usage: $0 file...\n"; my @llil_files = @ARGV; warn "llil2grt start\n"; my $tstart1 = time; my $href = get_properties( \@llil_files ); my $tend1 = time; my $taken1 = $tend1 - $tstart1; warn "get_properties : $taken1 secs\n"; my $tstart2 = time; my @lines; while (my ($k, $v) = each %{$href}) { push @lines, pack('NA*', -$v, "$ +k\t$v") } say substr($_, 4) for sort @lines; my $tend2 = time; my $taken2 = $tend2 - $tstart2; my $taken = $tend2 - $tstart1; warn "sort + output : $taken2 secs\n"; warn "total : $taken secs\n";

        Example Run

        > perl llil2grt.pl big1.txt big2.txt big3.txt >perl2grt.tmp

        llil2grt start get_properties : 10 secs sort + output : 20 secs total : 30 secs Memory use (Windows Private Bytes): 2,657,968K

        > diff perl2d.tmp perl2grt.tmp

        Code Differences

        The only substantive differences between dualvar llil2d.pl and llil2grt.pl above are:

        my @data; while ( my ($k, $v) = each %{$href} ) { push @data, dualvar($v, $k) } for my $key ( sort { $b <=> $a } sort @data ) { say "$key\t" . (0 + $key); }
        vs:
        my @lines; while (my ($k, $v) = each %{$href}) { push @lines, pack('NA*', -$v, "$ +k\t$v") } say substr($_, 4) for sort @lines;

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11148693]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (6)
As of 2024-03-28 16:14 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found