Cristoforo has asked for the wisdom of the Perl Monks concerning the following question:

I wonder if someone could tell me why this code is not working. It is meant to sort the records from highest percent to lowest. (I edited the regexp in split.) Update: I found the errors. The map {$_[0]} should be map {$_->[0]}, and the split pattern needed a newline (\n) at the end: split(/(?<=workspace\/data)\n/, $s). Now I'm getting the correct sorted output.
C:\Old_Data\perlp>perl try3.pl
>>> prd1702
Filesystem Size Used Avail Use% Mounted on
/workspace 3.9T 746G 3.1T 23% /workspace/data
>>> prd1703
Filesystem Size Used Avail Use% Mounted on
/workspace 3.9T 687G 3.2T 18% /workspace/data
>>> prd1701
Filesystem Size Used Avail Use% Mounted on
/workspace 3.9T 887G 3.0T 13% /workspace/data
(Below is the code before the fixes noted above.)
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

#https://stackoverflow.com/questions/79472778/sorting-the-content-of-a-file

my $s = <<EOF;
>>> prd1701
Filesystem Size Used Avail Use% Mounted on
/workspace 3.9T 887G 3.0T 13% /workspace/data
>>> prd1702
Filesystem Size Used Avail Use% Mounted on
/workspace 3.9T 746G 3.1T 23% /workspace/data
>>> prd1703
Filesystem Size Used Avail Use% Mounted on
/workspace 3.9T 687G 3.2T 18% /workspace/data
EOF

my @data = map {$_[0]}
           sort {$b->[1] <=> $a->[1]}
           map {[$_, /\s(\d+)%/]}
           split(/(?<=workspace\/data)/, $s);
It prints these warnings:
C:\Old_Data\perlp>perl try3.pl
Use of uninitialized value in numeric comparison (<=>) at try3.pl line 23.
Use of uninitialized value in numeric comparison (<=>) at try3.pl line 23.
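A minimal sketch of where the undefined sort keys appear to come from, using two of the records with their whitespace collapsed (the variable names @pieces and @no_key are illustrative only). Without the \n in the split pattern, the newline after the last record survives as a piece of its own; the percent regex finds nothing in it, so that element gets no second field to compare.

```perl
use strict;
use warnings;

my $s = <<'EOF';
>>> prd1701
Filesystem Size Used Avail Use% Mounted on
/workspace 3.9T 887G 3.0T 13% /workspace/data
>>> prd1702
Filesystem Size Used Avail Use% Mounted on
/workspace 3.9T 746G 3.1T 23% /workspace/data
EOF

# Split pattern as in the code before the fix (no trailing \n).
my @pieces = split /(?<=workspace\/data)/, $s;

# Pieces in which the percent regex matches nothing end up without a sort key.
my @no_key = grep { !/\s(\d+)%/ } @pieces;

printf "%d pieces, %d without a percent value\n",
    scalar @pieces, scalar @no_key;    # 3 pieces, 1 without a percent value
```

With the \n added to the pattern, the newline is consumed by the separator and no such leftover piece is produced.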

Replies are listed 'Best First'.
Re: schwartzian transform problem - Solved
by johngg (Canon) on Feb 28, 2025 at 11:38 UTC

    An alternative to the ST (Schwartzian Transform) is a GRT (Guttman-Rosler Transform).

    In a do block, read the data from a filehandle (in this script, a HEREDOC) in one go rather than line by line ($/ is set to the empty string, i.e. paragraph mode, which returns everything here because the data contains no blank lines), and split it into records at points that are not at the start of the string (to avoid an empty first record) and are followed by the ">>>" that starts each record. Each record passes into a map, where the digits preceding the % sign are captured and packed as a 32-bit network-order value (bitwise NOT applied, as we want descending numerical order), then concatenated with the whole record packed as a string. The result is passed to a plain lexical sort and then into a second map, which recovers the record by skipping the first four bytes, i.e. the number used for sorting. The script ...

    use strict;
    use warnings;

    open my $fh, q{<}, \ <<__EOF__ or die qq{open: < HEREDOC: $!\n};
    >>> prd1703
    Filesystem Size Used Avail Use% Mounted on
    /workspace 3.9T 687G 3.2T 18% /workspace/data
    >>> prd1701
    Filesystem Size Used Avail Use% Mounted on
    /workspace 3.9T 887G 3.0T 13% /workspace/data
    >>> prd1702
    Filesystem Size Used Avail Use% Mounted on
    /workspace 3.9T 746G 3.1T 23% /workspace/data
    __EOF__

    print for
        map  { unpack q{x4a*}, $_ }
        sort
        map  { m{(\d+)(?=%)} && ( ~ pack( q{N}, $1 ) . pack( q{a*}, $_ ) ) }
        do   { local $/ = q{}; split m{(?<!\A)(?=>>>)}, <$fh>; };

    close $fh or die qq{close: < HEREDOC: $!\n};

    The output ...

    >>> prd1702
    Filesystem Size Used Avail Use% Mounted on
    /workspace 3.9T 746G 3.1T 23% /workspace/data
    >>> prd1703
    Filesystem Size Used Avail Use% Mounted on
    /workspace 3.9T 687G 3.2T 18% /workspace/data
    >>> prd1701
    Filesystem Size Used Avail Use% Mounted on
    /workspace 3.9T 887G 3.0T 13% /workspace/data
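To see why the complemented key gives descending order, here is a small sketch that looks at the key alone. The sample percentages are made up for illustration; the pack template N and the ~ operator are the ones used in the script. Complementing every byte of the big-endian integer reverses its string order, so a plain sort of the keys puts the largest number first.

```perl
use strict;
use warnings;

# Made-up sample values, not taken from the thread's data.
my @percents = ( 13, 23, 18, 100, 5 );

# Key: 4-byte big-endian integer with every byte complemented.
my @keys = map { ~ pack( 'N', $_ ) } @percents;

# Plain string sort of the keys, then undo the complement and unpack.
my @sorted = map { unpack 'N', ~ $_ } sort @keys;

print "@sorted\n";    # 100 23 18 13 5
```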

    I hope this is of interest.

    Cheers,

    JohnGG

      Hi JohnGG, I'm trying to follow your solution using the GRT sort. I wonder why you have m{(\d+)(?=%)} where I might have used m{(\d+)%} without the positive lookahead for '%'?

        There's no difference since $& and others aren't used, but /(\d+)%/ should be faster.
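A quick sketch of that point, using one line of the sample data: both patterns capture the same digits in $1, and only the overall matched text differs. The /p flag and ${^MATCH} are used here in place of $& purely for the demonstration; the variable names are illustrative.

```perl
use strict;
use warnings;

my $line = '/workspace 3.9T 746G 3.1T 23% /workspace/data';

# /p makes the whole matched text available in ${^MATCH}.
$line =~ m{(\d+)(?=%)}p or die 'no match';
my ( $capture_a, $match_a ) = ( $1, ${^MATCH} );

$line =~ m{(\d+)%}p or die 'no match';
my ( $capture_b, $match_b ) = ( $1, ${^MATCH} );

print "lookahead: capture=$capture_a match=$match_a\n";    # capture=23 match=23
print "plain:     capture=$capture_b match=$match_b\n";    # capture=23 match=23%
```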

Re: schwartzian transform problem
by ikegami (Patriarch) on Feb 28, 2025 at 23:03 UTC

    Instead of using complex code to make sort efficient, use Sort::Key, which handles that for you without the messy code.

    use File::Slurper qw( read_text );
    use Sort::Key qw( rikeysort );

    print rikeysort { ( /(\d+)%/ )[0] } split /^(?=>>> )/m, read_text( 'try3.txt' );

    (Also note the more reliable split pattern.)
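A sketch of what that split pattern buys, using core Perl only. It keys on the ">>> " marker at the start of a line rather than on the mount path, so it keeps working if a record ends in something other than /workspace/data. The second record's mount point below is a made-up value for illustration.

```perl
use strict;
use warnings;

my $s = <<'EOF';
>>> prd1701
Filesystem Size Used Avail Use% Mounted on
/workspace 3.9T 887G 3.0T 13% /workspace/data
>>> prd1702
Filesystem Size Used Avail Use% Mounted on
/other 3.9T 746G 3.1T 23% /some/other/mount
EOF

# Zero-width split before every line that starts with ">>> ".
# Each record keeps its own trailing newline.
my @records = split /^(?=>>> )/m, $s;

print scalar(@records), " records\n";    # 2 records
print "second record starts with: ", ( split /\n/, $records[1] )[0], "\n";
```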

    Features:

    • Simplicity.

    • It's a stable sort, meaning it keeps the relative order of items whose sort keys are equal.[1]

    • And it might be the fastest of all provided solutions.

      And it's also the fastest of all the provided solutions!


    1. The builtin sort on modern versions of Perl is a stable sort, which means the provided ST solution is stable on modern versions of Perl. The provided GRT solution, however, isn't a stable sort.
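A small sketch of the difference, assuming a Perl whose builtin sort is stable, as the footnote describes. The three one-line records are made up so that two of them tie on 18%. The ST compares only the percentage and leaves tied records in input order; the GRT compares the whole packed string, so tied records end up ordered by their text.

```perl
use strict;
use warnings;

# Made-up records; "b" and "a" tie on 18% and arrive in that order.
my @records = ( "b 18%\n", "a 18%\n", "c 23%\n" );

# ST: the comparison sees only the extracted percentage.
my @st = map  { $_->[0] }
         sort { $b->[1] <=> $a->[1] }
         map  { [ $_, /(\d+)%/ ] } @records;

# GRT: the key is followed by the record itself, which breaks ties.
my @grt = map { unpack 'x4a*', $_ }
          sort
          map { ~ pack( 'N', /(\d+)%/ ) . $_ } @records;

print "ST order:\n",  @st;     # c, b, a (input order kept for the tie)
print "GRT order:\n", @grt;    # c, a, b (tie ordered by record text)
```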
      This is a situation where there's no substitute for measuring, e.g. with Benchmark.
        I ran tests on the GRT and ST on 90_000 lines, and both ran in less than one second: the GRT in about 1/5 second and the ST in about 1/15 second.
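For anyone who wants to repeat the measurement, here is one way it might be set up with the core Benchmark module. This is a sketch, not the test that was actually run: the generated records, the record count, the iteration count and the sub names st_sort and grt_sort are all assumptions, and timings will differ by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Hypothetical test data: 10_000 three-line records with made-up percentages.
my @records = map {
    my $pct = $_ % 100;
    ">>> prd$_\nFilesystem Size Used Avail Use% Mounted on\n"
        . "/workspace 3.9T 887G 3.0T $pct% /workspace/data\n"
} 1 .. 10_000;

# Schwartzian Transform, as in the corrected program.
sub st_sort {
    return map  { $_->[0] }
           sort { $b->[1] <=> $a->[1] }
           map  { [ $_, /(\d+)%/ ] } @_;
}

# Guttman-Rosler Transform, as in the GRT reply.
sub grt_sort {
    return map { unpack 'x4a*', $_ }
           sort
           map { ~ pack( 'N', /(\d+)%/ ) . $_ } @_;
}

# Run each variant 20 times and print a comparison table.
cmpthese( 20, {
    ST  => sub { my @r = st_sort(@records);  return },
    GRT => sub { my @r = grt_sort(@records); return },
} );
```

Only the sort step is timed here; reading and splitting the input is left out so that the two transforms are compared on equal terms.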
Re: schwartzian transform problem
by Cristoforo (Curate) on Feb 27, 2025 at 20:12 UTC
    The program with the corrections (with file try3.txt same data):
    #!/usr/bin/perl
    use strict;
    use warnings;
    use feature 'say';

    open my $fh, '<', 'try3.txt' or die $!;    # try3.txt contains the data
    my $s;
    {
        local $/ = undef;
        $s = <$fh>;    # slurp file
    }
    close $fh or die "read file close error: $!";

    my @data = map {$_->[0]}
               sort {$b->[1] <=> $a->[1]}
               map {[$_, /(\d+)%/]}
               split(/(?<=workspace\/data)\n/, $s);

    say for @data;
Re: schwartzian transform problem - Solved
by Anonymous Monk on Mar 25, 2025 at 09:20 UTC
    strawberry-perl-5.16.3.1-64bit-portable @ 46.90/s
    strawberry-perl-5.18.4.1-64bit-portable @ 45.66/s
    strawberry-perl-5.20.3.3-64bit-portable @ 42.60/s
    strawberry-perl-5.22.3.1-64bit-portable @ 45.56/s
    strawberry-perl-5.26.3.1-64bit-portable @ 52.31/s
    strawberry-perl-5.28.2.1-64bit-portable @ 50.53/s
    strawberry-perl-5.30.2.1-64bit-portable @ 49.05/s
    strawberry-perl-5.32.1.1-64bit-PDL @ 47.72/s
    strawberry-perl-5.38.0.1-64bit-PDL @ 43.94/s
    strawberry-perl-5.40.0.1-64bit-PDL @ 41.68/s

    FWIW, I benchmarked the solutions plus my own. Then, feeling guilty about still sitting on 5.32, I re-ran them under 5.40 and got a ~25% speed drop, though the drop is smaller for the simplified test below (the listing above is a collage of dir /b and script output). Isn't that too much?

    use strict;
    use warnings;
    use Benchmark 'timethis';

    my $s = do { local $/; <DATA> };
    $s x= 10_000;

    timethis -5, sub {
        my @a = split /^(?=>>> )/m, $s;
        return
    }, $^V;

    __DATA__
    >>> prd1701
    Filesystem Size Used Avail Use% Mounted on
    /workspace 3.9T 887G 3.0T 13% /workspace/data
    >>> prd1702
    Filesystem Size Used Avail Use% Mounted on
    /workspace 3.9T 746G 3.1T 23% /workspace/data
    >>> prd1703
    Filesystem Size Used Avail Use% Mounted on
    /workspace 3.9T 687G 3.2T 18% /workspace/data