schwartzian transform problem

Cristoforo has asked for the wisdom of the Perl Monks concerning the following question:

I wonder if someone could tell me why this code is not working. It is meant to sort the records from highest percent to lowest.

Update (I edited the regexp in split): I found the errors. The `map {$_[0]}` should be `map {$_->[0]}`, and the split needed a newline (\n) at the end of the pattern: `split(/(?<=workspace\/data)\n/, $s)`. Now I'm getting the correct sorted output.

C:\Old_Data\perlp>perl try3.pl
>>> prd1702
Filesystem                                  Size  Used Avail Use% Mounted on
/workspace                                  3.9T  746G  3.1T  23% /workspace/data
>>> prd1703
Filesystem                                  Size  Used Avail Use% Mounted on
/workspace                                  3.9T  687G  3.2T  18% /workspace/data
>>> prd1701
Filesystem                                  Size  Used Avail Use% Mounted on
/workspace                                  3.9T  887G  3.0T  13% /workspace/data

(Below, the code before fixes noted above)

#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

#https://stackoverflow.com/questions/79472778/sorting-the-content-of-a-file

my $s = <<EOF;
>>> prd1701
Filesystem                                  Size  Used Avail Use% Mounted on
/workspace                                  3.9T  887G  3.0T  13% /workspace/data
>>> prd1702
Filesystem                                  Size  Used Avail Use% Mounted on
/workspace                                  3.9T  746G  3.1T  23% /workspace/data
>>> prd1703
Filesystem                                  Size  Used Avail Use% Mounted on
/workspace                                  3.9T  687G  3.2T  18% /workspace/data
EOF


my @data = map {$_[0]}
          sort {$b->[1] <=> $a->[1]}
           map {[$_, /\s(\d+)%/]} split(/(?<=workspace\/data)/, $s);

It prints these warnings:

C:\Old_Data\perlp>perl try3.pl
Use of uninitialized value in numeric comparison (<=>) at try3.pl line 23.
Use of uninitialized value in numeric comparison (<=>) at try3.pl line 23.
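
A minimal sketch of where the undefined sort keys come from, assuming a shortened two-record string in place of the real `df` output (the names `@loose` and `@tight` are illustrative, not from the post). The original split pattern is zero-width, so the final newline survives as a chunk of its own with no percentage in it, and the `map` has nothing to store in slot `[1]` for that chunk.

```perl
use strict;
use warnings;

# Shortened stand-in for the df output in the post (two records only).
my $s = "a 13% /workspace/data\nb 23% /workspace/data\n";

# Original pattern: zero-width, so the newline after each record is not consumed.
my @loose = split /(?<=workspace\/data)/, $s;

# Corrected pattern: the newline is part of the separator.
my @tight = split /(?<=workspace\/data)\n/, $s;

# Chunks without a percentage would get no sort key in the ST's first map.
my @no_pct_loose = grep { !/\s\d+%/ } @loose;
my @no_pct_tight = grep { !/\s\d+%/ } @tight;

printf "loose: %d chunks, %d without a percentage\n",
    scalar @loose, scalar @no_pct_loose;
printf "tight: %d chunks, %d without a percentage\n",
    scalar @tight, scalar @no_pct_tight;
# loose: 3 chunks, 1 without a percentage
# tight: 2 chunks, 0 without a percentage
```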


Replies are listed 'Best First'.
Re: schwartzian transform problem - Solved by johngg (Canon) on Feb 28, 2025 at 11:38 UTC
An alternative to the ST is a GRT (Guttman-Rosler Transform). In a `do` block, read the data from a filehandle (in this script a HEREDOC) in one go rather than line by line, and split it into records at points that are not at the start of the string (to avoid an empty first record) and that are followed by the ">>>" which starts each record. Each record passes into a `map` where the digits preceding the % sign are captured, then packed as a 32-bit network-order value (bitwise NOT applied, as we want descending numerical order) and concatenated with the whole record packed as a string. The result is passed to a simple lexical sort and then into a second `map`, which unpacks the record by skipping the first four bytes, which hold the number used to sort.

The script ...

use strict;
use warnings;

open my $fh, q{<}, \ <<__EOF__ or die qq{open: < HEREDOC: $!\n};
>>> prd1703
Filesystem                                  Size  Used Avail Use% Mounted on
/workspace                                  3.9T  687G  3.2T  18% /workspace/data
>>> prd1701
Filesystem                                  Size  Used Avail Use% Mounted on
/workspace                                  3.9T  887G  3.0T  13% /workspace/data
>>> prd1702
Filesystem                                  Size  Used Avail Use% Mounted on
/workspace                                  3.9T  746G  3.1T  23% /workspace/data
__EOF__

print for
    map  { unpack q{x4a*}, $_ }
    sort
    map  { m{(\d+)(?=%)} && ( ~ pack( q{N}, $1 ) . pack( q{a*}, $_ ) ) }
    do   { local $/ = q{}; split m{(?<!\A)(?=>>>)}, <$fh>; };

close $fh or die qq{close: < HEREDOC: $!\n};

The output ...

>>> prd1702
Filesystem                                  Size  Used Avail Use% Mounted on
/workspace                                  3.9T  746G  3.1T  23% /workspace/data
>>> prd1703
Filesystem                                  Size  Used Avail Use% Mounted on
/workspace                                  3.9T  687G  3.2T  18% /workspace/data
>>> prd1701
Filesystem                                  Size  Used Avail Use% Mounted on
/workspace                                  3.9T  887G  3.0T  13% /workspace/data

I hope this is of interest.

Cheers,
JohnGG
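
To make the key trick above easier to follow, here is a small sketch of the same idea on made-up one-line records (the record names and the variable `$pct` are invented for illustration). It assumes the `bitwise` feature is not enabled, so that `~` applied to a packed string complements it byte by byte; that inversion is what turns an ascending string sort into a descending numeric order.

```perl
use strict;
use warnings;

# Made-up records; only the percentage matters for the sort.
my @records = ( "rec-a 13%\n", "rec-b 23%\n", "rec-c 18%\n" );

my @sorted =
    map  { unpack 'x4 a*', $_ }    # drop the 4-byte key, keep the record
    sort                           # plain string sort on the key prefix
    map  {
        my ($pct) = /(\d+)%/;
        # Big-endian 32-bit key, inverted so larger numbers sort first.
        ~pack( 'N', $pct ) . $_;
    } @records;

print @sorted;
# rec-b 23%
# rec-c 18%
# rec-a 13%
```

Records with equal percentages fall back to comparing the record text itself, since the whole string takes part in the sort.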
Re^2: schwartzian transform problem - Solved by Cristoforo (Curate) on Feb 28, 2025 at 19:14 UTC
Hi JohnGG, I'm trying to follow your solution using the GRT sort. I wonder why you have `m{(\d+)(?=%)}` where I might have used `m{(\d+)%}` without the positive lookahead for '%'?
Re^3: schwartzian transform problem - Solved by ikegami (Patriarch) on Mar 01, 2025 at 03:44 UTC
There's no difference since `$&` and others aren't used, but `/(\d+)%/` should be faster.
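
A short sketch of that point, using one line of the sample data: both patterns capture the same digits in `$1`, and only the overall match (`$&`) differs. The variable names are invented for the example.

```perl
use strict;
use warnings;

my $line = "/workspace  3.9T  746G  3.1T  23% /workspace/data";

# With the lookahead, the % is checked but not consumed.
$line =~ m{(\d+)(?=%)} or die "no match";
my ( $cap_lookahead, $whole_lookahead ) = ( $1, $& );

# Without it, the % becomes part of the overall match.
$line =~ m{(\d+)%} or die "no match";
my ( $cap_plain, $whole_plain ) = ( $1, $& );

print "capture: $cap_lookahead vs $cap_plain\n";          # capture: 23 vs 23
print "match:   '$whole_lookahead' vs '$whole_plain'\n";  # match:   '23' vs '23%'
```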
Re: schwartzian transform problem by ikegami (Patriarch) on Feb 28, 2025 at 23:03 UTC
Instead of using complex code to make `sort` efficient, use Sort::Key, which handles that for you without the messy code.

use File::Slurper qw( read_text );
use Sort::Key qw( rikeysort );

print
    rikeysort { ( /(\d+)%/ )[0] }
    split /^(?=>>> )/m, read_text( 'try3.txt' );

(Also note the more reliable split pattern.)

Features:

- Simplicity.
- It's a stable sort, meaning it keeps the relative order of items whose keys compare equal. [1]
- ~~And it might be the fastest of all provided solutions.~~ And it's also the fastest of all the provided solutions!

[1] The builtin sort on modern versions of Perl is a stable sort. This means the provided ST solution is a stable sort on modern versions of Perl. But the provided GRT solution isn't a stable sort.
Re^2: schwartzian transform problem by etj (Priest) on Mar 01, 2025 at 10:57 UTC
This is a situation where there's no substitute for measuring, e.g. with Benchmark.
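
One way such a measurement could look, as a rough sketch: it compares an ST and a GRT on synthetic records with the core Benchmark module. The records, the helper names `st_sort` and `grt_sort`, and the record count are all invented for this example, and the timings will vary by machine and Perl version, so no results are shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic records, invented for this sketch; not the poster's data.
my @records = map {
    sprintf ">>> host%05d\n/workspace  3.9T  746G  3.1T  %2d%% /workspace/data\n",
        $_, $_ % 100
} 1 .. 10_000;

# Schwartzian Transform: sort [record, key] pairs, then unwrap.
sub st_sort {
    return map  { $_->[0] }
           sort { $b->[1] <=> $a->[1] }
           map  { [ $_, /(\d+)%/ ] } @_;
}

# GRT: prefix each record with a packed, inverted key and sort as strings.
sub grt_sort {
    return map  { unpack 'x4 a*', $_ }
           sort
           map  { my ($pct) = /(\d+)%/; ~pack( 'N', $pct ) . $_ } @_;
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese( -1, {
    ST  => sub { my @sorted = st_sort(@records);  return },
    GRT => sub { my @sorted = grt_sort(@records); return },
} );
```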
Re^3: schwartzian transform problem by Cristoforo (Curate) on Mar 21, 2025 at 17:59 UTC
I ran tests on the GRT and ST on 90_000 lines, and they both ran in less than one second. The GRT ran in 1/5 second and the ST ran in 1/15 second.
Re^4: schwartzian transform problem by choroba (Cardinal) on Mar 21, 2025 at 19:08 UTC
Re^5: schwartzian transform problem by Cristoforo (Curate) on Mar 22, 2025 at 14:36 UTC
Some notes below your chosen depth have not been shown here
Re^3: schwartzian transform problem by ikegami (Patriarch) on Mar 25, 2025 at 23:16 UTC
done
Re: schwartzian transform problem by Cristoforo (Curate) on Feb 27, 2025 at 20:12 UTC
The program with the corrections (with file `try3.txt` containing the same data):

#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

open my $fh, '<', 'try3.txt' or die $!; # try3.txt contains the data
my $s;
{
    local $/ = undef;
    $s = <$fh>; # slurp file
}
close $fh or die "read file close error: $!";

my @data = map {$_->[0]}
          sort {$b->[1] <=> $a->[1]}
           map {[$_, /(\d+)%/]} split(/(?<=workspace\/data)\n/, $s);

say for @data;
Re: schwartzian transform problem - Solved by Anonymous Monk on Mar 25, 2025 at 09:20 UTC
strawberry-perl-5.16.3.1-64bit-portable @ 46.90/s
strawberry-perl-5.18.4.1-64bit-portable @ 45.66/s
strawberry-perl-5.20.3.3-64bit-portable @ 42.60/s
strawberry-perl-5.22.3.1-64bit-portable @ 45.56/s
strawberry-perl-5.26.3.1-64bit-portable @ 52.31/s
strawberry-perl-5.28.2.1-64bit-portable @ 50.53/s
strawberry-perl-5.30.2.1-64bit-portable @ 49.05/s
strawberry-perl-5.32.1.1-64bit-PDL @ 47.72/s
strawberry-perl-5.38.0.1-64bit-PDL @ 43.94/s
strawberry-perl-5.40.0.1-64bit-PDL @ 41.68/s

FWIW, I benchmarked the solutions plus my own; then, feeling guilty about still sitting on 5.32, I re-ran under 5.40 and got a ~25% speed drop, though the drop is smaller for the simplified test below (the listing above is a collage of `dir/b` and script output). Isn't that too much?

use strict;
use warnings;
use Benchmark 'timethis';

my $s = do { local $/; <DATA> };
$s x= 10_000;

timethis -5, sub {
    my @a = split /^(?=>>> )/m, $s;
    return
}, $^V;

__DATA__
>>> prd1701
Filesystem                                  Size  Used Avail Use% Mounted on
/workspace                                  3.9T  887G  3.0T  13% /workspace/data
>>> prd1702
Filesystem                                  Size  Used Avail Use% Mounted on
/workspace                                  3.9T  746G  3.1T  23% /workspace/data
>>> prd1703
Filesystem                                  Size  Used Avail Use% Mounted on
/workspace                                  3.9T  687G  3.2T  18% /workspace/data