in reply to Use Schwartzian transform across multiple files
I showed some working Schwartzian Transform (ST) code at Re: Use Schwartzian transform across multiple files. Since you are a beginner, I would certainly consider a more straightforward approach. I recommend that you master the basics before trying to use advanced techniques. I show "another way" for you below.
The sort routine selects pairs of things to "judge". The user-supplied function's job is to decide: less than, equal, or greater than. In the code below, there is a lot of "extra work" because the regex has to be run twice (once for each member of the pair) every time a new pair of "things" is selected for comparison. The ST is faster because it runs the regex only once per line and saves the results in an intermediate array before the actual sort is run.
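To make the "calculate once, then sort" idea concrete, here is a minimal ST sketch. The sample lines are made up for illustration; I am only assuming they match the VerNumber:(digits regex used in this thread. Treating a line without a version as 0 is also my assumption, not something from the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data; the real line format is only assumed
# from the /VerNumber:\((\d+)/i regex.
my @lines = (
    "item c VerNumber:(30)\n",
    "item a VerNumber:(4)\n",
    "item b VerNumber:(12)\n",
);

sub st_sort_by_version {
    my @in = @_;
    return
        map  { $_->[1] }                    # 3. throw away the key, keep the line
        sort { $a->[0] <=> $b->[0] }        # 2. compare the precomputed keys
        map  {                              # 1. run the regex once per line
            my ($ver) = /VerNumber:\((\d+)/i;
            [ defined $ver ? $ver : 0, $_ ] # assumption: no version sorts as 0
        } @in;
}

print st_sort_by_version(@lines);
```

Read the map/sort/map chain from the bottom up: the lower map builds [key, line] pairs, sort only ever looks at the keys, and the upper map recovers the original lines.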
However, you should consider that often this extra efficiency doesn't matter at all in the overall scheme of things. In fact, for small numbers of lines, the ST can actually be slower due to the overhead of creating the intermediate array and transforming it back to the original representation.
How fast is "fast enough" depends upon the application. If you are sorting an array of, say, 80,000 elements, there will probably be a user-noticeable difference between algorithms. With 100-200 lines, probably not.
Once you get your code working, I encourage you to benchmark the code below vs my ST version. Make the comparison as "fair as possible". Also be aware that the second time you run the program, it will run faster because the files will be in the in-memory disk cache, and that speeds things up a lot. But even so, you probably will learn something from doing a simple benchmark exercise. I don't know what OS you are using, but also be aware that on some OSes, Windows in particular, console I/O is an extremely "expensive" operation and takes a lot of execution time. I/O to report benchmark progress can consume so much time that it skews the results.
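One way to set up such a comparison is with the core Benchmark module. This is only a sketch under stated assumptions: the data is synthetic and held in memory (so disk cache effects are left out), the line format is guessed from the regex, and the names plain_sort and st_sort are mine. Nothing is printed while the timing runs, so console I/O does not skew the numbers.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic test data (an assumption, not the real files).
my @lines = map { "entry $_ VerNumber:(" . int(rand 100_000) . ")\n" } 1 .. 5_000;

# Plain sort: the regex runs twice for every comparison.
sub by_version {
    my ($verA) = $a =~ /VerNumber:\((\d+)/i;
    my ($verB) = $b =~ /VerNumber:\((\d+)/i;
    $verA <=> $verB;
}

sub plain_sort {
    my @in = @_;
    return sort by_version @in;
}

# Schwartzian Transform: the regex runs once per line.
sub st_sort {
    my @in = @_;
    return
        map  { $_->[1] }
        sort { $a->[0] <=> $b->[0] }
        map  { my ($ver) = /VerNumber:\((\d+)/i; [ $ver, $_ ] } @in;
}

# Run each version for at least 1 CPU second, then print a comparison table.
cmpthese(-1, {
    plain => sub { my @sorted = plain_sort(@lines) },
    st    => sub { my @sorted = st_sort(@lines) },
});
```

Try changing 5_000 to 100 and to 80_000 to see how the gap between the two versions moves with the size of the input.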
#!/usr/bin/perl
use strict;
use warnings;

if (!-d "sorted")
{
    mkdir "sorted" or die "unable to create dir sorted $!";
}

my @files2sort = <file*.txt>;   #just use glob to get names
my $curfilenum = 1;

foreach my $file (@files2sort)
{
    open my $fh_in, '<', $file or die "$file failed to open $!";
    open(my $fh_out, '>', "./sorted/$file.sort")
        or die "cannot create out $file.sort $!";
    print "Processing ".$curfilenum++." of ".@files2sort." $file\n";
    sortfile2($fh_in, $fh_out);
    close ($fh_out);
    close ($fh_in);
    print "OK: Sorted $file \n";
}

sub sortfile2
{
    my ($fh_in, $fh_out) = @_;
    my @lines = <$fh_in>;
    @lines = sort by_version @lines;
    print $fh_out @lines;
    #can do a sort "in place"
    #separate @sorted var is not needed.
}

sub by_version
{
    my ($verA) = $a =~ /VerNumber:\((\d+)/i;
    my ($verB) = $b =~ /VerNumber:\((\d+)/i;
    $verA <=> $verB   #returns -1,0,+1
}
__END__
Processing 1 of 2 file1.txt
OK: Sorted file1.txt
Processing 2 of 2 file2.txt
OK: Sorted file2.txt
Re^2: Use Schwartzian transform across multiple files
by Sonya777 (Novice) on Sep 20, 2016 at 12:29 UTC