misterperl has asked for the wisdom of the Perl Monks concerning the following question:
a-1 a-2 a-3 .. a-n
and I want to sort on the numeric part using $a <=> $b But instead I jump thru hoops using a regex to get the numbers then save them in another array, etc etc.. I'm pretty sure this can be a 1-line sort? I appreciate the wisdom of the Monks as always!
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: How can I do a numeric sort on a substring?
by hippo (Archbishop) on Jun 25, 2021 at 13:41 UTC | |
The general case would be to use a Schwartzian Transform but in this simplistic case for small values of n you can just perform the extractions within the sort:
See also the FAQ: How do I sort an array by (anything)? PS. Here's the same thing but with substr:
🦛 | [reply] [d/l] [select] |
by misterperl (Friar) on Jun 25, 2021 at 14:01 UTC | |
| [reply] |
by haukex (Archbishop) on Jun 25, 2021 at 14:07 UTC | |
in case there are more than one number matching like a-1-3 ? Do you mean something like Sort::Key::Natural? See also How do I do a natural sort on an array? | [reply] |
by LanX (Saint) on Jun 25, 2021 at 16:48 UTC | |
by misterperl (Friar) on Jun 25, 2021 at 14:15 UTC | |
by misterperl (Friar) on Jun 25, 2021 at 14:09 UTC | |
by BillKSmith (Monsignor) on Jun 25, 2021 at 15:56 UTC | |
| |
by hippo (Archbishop) on Jun 25, 2021 at 15:12 UTC | |
Not specifically. The regex match is context sensitive, so the brackets around it enforce list context and then the [0] pulls out the first value of that list before the <=> gets a chance to enforce scalar context on it. Example:
Read lots more in the Context tutorial. 🦛 | [reply] [d/l] [select] |
|
Re: How can I do a numeric sort on a substring?
by haukex (Archbishop) on Jun 25, 2021 at 14:01 UTC | |
Note this question is very similar to your question How can I sort my array numerically on part of the string?, and many of the solutions from there apply here as well. | [reply] |
by misterperl (Friar) on Jun 25, 2021 at 14:11 UTC | |
| [reply] |
|
Re: How can I do a numeric sort on a substring?
by syphilis (Archbishop) on Jun 25, 2021 at 14:07 UTC | |
Here's my guess: If you want them to appear in the reverse order, instead do: Cheers, Rob | [reply] [d/l] [select] |
by misterperl (Friar) on Jun 25, 2021 at 14:13 UTC | |
| [reply] |
|
Re: How can I do a numeric sort on a substring? [Benchmark]
by kcott (Archbishop) on Jun 26, 2021 at 05:29 UTC | |
G'day misterperl, Other than a number of solutions involving modules, there were several pure Perl solutions. I was interested in how these might compare. The results I got were unexpected; I added a lot of variations (currently up to 14) but the same trend continued. I had thought that the Schwartzian and Guttman-Rosler transformations would have been the fastest, but that wasn't the case. In fact, those that performed extraction of the number, prior to comparison, within the sort, were the quickest. Given these extractions might have been done multiple times for the same data, I suspect Perl (v5.34.0) is performing some optimisation and perhaps caching results; I would be interested in results others receive using different Perl versions. In case there's concern about the number of tests, the entire benchmark should run in under a minute: a COUNT of zero indicates 3 seconds for each of the 14 tests, plus a few seconds for other overhead. Here's a rough breakdown of the results: Here's a sample run. It might be worth using the "download" link to see this without wrapping.
There was a typo in the code; see update below. The original Benchmarks: section is in the spoiler below.
Here's the code:
Notes on pack:
Update: There was a typo in my original code: the first key/value pair in %coderef_for was sr => \&sort_substr but it should have been sr => \&sort_regex. I've fixed my code locally and edited the code above to reflect this fix. I reran the code and have posted a new Benchmarks: section. All of the output before Benchmarks: is unchanged from the original posting. The old Benchmarks: section is now in a spoiler. The results are not particularly different — except sr and ss are no longer identical; although, still too close to call — and my original "rough breakdown of the results" has not changed. — Ken | [reply] [d/l] [select] |
by LanX (Saint) on Jun 26, 2021 at 13:28 UTC | |
you are only testing with 10 elements in your @unordered array. for n elements you have in best case O(n*log(n)) comparisons but O(n) packs and unpacks with ST and GRT, which means the overhead to extract n numbers will account much more for small n. use at least n >> 1000 elements for a real benchmark. Rule of thumb : the choice of algorithm is almost always neglectable for small data.
updateit's like getting the Porsche out of the garage to buy a six-pack of beer just 10m around the corner.
Cheers Rolf | [reply] [d/l] [select] |
by kcott (Archbishop) on Jun 27, 2021 at 02:41 UTC | |
G'day Rolf, "you are only testing with 10 elements in your @unordered array." The code only contains one comment which I added for the express purpose of heading off such feedback:
"use at least n >> 1000 elements for a real benchmark" Indeed. I used 10,011 elements "for improved benchmarking". The preamble tests were to check that all subroutines returned identical results. Each of the preamble tests were only run once and did not involve any benchmarking; they were intended as a sanity check (processing stopped here if --dry_run was used). The data used for this had 10 elements, which I considered sufficient for these tests, and was indicated by:
— Ken | [reply] [d/l] [select] |
by LanX (Saint) on Jun 27, 2021 at 11:56 UTC | |
by LanX (Saint) on Jun 27, 2021 at 13:46 UTC | |
| |
by Anonymous Monk on Jun 26, 2021 at 08:18 UTC | |
| [reply] |
|
Re: How can I do a numeric sort on a substring?
by AnomalousMonk (Archbishop) on Jun 25, 2021 at 16:22 UTC | |
Here's another example of the Schwartzian mentioned earlier. I've written it as a "one-liner" although that's not the form I'd prefer for production code for reasons of readability/maintainability. I've added more test values in different formats to illustrate the effects of the regex that is used. Another consideration when sorting is "stability." See the discussion of this here and in the sort pragma. See also Guttman and Rosler's A Fresh Look at Efficient Perl Sorting for lots more info on Perl sorting. Give a man a fish: <%-{-{-{-< | [reply] [d/l] [select] |
|
Re: How can I do a numeric sort on a substring?
by Marshall (Canon) on Jun 28, 2021 at 00:58 UTC | |
A simple way is shown below. $a and $b are special variables used by sort. Here the numeric value at the end of string is used as the primary thing to sort upon. If the numeric values are equal, then string comparison is used as a "tie breaker".
This will work fine performance wise for say 100 things. I would recommend keeping things simple unless there is an obvious performance reason that requires the sort to be faster.
An Update: I re-iterate my suggestion about the 10,000 items level. No, I don't present benchmarks of my own, but in my experience, that is where a very noticeable impact of say the ST will become apparent. If your program is interacting with a single user, even quite stunning performance increases might make little difference. A difference of say 20 ms is basically undetectable at the user UI level. If you are writing server level code then performance matters a lot more because it affects the number of users that can be serviced by your machine. Another update: A difference of 50 ms (1/20th of a second) will be noticeable by the user in things like audio playback of multiple files. Less than that doesn't make much difference. If your sort takes even 3ms which is a LONG time by modern computer standards, so what? Unless this a server process, I wouldn't worry about it. | [reply] [d/l] |
|
Re: How can I do a numeric sort on a substring?
by swl (Prior) on Jun 25, 2021 at 23:43 UTC | |
Use natkeysort from Sort::Key::Natural. Several of the other responses link to discussions which no doubt mention it, but this saves a bit of extra searching. Edit - and now I see this has already been explicitly mentioned in 11134278. | [reply] [d/l] |