Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
[ 'Title', 'Author', 'Link', '', 'Title', ... ]
Any ideas how I go through the array, putting things[ { Title => 'Title Data', Author => 'Author Data', Link => 'Link Data' }, etc. ]
|
|---|
| Replies are listed 'Best First'. | |||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
Re: Sort this data (alternate method)
by extremely (Priest) on Nov 19, 2000 at 11:42 UTC | |||||||||||||||||||||||||||||
I should probably have added this to the other post but I had just edited it like 5 times. =P -- | [reply] [d/l] | ||||||||||||||||||||||||||||
|
Re: Sort this data
by extremely (Priest) on Nov 19, 2000 at 11:31 UTC | |||||||||||||||||||||||||||||
tested now =):
-- | [reply] [d/l] | ||||||||||||||||||||||||||||
by japhy (Canon) on Nov 19, 2000 at 19:35 UTC | |||||||||||||||||||||||||||||
because then @hashrefs would be in reverse!" Yes, that's absolutely right. And it might be inefficient to reverse() the array when we're done with it, and it's also inefficient to keep unshift()ing to the array. So what possible efficient solution could I come up with to combine the splice() speed with the insertion speed? Pre-extend the array! (Dun, dun, DUN!) What a sneaky trick I've done. UpdateOops, used -4 above, when I meant -3. Thanks, jcwren.Updatesplice() will wig out when the array is empty. The while loop has been adjusted.japhy -- Perl and Regex Hacker | [reply] [d/l] [select] | ||||||||||||||||||||||||||||
by tilly (Archbishop) on Nov 19, 2000 at 20:42 UTC | |||||||||||||||||||||||||||||
This might be faster than your sneaky trick. It might be slower. It certainly has fewer indices. Also the cost of reverse is overstated. You have just walked through a list of n things in Perl. You then want to reverse a list of n/4 things. What is the relative cost of those two operations? Right. Pick up good material on optimization. Such as this sample chapter from Code Complete. Or RE: Efficient Perl Programming. You will find that experienced people understand that getting maintainable code with good algorithms can result in better overall speed wins than trying to optimize every line. Now noticing the splice, that matters. If it isn't optimized then that is an order(n) operation n times - which is n^2 and therefore is likely to be slow. But one reverse at the end is an order n operation once. Should the body of the loop be slightly more efficient from doing the slice rather than repeated manipulation of indices (something I would have to benchmark to have a feeling for either way) then your attempt to optimize would actually lose. To summarize, don't worry about slow operations, worry about bad algorithms. A slow operation inside a loop may matter. A slow operation outside a loop which speeds up the loop can go either way. An order n (or worse) operation inside a loop - that is the only one which should cause you to want to care up front about optimizing the structure of the code!
EDIT | [reply] [d/l] | ||||||||||||||||||||||||||||
by jcwren (Prior) on Nov 19, 2000 at 19:55 UTC | |||||||||||||||||||||||||||||
It would be interesting to benchmark if using unshift() into the $hashrefs array would be more or less efficient than messing with $i. If nothing else, it would be two lines shorter, and more Perlish. And perhaps this is too early in the morning, but why are you splicing by -4, and then popping @data? It's Sunday morning where I am, and I'm too lazy to try to write a coherent benchmark... --Chris | [reply] [d/l] [select] | ||||||||||||||||||||||||||||
by japhy (Canon) on Nov 19, 2000 at 20:04 UTC | |||||||||||||||||||||||||||||
by extremely (Priest) on Nov 20, 2000 at 01:47 UTC | |||||||||||||||||||||||||||||
Also, what if fields are allowed to be null? If so, you HAVE to read from the front... -- | [reply] | ||||||||||||||||||||||||||||
by japhy (Canon) on Nov 20, 2000 at 01:52 UTC | |||||||||||||||||||||||||||||
|
Re: Sort this data
by japhy (Canon) on Nov 19, 2000 at 23:04 UTC | |||||||||||||||||||||||||||||
UPDATEThere was a huge error in this test (and I'm stupid for not using strict in it). I was testing @c where I should have been testing @a. I am now going to replace the bad results with the GOOD results.Here's a benchmark. Six methods were tested, and run for 5 seconds, three times; once on a set of 100 elements, once on 1000, and once on 10000.
japhy -- Perl and Regex Hacker | [reply] [d/l] | ||||||||||||||||||||||||||||
by japhy (Canon) on Nov 19, 2000 at 23:59 UTC | |||||||||||||||||||||||||||||
UPDATEThese results were wrong. I'll get correct ones later.Those benchmarks were under 5.005_02. Here are the 5.6 results. They're for different sizes of the list, since the test hung horribly at 10,000 elements.
japhy -- Perl and Regex Hacker | [reply] | ||||||||||||||||||||||||||||
by japhy (Canon) on Nov 20, 2000 at 01:35 UTC | |||||||||||||||||||||||||||||
japhy -- Perl and Regex Hacker | [reply] | ||||||||||||||||||||||||||||
|
japhy looks at av.c (not av.h)
by japhy (Canon) on Nov 20, 2000 at 00:46 UTC | |||||||||||||||||||||||||||||
UPDATEAs per my revelations in sort this data, here is a bit of an adjusted report on the av.c source.Here's the relevant portion from av.c in the 5.6 source: Ok, so unshift() has to do a lot of sliding the elements up if there's not already some free space generated by a call to shift(). What shift() does is simply slide the pointer up one notch, which is very fast (which provides free space for unshift() later). It stores the value in *AvARRAY(av) into retval, and then sets *AvARRAY(av) to the C equivalent of Perl's undef value. Then it moves the pointer that marks the beginning of the array to AvARRAY(av) + 1. It also subtracts one from the max subscript and the length of the array. Then it returns retval. <revelation>I'm beginning to grok the source code! Look out!</revelation>. japhy -- Perl and Regex Hacker | [reply] [d/l] | ||||||||||||||||||||||||||||
by tilly (Archbishop) on Nov 20, 2000 at 02:12 UTC | |||||||||||||||||||||||||||||
For those who are not following the code, the logic here is based on first trying to allocate elements which there is room for (this is the "if (i)" bit). If it cannot get them all in it then makes sure it has enough space, does some accounting, copies everything over, then inserts some new stuff. Note that japhy's comment about unused space is misleading, he means unused at the beginning of the array. There is also unused stuff at the other end, but we cannot directly use that. For full details you have to also look at av.h. The call to AvMAX is just a macro to set xav_max which is the largest element which there is space for (adding to the front increases that), and AvFILL is setting xav_fill which is how many you have (adding obviously increases that as well). The call to av_extend is where the array size increases. What it does is extends the definition of what the array is occupying. In fact it is in a buffer whose size is allocated in powers of two. If the extend takes the array beyond the size of the buffer, then a new buffer is allocated and the array is moved. If it does not then the array is left where it is. Now it is clear how to make repeated calls to unshift fairly efficient. Right now it moves stuff by the minimum necessary. What we need is to have a new variable for how much to move it. That variable should be the maximum of num and the length of the array. This will cause space wastage, but it will also mean that when you hit an unshift and it has to move the array, it will not hit that copying logic again right away. Even so building up an array using n calls to push will still be faster than unshift because there is less accounting to do. But both will be order n rather than having n calls to unshift being order n^2 as it is today. | [reply] | ||||||||||||||||||||||||||||
by extremely (Priest) on Nov 20, 2000 at 02:13 UTC | |||||||||||||||||||||||||||||
-- | [reply] | ||||||||||||||||||||||||||||
|
Re: Sort this data
by princepawn (Parson) on Nov 20, 2000 at 04:48 UTC | |||||||||||||||||||||||||||||
| [reply] | ||||||||||||||||||||||||||||