Linked lists as arrays: inserting values

radiantmatrix has asked for the wisdom of the Perl Monks concerning the following question:

Re: Linked lists as arrays: inserting values by Fletch (Bishop) on Sep 25, 2006 at 15:26 UTC
splice can insert items in the middle, but yes, that'll trigger the same inefficient copying and moving that your slice method incurs. If you're mucking with the middle of your list a lot then yes, you may have a case where a linked list will be more efficient than a native array.
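A minimal sketch of the splice-based insert mentioned above. The helper name `insert_at` and the sample data are made up for illustration; only `splice` itself is Perl's built-in.

```perl
use strict;
use warnings;

# Hypothetical helper: insert $elem so that it ends up at position $index
# of the array referenced by $ra. Later elements shift up by one.
sub insert_at {
    my ( $ra, $index, $elem ) = @_;
    splice @$ra, $index, 0, $elem;    # remove 0 elements, insert one
    return $ra;
}

my @list = qw(a b d e);
insert_at( \@list, 2, 'c' );
print "@list\n";    # a b c d e
```

Passing a length of 0 to `splice` is what turns it into a pure insert; nothing is removed from the array.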
Re: Linked lists as arrays: inserting values by Tanktalus (Canon) on Sep 25, 2006 at 15:43 UTC
Well, it seems that even your quote says "almost always" - perhaps you've found one of the places that make it "almost always" rather than just "always".

That said, as far as I'm aware, perl data is mostly a small struct of pointers, so copying them around is probably not that expensive - O(n) in the number of items that need to be copied around, instead of O(nm) where m is related to the length of the strings, or the contents of whatever they may refer to (hash refs, array refs, objects, etc.). So it may not really be that bad to use splice.

The flip side is that by using perl arrays for your data instead of linked lists, perl handles all the details for you. Not that linked lists are necessarily hard or anything, but any time you introduce any type of complexity, you increase the possibility for bugs. By their nature, programs are complex, so we can't avoid that risk. However, we can avoid risk in areas with insignificant gains.

That, of course, begs us to ask: what gains? And thus, I challenge you to benchmark it to prove that there are gains to be had with another method, and to prove that those gains are of significance in your application. My guess is that you'll need a package full of code to abstract the list away to keep the rest of your code simple. And that will eat away at significant portions of your speed gains. And then, if you ever want to hand your list to some standard function, you're going to have to convert it back to a list anyway, and there goes all the rest of your gains.

That's just a guess, though. ;->
Re^2: Linked lists as arrays: inserting values by tilly (Archbishop) on Sep 26, 2006 at 04:33 UTC
Absolutely the correct answer.

Building a large data set by repeatedly splicing into the middle is indeed O(n^2) while a linked list is O(n). But that is O(n^2) with a small constant factor versus O(n) with a big one. Unless your dataset is very large, the native array approach will be far faster. Just consider the cost of accessing the next element. With the native approach it will be a pointer lookup versus having to make a function call (and Perl function calls are slow).

Finally, there is one more reason not to use linked lists in Perl. Unless you are very careful, the linked lists will have circular data structures (each item points to the next, which points back to the previous). Therefore you are either in the business of having to do memory management yourself, or else you need to add yet another layer of slow indirection. Either way you've added more complexity, more room for bugs, and have reduced your potential performance gains even more.
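To make the circular-reference point concrete, here is a small sketch of a doubly linked list built from hash refs. The node layout (`value`, `next`, `prev`) and the helper names are assumptions for illustration, not anything from the thread. One common way to avoid the cycles is to weaken the back-links with `weaken` from the core module Scalar::Util, so that only the forward links keep nodes alive.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical node layout: { value => ..., next => $node, prev => $node }.
# Insert a new node holding $value right after $node.
sub insert_after_node {
    my ( $node, $value ) = @_;
    my $new = { value => $value, next => $node->{next}, prev => $node };
    weaken( $new->{prev} );    # back-link is weak, so no reference cycle
    if ( my $old_next = $node->{next} ) {
        $old_next->{prev} = $new;
        weaken( $old_next->{prev} );
    }
    $node->{next} = $new;
    return $new;
}

# Walk the list from $node onwards and return the stored values.
sub list_values {
    my ($node) = @_;
    my @values;
    for ( ; $node; $node = $node->{next} ) {
        push @values, $node->{value};
    }
    return @values;
}

my $head = { value => 'a', next => undef, prev => undef };
my $c    = insert_after_node( $head, 'c' );
insert_after_node( $head, 'b' );
print join( ' ', list_values($head) ), "\n";    # a b c
```

The insert itself is constant time once you hold the node, but every step of the walk is a hash lookup, which is the kind of overhead the post above warns about.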
Re: Linked lists as arrays: inserting values by shmem (Chancellor) on Sep 25, 2006 at 16:27 UTC
Benchmarking your code (`insert_array_elem1`) against the same code using splice (`insert_array_elem2`), replacing the line

    @$ra = @$ra[0..$index-1], $elem, @$ra[$index..@$ra-1];

with

    splice(@$ra,$index,0,$elem);

and using

    cmpthese (5000, {
        radiantmatrix => sub { my $array = [1..1000]; insert_array_elem1($array,1,$_) for 50..100 },
        use_splice    => sub { my $array = [1..1000]; insert_array_elem2($array,1,$_) for 50..100 },
    });

results in

                    Rate radiantmatrix    use_splice
    radiantmatrix  467/s            --          -83%
    use_splice    2762/s          492%            --

radiantmatrix - `use_splice` :-)

--shmem

_($_=" "x(1<<5)."?\n".q·/)Oo. G°\ / /\_¯/(q / ---------------------------- \__(m.====·.(_("always off the crowd"))."· ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re^2: Linked lists as arrays: inserting values by Not_a_Number (Prior) on Sep 25, 2006 at 20:25 UTC
Update: Ignore this post (mistakes pointed out by shmem and ikegami below).

Perhaps I'm doing something wrong, but my benchmarking results are radically different from shmem's. For a start, with the same data (programme run several times), I get something like this:

               Rate splicing  radiant
    splicing 2371/s       --     -19%
    radiant  2936/s      24%       --

This is not the first time that I have got very different benchmarking results than other Monks on this forum, but this time the difference is particularly egregious. In case you're wondering:

    C:\Perl\progs>perl -v

    This is perl, v5.8.8 built for MSWin32-x86-multi-thread
    <snip>
    Binary build 817 provided by ActiveState

And the bigger the original array gets (and the greater the number of elements to insert), the more radiantmatrix's code appears to outperform `splice`:

    C:\Perl\progs>scratchpad.pl 6000 7000 10000
    Array size: 10000
    Inserting:  6000 .. 7000
               Rate splicing  radiant
    splicing 28.1/s       --     -86%
    radiant   198/s     603%       --

    C:\Perl\progs>scratchpad.pl 60000 61000 100000
    Array size: 100000
    Inserting:  60000 .. 61000
               Rate splicing  radiant
    splicing 2.87/s       --     -93%
    radiant  42.7/s    1388%       --

Perhaps I've got something very very wrong, but my findings seem to be borne out by this extract from Mastering Algorithms with Perl, Chapter 3:

    ...splicing elements into or out of the middle of a large array can be very expensive.

Here's my benchmarking code, demolish it at will:

Read more... (1017 Bytes)
Re^3: Linked lists as arrays: inserting values by shmem (Chancellor) on Sep 25, 2006 at 23:01 UTC
Running your code on my Linux box, I get:

    qwurx [shmem] ~> perl 574821.pl 6000 7000 10000
    Array size: 10000
    Inserting:  6000 .. 7000
               Rate splicing  radiant
    splicing 46.8/s       --     -66%
    radiant   137/s     193%       --

You are inserting a number between 6000 and 7000 at index 1, in every call to insert1 and insert2.

    # called as insert1( \@ary, 1, $_ ) for $START .. $END
    sub insert1 {
        my ( $ra, $index, $elem ) = @_;
        @$ra = @$ra[0 .. $index-1], $elem, @$ra[$index .. @$ra-1];
    }

    # I tested with
    sub insert1 {
        my ( $ra, $elem, $index ) = @_;
        @$ra = @$ra[0 .. $index-1], $elem, @$ra[$index .. @$ra-1];
    }
    # called as insert1( \@ary, 1, $_ ) for $START .. $END

If I swap $elem and $index (i.e. insert 1 at an index from $START to $END) I get:

    qwurx [shmem] ~> perl 574821.pl 600 700 1000
    Array size: 1000
    Inserting:  600 .. 700
               Rate  radiant splicing
    radiant  30.5/s       --     -98%
    splicing 1672/s    5377%       --

The insert somewhere in the middle is more expensive, that's why `$_/10 for @params` here. My perl:

    $ perl -v

    This is perl, v5.8.8 built for i586-linux-thread-multi
    ...

--shmem
Re^3: Linked lists as arrays: inserting values by ikegami (Patriarch) on Sep 25, 2006 at 23:37 UTC
"Here's my benchmarking code, demolish it at will:"

First rule of benchmarking: make sure the code you are benchmarking actually works!

`@$ra = @$ra[0 .. $index-1], $elem, @$ra[$index .. @$ra-1];`

means

`(@$ra = @$ra[0 .. $index-1]), $elem, @$ra[$index .. @$ra-1];`

You want

`@$ra = ( @$ra[0 .. $index-1], $elem, @$ra[$index .. @$ra-1] );`

Also, your arguments are backwards:

`insert1( \@ary, 1, $_ ) for $START .. $END`
`insert2( \@ary, 1, $_ ) for $START .. $END`

should be

`insert1( \@ary, $_, 1 ) for $START .. $END`
`insert2( \@ary, $_, 1 ) for $START .. $END`

Once fixed (and setting the loop count to -3 because it was taking forever):

    >perl 574845.pl 500 600 1000
    Array size: 1000
    Inserting:  500 .. 600
               Rate  radiant splicing
    radiant  17.3/s       --     -99%
    splicing 1470/s    8392%       --
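The precedence problem described above is easy to reproduce in isolation. This is a small sketch with made-up sample data. It runs the unparenthesized and the parenthesized assignment side by side, and it silences the 'void' warning category only so the broken statement runs quietly.

```perl
use strict;
use warnings;

my @broken = qw(a b d e);
my @fixed  = @broken;
my ( $index, $elem ) = ( 2, 'c' );

{
    # Without parentheses the assignment binds tighter than the commas,
    # so only the first slice is assigned and the rest is evaluated and
    # thrown away. Perl normally flags this as "Useless use ... in void
    # context" under warnings, which is disabled here for the demo.
    no warnings 'void';
    @broken = @broken[ 0 .. $index - 1 ], $elem, @broken[ $index .. $#broken ];
}

# With parentheses the whole list is assigned.
@fixed = ( @fixed[ 0 .. $index - 1 ], $elem, @fixed[ $index .. $#fixed ] );

print "broken: @broken\n";    # broken: a b
print "fixed:  @fixed\n";     # fixed:  a b c d e
```

The broken form truncates the array to its first `$index` elements, which would explain why it looked so fast in the earlier numbers: with an index of 1, each call left a one-element array behind.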
Re: Linked lists as arrays: inserting values by VSarkiss (Monsignor) on Sep 25, 2006 at 16:08 UTC
An old but still very interesting discussion on this topic is rooted at Shift, Pop, Unshift and Push with Impunity!. Take a look.

Do not rebuke them with harsh words ... but rather lead them gently - with URLs - so that they may learn wisdom.
Re: Linked lists as arrays: inserting values by holli (Abbot) on Sep 25, 2006 at 15:54 UTC
I was pretty sure there was something for this in List::MoreUtils, but as it turned out there isn't. So I came up with this:

    sub insert_array_elem {
        my ($ra, $elem, $index) = @_;
        # insert $elem before $ra->[$index]
        my $idx = 0;
        insert_after { $idx++ == $index-1; } $elem => @{$ra};
    }

This doesn't work for the edge cases (first and last element), but hey, that's what `pop` and friends are there for. Maybe you could write an email to the author of List::MoreUtils to provide an `insert_at_index` function?

holli, /regexed monk/
Re^2: Linked lists as arrays: inserting values by radiantmatrix (Parson) on Sep 25, 2006 at 17:13 UTC
I had checked List::MoreUtils and noted it didn't do that, and it might be nice for completeness. But given the other replies here, I think splice is about perfect as-is.

<–radiant.matrix–>
A collection of thoughts and links from the minds of geeks
The Code that can be seen is not the true Code
I haven't found a problem yet that can't be solved by a well-placed trebuchet
Re^3: Linked lists as arrays: inserting values by holli (Abbot) on Sep 26, 2006 at 07:00 UTC
Yup. I benchmarked my solution against the other two and it is way slower. But: It looks best ;-)
Re: Linked lists as arrays: inserting values by DentArthurDent (Monk) on Sep 25, 2006 at 15:49 UTC
If the insert appears to be greater than O(n log n) then perhaps keeping a hash of the data and generating the order with a sort of the keys list might be faster... Just a thought...

----
My mission: To boldy split infinitives that have never been split before!
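One possible reading of the hash-plus-sort suggestion, sketched under assumptions: each value is stored under a numeric position key, and the order is produced by sorting the keys when it is needed. The hash name `%item_at` and the gapped key scheme are hypothetical, chosen only to make the idea concrete.

```perl
use strict;
use warnings;

# Hypothetical layout: each value lives under a numeric position key,
# with gaps left between keys so new items can be slotted in.
my %item_at = ( 10 => 'a', 20 => 'b', 30 => 'd' );

# "Inserting" between b (20) and d (30) just means picking a key in between.
$item_at{25} = 'c';

# The order is generated on demand by sorting the keys numerically.
my @ordered = map { $item_at{$_} } sort { $a <=> $b } keys %item_at;
print "@ordered\n";    # a b c d
```

Each insert is a single hash store, but every time the order is needed the keys are sorted again at O(n log n), so this only pays off when inserts greatly outnumber ordered reads.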
Re: Linked lists as arrays: inserting values by GrandFather (Saint) on Sep 25, 2006 at 18:51 UTC
The thread To push and pop or not to push and pop? and the threads it references may be of further interest for those interested in how Perl manages arrays.

DWIM is Perl's answer to Gödel
Re: Linked lists as arrays: inserting values by jdporter (Paladin) on Sep 25, 2006 at 21:50 UTC
Does your application actually require random-access insertion at any point in the array? If not, there are some optimization techniques you can try...

We're building the house of the future together.
Re^2: Linked lists as arrays: inserting values by radiantmatrix (Parson) on Sep 25, 2006 at 22:58 UTC
Pretty close... I'm a little leery of premature optimization, I just wanted to avoid doing something stupid.
Re^3: Linked lists as arrays: inserting values by jdporter (Paladin) on Sep 26, 2006 at 01:44 UTC
I should have said heuristics, rather than optimizations. For example, if the point of the next insertion is usually "very close" to the previous insertion, you can make a significant improvement in the performance (e.g. O(n) vs O(n log n)).

At any rate, choosing an O(n) algorithm up front instead of an O(n log n) algorithm shouldn't be dismissed as "premature". It could be, in fact, a well-timed optimization of your development process. :-)