Re: Linked lists as arrays: inserting values
by Fletch (Bishop) on Sep 25, 2006 at 15:26 UTC
|
splice can insert items in the middle, but yes that'll trigger the same inefficient copying and moving that your slice method incurs. If you're mucking with the middle of your list a lot then yes, you may have a case where a linked list will be more efficient than a native array.
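For reference, a minimal sketch of the kind of splice call being discussed. The array contents are made up for illustration and are not from the original question.

```perl
use strict;
use warnings;

# Illustrative data only; not from the original question.
my @list = qw(a b d e);

# splice(ARRAY, OFFSET, LENGTH, LIST): with LENGTH 0 nothing is removed,
# so 'c' is inserted before the element currently at index 2.
splice(@list, 2, 0, 'c');

print "@list\n";    # a b c d e
```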
| [reply] |
Re: Linked lists as arrays: inserting values
by Tanktalus (Canon) on Sep 25, 2006 at 15:43 UTC
|
Well, it seems that even your quote says "almost always" - perhaps you've found one of the places that make it "almost always" rather than just "always".
That said, as far as I'm aware, perl data is mostly a small struct of pointers, so copying them around is probably not that expensive - O(n) based on the number of items that need to be copied around instead of O(nm) where m is related to the length of the strings, or the contents of whatever they may refer to (hash refs, array refs, objects, etc.). So it may not really be that bad to use splice.
The flip side is that by using perl arrays for your data instead of linked lists, perl handles all the details for you. Not that linked lists are necessarily hard or anything, but any time you introduce any type of complexity, you increase the possibility for bugs. By their nature, programs are complex, so we can't avoid that risk. However, we can avoid risk in areas with insignificant gains.
That, of course, begs us to ask: what gains? And thus, I challenge you to benchmark it to prove that there are gains to be had with another method, and to prove that those gains are of significance in your application.
My guess is that you'll need a package full of code to abstract the list away to keep the rest of your code simple. And that will eat away at significant portions of your speed gains. And then, if you ever want to hand your list to some standard function, you're going to have to convert it back to a list anyway, and there goes all the rest of your gains.
That's just a guess, though. ;->
| [reply] |
|
Absolutely the correct answer.
Building a large data set by repeatedly splicing into the middle is indeed O(n*n) while a linked list is O(n). But that is O(n*n) with a small constant term versus O(n) with a big term. Unless your dataset is very large, the native array approach will be far faster. Just consider the cost of accessing the next element. With the native approach it will be a pointer lookup versus having to make a function call (and Perl function calls are slow).
Furthermore, there is a final reason not to use linked lists in Perl. Unless you are very careful, a doubly linked list will contain circular references (each item points to the next, which points back to the previous), and Perl's reference counting never frees those on its own. Therefore you are either in the business of doing memory management yourself, or else you need to add yet another layer of slow indirection. Either way you've added more complexity, more room for bugs, and have reduced your potential performance gains even more.
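One common way around the cycle problem, which the post itself does not propose, is to weaken the back-pointers with weaken from Scalar::Util. The sketch below assumes a hash-based node layout and a helper called node_insert_after; both are hypothetical and only illustrate the idea.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical node layout: { value => ..., next => ..., prev => ... }
# Only the forward (next) links are strong; back links are weakened so
# that dropping the head lets the whole list be freed.
sub node_insert_after {
    my ($node, $value) = @_;
    my $new = { value => $value, next => $node->{next}, prev => $node };
    weaken($new->{prev});
    if (my $after = $node->{next}) {
        $after->{prev} = $new;
        weaken($after->{prev});
    }
    $node->{next} = $new;
    return $new;
}

my $head = { value => 1, next => undef, prev => undef };
my $tail = node_insert_after($head, 3);
node_insert_after($head, 2);    # constant time once you hold the node

# Walk the list from the head.
my @values;
for (my $n = $head; $n; $n = $n->{next}) {
    push @values, $n->{value};
}
print "@values\n";    # 1 2 3
```

Note that every step of the walk is a hash lookup, which is the per-element overhead the post is warning about.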
| [reply] |
Re: Linked lists as arrays: inserting values
by shmem (Chancellor) on Sep 25, 2006 at 16:27 UTC
|
Benchmarking your code (insert_array_elem1) against the same code using splice (insert_array_elem2), i.e. replacing the line
@$ra = @$ra[0..$index-1], $elem, @$ra[$index..@$ra-1];
with
splice(@$ra,$index,0,$elem);
and using
cmpthese (5000, {
    radiantmatrix => sub { my $array = [1..1000]; insert_array_elem1($array,1,$_) for 50..100 },
    use_splice => sub { my $array = [1..1000]; insert_array_elem2($array,1,$_) for 50..100 },
});
results in
                Rate radiantmatrix    use_splice
radiantmatrix  467/s            --          -83%
use_splice    2762/s          492%            --
radiantmatrix - use_splice :-)
--shmem
_($_=" "x(1<<5)."?\n".q·/)Oo. G°\ /
/\_¯/(q /
---------------------------- \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
| [reply] [d/l] [select] |
|
Update: Ignore this post (mistakes pointed out by shmem and ikegami below).
Perhaps I'm doing something wrong, but my benchmarking results are radically different from shmem's.
For a start, with the same data (programme run several times), I get something like this:
           Rate splicing  radiant
splicing 2371/s       --     -19%
radiant  2936/s      24%       --
This is not the first time that I have got very different benchmarking results than other Monks on this forum, but this time the difference is particularly egregious.
In case you're wondering:
C:\Perl\progs>perl -v
This is perl, v5.8.8 built for MSWin32-x86-multi-thread
<snip>
Binary build 817 provided by ActiveState
And the bigger the original array gets (and the greater the number of elements to insert), the more radiantmatrix's code appears to outperform splice:
C:\Perl\progs>scratchpad.pl 6000 7000 10000
Array size: 10000
Inserting: 6000 .. 7000
           Rate splicing  radiant
splicing 28.1/s       --     -86%
radiant   198/s     603%       --
C:\Perl\progs>scratchpad.pl 60000 61000 100000
Array size: 100000
Inserting: 60000 .. 61000
           Rate splicing  radiant
splicing 2.87/s       --     -93%
radiant  42.7/s    1388%       --
Perhaps I've got something very, very wrong, but my findings seem to be borne out by this extract from Mastering Algorithms with Perl, Chapter 3:
...splicing elements into or out of the middle of a large array can be very expensive.
Here's my benchmarking code, demolish it at will:
| [reply] [d/l] [select] |
|
Running your code on my Linux box, I get:
qwurx [shmem] ~> perl 574821.pl 6000 7000 10000
Array size: 10000
Inserting: 6000 .. 7000
           Rate splicing  radiant
splicing 46.8/s       --     -66%
radiant   137/s     193%       --
You are inserting a number between 6000 and 7000 at index 1, in every call to insert1 and insert2.
# called as insert1( \@ary, 1, $_ ) for $START .. $END
sub insert1 {
my ( $ra, $index, $elem ) = @_;
@$ra = @$ra[0 .. $index-1], $elem, @$ra[$index .. @$ra-1];
}
# I tested with
sub insert1 {
my ( $ra, $elem, $index ) = @_;
@$ra = @$ra[0 .. $index-1], $elem, @$ra[$index .. @$ra-1];
}
# called as insert1( \@ary, 1, $_ ) for $START .. $END
If I swap $elem and $index (i.e. insert 1 at an index from $START to $END) I get:
qwurx [shmem] ~> perl 574821.pl 600 700 1000
Array size: 1000
Inserting: 600 .. 700
           Rate  radiant splicing
radiant  30.5/s       --     -98%
splicing 1672/s    5377%       --
Inserting somewhere in the middle is more expensive, which is why I used $_/10 for each of the @params here (600 700 1000 instead of 6000 7000 10000).
My perl:
$ perl -v
This is perl, v5.8.8 built for i586-linux-thread-multi
...
--shmem
| [reply] [d/l] [select] |
|
Here's my benchmarking code, demolish it at will:
First rule of Benchmarking, make sure the code you are benchmarking actually works!
@$ra = @$ra[0 .. $index-1], $elem, @$ra[$index .. @$ra-1];
means
(@$ra = @$ra[0 .. $index-1]), $elem, @$ra[$index .. @$ra-1];
You want
@$ra = ( @$ra[0 .. $index-1], $elem, @$ra[$index .. @$ra-1] );
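A small sketch of the precedence problem described above, using made-up data. The variable names @broken and @fixed are only for illustration. The void warning is silenced on purpose in the broken case; with it enabled, perl would normally complain about a useless use in void context, which is a good hint that the statement is not doing what it looks like.

```perl
use strict;
use warnings;

my @broken = (1 .. 5);
my @fixed  = (1 .. 5);
my ($index, $elem) = (2, 'x');

{
    no warnings 'void';    # silenced on purpose for the demonstration
    # Assignment binds tighter than the comma operator, so only the first
    # slice is assigned; the rest is evaluated and thrown away.
    @broken = @broken[0 .. $index-1], $elem, @broken[$index .. @broken-1];
}

# With parentheses the whole list is assigned.
@fixed = ( @fixed[0 .. $index-1], $elem, @fixed[$index .. @fixed-1] );

print "broken: @broken\n";    # broken: 1 2
print "fixed:  @fixed\n";     # fixed:  1 2 x 3 4 5
```

The broken version truncates the array instead of growing it, which would also explain why it looked so fast in the earlier benchmark.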
Also, your arguments are backwards:
insert1( \@ary, 1, $_ ) for $START .. $END
insert2( \@ary, 1 ,$_ ) for $START .. $END
should be
insert1( \@ary, $_, 1 ) for $START .. $END
insert2( \@ary ,$_, 1 ) for $START .. $END
Once fixed (and setting the loop count to -3 because it was taking forever):
>perl 574845.pl 500 600 1000
Array size: 1000
Inserting: 500 .. 600
           Rate  radiant splicing
radiant  17.3/s       --     -99%
splicing 1470/s    8392%       --
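Since the benchmark script itself is not shown in the thread, here is a self-contained sketch of what a corrected comparison might look like. The sub names insert_slice and insert_splice, the array sizes, and the iteration count are assumptions chosen for illustration, not the code that produced the numbers above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical names; argument order is (array ref, element, index).
sub insert_slice {
    my ($ra, $elem, $index) = @_;
    @$ra = ( @$ra[0 .. $index-1], $elem, @$ra[$index .. $#$ra] );
}

sub insert_splice {
    my ($ra, $elem, $index) = @_;
    splice(@$ra, $index, 0, $elem);
}

# Sanity check first: both must produce the same array.
my @a = (1 .. 10);
my @b = (1 .. 10);
insert_slice(\@a, 'x', 5);
insert_splice(\@b, 'x', 5);
die "implementations disagree" unless "@a" eq "@b";

# Fixed iteration count so the sketch finishes quickly; each run inserts
# the value 1 at indexes 500 .. 600 of a fresh 1000-element array.
cmpthese(1000, {
    slice  => sub { my $ary = [1 .. 1000]; insert_slice($ary,  1, $_) for 500 .. 600 },
    splice => sub { my $ary = [1 .. 1000]; insert_splice($ary, 1, $_) for 500 .. 600 },
});
```

Checking that both subs agree before timing them follows the "first rule" above; the actual rates will depend on your perl and hardware.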
| [reply] [d/l] [select] |
Re: Linked lists as arrays: inserting values
by VSarkiss (Monsignor) on Sep 25, 2006 at 16:08 UTC
|
| [reply] |
Re: Linked lists as arrays: inserting values
by holli (Abbot) on Sep 25, 2006 at 15:54 UTC
|
I was pretty sure there was something for this in List::MoreUtils, but as it turned out there isn't. So I came up with this:
sub insert_array_elem
{
my ($ra, $elem, $index) = @_; # insert $elem before $ra->[$index]
my $idx = 0;
insert_after { $idx++ == $index-1; } $elem => @{$ra};
}
This doesn't work for the edge cases (first and last element), but hey, that's what pop and friends are there for. Maybe you could write an email to the author of List::MoreUtils and ask for an insert_at_index function?
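For comparison, a sketch of what such a function could look like if it were simply built on splice, which does handle the first and last positions. The name insert_at_index is taken from the suggestion above; the function is hypothetical and is not part of List::MoreUtils as far as this thread is concerned.

```perl
use strict;
use warnings;

# Hypothetical helper named after the suggestion in the post.
sub insert_at_index {
    my ($ra, $elem, $index) = @_;    # insert $elem before $ra->[$index]
    die "index $index out of range" if $index < 0 || $index > @$ra;
    splice(@$ra, $index, 0, $elem);
    return;
}

my @list = qw(b d);
insert_at_index(\@list, 'a', 0);               # front
insert_at_index(\@list, 'c', 2);               # middle
insert_at_index(\@list, 'e', scalar @list);    # end
print "@list\n";    # a b c d e
```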
| [reply] [d/l] [select] |
|
Yup. I benchmarked my solution against the other two and it is way slower. But: It looks best ;-)
| [reply] [d/l] |
Re: Linked lists as arrays: inserting values
by DentArthurDent (Monk) on Sep 25, 2006 at 15:49 UTC
|
| [reply] |
Re: Linked lists as arrays: inserting values
by GrandFather (Saint) on Sep 25, 2006 at 18:51 UTC
|
| [reply] |
Re: Linked lists as arrays: inserting values
by jdporter (Paladin) on Sep 25, 2006 at 21:50 UTC
|
Does your application actually require random-access insertion at any point in the array? If not, there are some optimization techniques you can try...
We're building the house of the future together.
| [reply] |
|
I should have said heuristics, rather than optimizations. For example, if the point of the next insertion is usually "very close" to the previous insertion, you can make a significant improvement in the performance (e.g. O(n) vs O(n log n)). At any rate, choosing an O(n) algorithm up front instead of an O(n log n) algorithm shouldn't be dismissed as "premature". It could be, in fact, a well-timed optimization of your development process. :-)
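The post does not say which heuristic is meant. One possible reading, offered purely as an assumption, is to remember where the last insertion happened and start looking for the next insertion point from there. The sketch below does this for a sorted numeric array; the helper insert_near and its hint parameter are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: insert $val into the sorted array @$ra, starting the
# search at $$hint_ref (the index of the previous insertion).
sub insert_near {
    my ($ra, $val, $hint_ref) = @_;
    my $i = $$hint_ref;
    $i = @$ra if $i > @$ra;
    $i-- while $i > 0    && $ra->[$i-1] > $val;     # walk left if needed
    $i++ while $i < @$ra && $ra->[$i]   <= $val;    # walk right if needed
    splice(@$ra, $i, 0, $val);
    $$hint_ref = $i;
    return $i;
}

my @sorted = (10, 20, 30, 40);
my $hint   = 0;
insert_near(\@sorted, $_, \$hint) for 25, 26, 24, 5;
print "@sorted\n";    # 5 10 20 24 25 26 30 40
```

When consecutive values land close together, the search only takes a step or two. The splice still has to shift elements, so this saves search time only, not the cost of the insertion itself.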
| [reply] |