Extrapolating the example from the OP in a 1.5 GB File and randomly skippping 1 out of 4 lines it seems you'll miss the searched line in average for more than 1200 entries before you start searching linearly.
I stopped after 2 test runs since my laptop with 1 GB RAM was heavily swapping for 30 minutes then* ... I'll post the completed results tomorrow morning. : )
Temporary conclusion, over 1200 seeks doesn't even look close to O(1) for me, could it be your testdata was a little optimistic? Hope you understand my critic now.
Please Note: This is the very optmistic best case for only a 25% gap probability. I didn't even try to insert any worsening conditions now!
That much about the normal average case.
Cheers Rolf
CODE:
use strict; use warnings; $\="\n"; $_=""; #--- Settings my $debug=0; my $runs=5; my $gap_probability=.25; my $min=0; my $max=35_000_000; #--- my $lines_per_skip=1/$gap_probability; my $miss_avg_runs; for my $run (1..$runs){ print "\n","="x10 ." run:".$run,"\n"; my (@data,%gap,%miss); my $entry=0; my $last; my $x; for my $n ($min..$max) { if ( ($x=rand $lines_per_skip) > 1 ) { print "$entry:$n ($x)" if($debug); $data[$entry++]=$n; $gap{$n-$last-1}++ if defined $last; $last=$n; } } my $factor=($data[-1]-$data[0])/$#data; print " gap_probability: $gap_probability \n interpolation_factor: + $factor "; $miss{ round( $_ * $factor + $data[0] - $data[$_] ) } ++ for (0. +.$#data) ; print " Entries: $entry"; print " =Data @data" if ($entry <20); print_histogram(\%gap,"Gaps"); print_histogram(\%miss,"Misses"); my $miss_sum; $miss_sum += abs($_) * $miss{$_} for (keys %miss ); my $miss_avg = $miss_sum / $entry; print "miss_avg:$miss_avg"; $miss_avg_runs+=$miss_avg; print "miss_avg/runs:",$miss_avg_runs/$run; } sub round { my $x = $_[0]; my $d= $x<0 ? -0.5 : 0.5; return int($x+$d) ; my $test=sub { for (-3 .. 3) { my $x= $_/5 ; print $x, round($x); } } } sub print_histogram{ my ($href,$title)=@_; local ($\,$,); my $partition=8; my @keys=sort {$a <=> $b} keys %$href ; my $range=$keys[-1]-$keys[0]; my $interval=$range/$partition; my $delimiter= $keys[-1] - $interval*($partition); my $partstart;#=$keys[0]; print "\n-$title: "; if ( $interval < 1) { print "$_: $href->{$_} " for (@keys ); print "\n"; return; } print " for $partition Intervals\n"; my $sum; for my $key (@keys) { my $value=$href->{$key}; $sum+=$value; if ($key>=$delimiter) { print "],$key]: $sum "; $partstart="";#$key; $delimiter+=$interval; $sum=0; } } print "\n"; return; }
OUTPUT:
Compilation started at Fri Jun 12 13:48:34 /usr/bin/perl -w /home/lanx/gauss.pl ========== run:1 gap_probability: 0.25 interpolation_factor: 1.33330078809507 Entries: 26250641 -Gaps: for 8 Intervals ],0]: 19686681 ],2]: 6154875 ],4]: 383422 ],5]: 19281 ],7]: 5975 +],9]: 386 ],10]: 14 ],13]: 6 -Misses: for 8 Intervals ],-2700]: 10 ],-1987]: 1569713 ],-1274]: 5017419 ],-561]: 2169821 +],152]: 4382814 ],865]: 4367173 ],1578]: 3316195 ],2291]: 3248721 + ],3003]: 2178775 miss_avg:1232.34795784027 miss_avg/runs:1232.34795784027 ========== run:2 gap_probability: 0.25 interpolation_factor: 1.33346175204987 Entries: 26247473 -Gaps: for 8 Intervals ],0]: 19683096 ],2]: 6153987 ],4]: 384541 ],5]: 19528 ],7]: 5925 +],9]: 371 ],10]: 17 ],12]: 5 ],13]: 2 -Misses: for 8 Intervals ],-4136]: 10 ],-3439]: 1459001 ],-2743]: 2947745 ],-2047]: 3788571 + ],-1351]: 996276 ],-655]: 3238221 ],41]: 7697843 ],737]: 4381294 + ],1433]: 1738512 miss_avg:1286.98638348867 miss_avg/runs:1259.66717066447 Compilation interrupt at Fri Jun 12 14:22:47
In reply to Re^2: Rapid text searches (complexity test)
by LanX
in thread Rapid text searches
by joomanji
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |