Re: Data structure challenge
by flyingmoose (Priest) on Mar 17, 2004 at 18:26 UTC
Seems interesting, but you kind of lost me. I'm easy to lose sometimes, so don't take this as a knock on your post.
IMHO, this would be more understandable if you used more of the terms found in the data structure references. Initially, your data structure looks like it's just pigeon-holing data (unrelated, but see bucket sort), but your example of the 'special' structure is a little confusing.
Terms like stack, queue, deque, trie, red-black tree, list, etc. are all there to facilitate easier communication. Anyhow, as it stands, you've lost me. Does this 'special structure' have a name? What are you trying to emulate? I can sort of grok the intent, but an explanation of the workings and the principle behind it is probably in order.
As it stands, making all classical operations O(1) is only doable when your data structure is as large as the domain space, right? In many cases, this means working on small domains or with huge amounts of memory. Result? It's not used very frequently. I understand this is just a question to see if it's possible, but where would this be used, and is there a niche for it? If one data structure fit everywhere, we wouldn't have the cool montage of data structures, of course, so I guess I've made your point :)
Also, a hash function that maps every key to a completely unique value (as noted in my previous paragraph) is expensive in both time and space. Maybe Perl is clever and doesn't make the underlying array as wide as the domain, leaving it sparse... but can it really? Not sure. This is why hashes collide and tend to be a little slow: allowing collisions keeps the hash function cheap (not speaking of Perl specifically, but hashing in general), and collapsing keys down to a smaller range is important.
Re: Data structure challenge
by gjb (Vicar) on Mar 17, 2004 at 18:11 UTC
The trouble is caused by initialisation, so what data structure can we use that is initialised in constant time? The answer may be to use simple strings for A and B. The length should be U*log_10(U), so that each group of log_10(U) characters represents an "array element". Initialisation can be done using pack, which should be constant time; array access can be simulated using substr, which is also constant time.
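gjb's string-as-array idea can be sketched roughly like this (my reconstruction, not gjb's code; the slot width `$W`, the example value of `$U`, and the helper names are made up for illustration):

```perl
use strict;
use warnings;

my $U = 1000;                   # domain size (assumed for illustration)
my $W = length $U;              # decimal digits per simulated element
my $A = '0' x ( $U * $W );      # one string holding U fixed-width slots

# Simulated array access via substr(); both operations are O(1).
sub set_elem {
    my ( $i, $v ) = @_;
    substr( $A, $i * $W, $W ) = sprintf '%0*d', $W, $v;
}

sub get_elem {
    my ( $i ) = @_;
    return 0 + substr( $A, $i * $W, $W );
}

set_elem( 7, 42 );
print get_elem( 7 ), "\n";   # 42
```

The catch, as the rest of the thread works out, is the first step: building the string with `x` (or pack) still touches all U*W bytes, so the initialisation itself is not O(1).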
Just my 2 cents, might be totally wrong, -gjb-
Initialisation can be done using pack that should be constant time
Well, that would be interesting. I'm not a pack() wizard, and I can't see how I would do it with pack. What format would I use?
Abigail
Wouldn't a pack(a[U*log_10(U)], 'z') do the trick?
Thanks Aristotle, I was too lazy/hurried to check the correct syntax. The length of the string we want to allocate is N = U*ceil(log_10(U)), so I hoped pack(aN, '') would create a string of N '\0' characters using some low-level memory allocation function. Unfortunately that doesn't seem to happen; the resulting string consists of N spaces, so there goes constant time... :(
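For what it's worth, the padding byte depends on the template letter (a small demo; either way perl writes every byte of the result, so the initialisation is still O(N)):

```perl
use strict;
use warnings;

# Lowercase 'a' pads with NUL bytes; uppercase 'A' pads with spaces.
my $nuls   = pack 'a5', '';
my $spaces = pack 'A5', '';

print $nuls   eq "\0" x 5 ? "a5 pads with NULs\n"   : "a5: something else\n";
print $spaces eq ' '  x 5 ? "A5 pads with spaces\n" : "A5: something else\n";
```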
Just my 2 cents, -gjb-
Re: Data structure challenge
by BrowserUk (Patriarch) on Mar 17, 2004 at 18:36 UTC
If amortizing the initialization cost over the lifetime of the structure were allowed, then simple Perl arrays could be used (by not preallocating the size of the array, so that the initial creation is still O(1)). Your use of strings 'suffers' from this same 'problem', so I'm pretty sure Abigail would reject this solution.
Your solution would qualify if you could preallocate the length of the string without Perl initializing the contents of the string. This is pretty trivial to do in XS, hence the disqualification of that method.
I'm not allowed to amortize the initialization over the life of the structure; but what about over the life of the program? Is it valid (in the context of this challenge) to allocate one big array (vec, whatever) at time zero and then manage it using my own allocator?
Just as tye says, the problem is in the initialization. If you assign a value in a vec() string and it's "out of bounds", perl will zero all the in-between values. That means that if you were to store just one value, say U - 1, that insertion will cost you Θ(U) time. Which isn't O(1).
Abigail
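The zero-filling Abigail describes is easy to observe (a small demo of vec() extending a string):

```perl
use strict;
use warnings;

my $v = '';
vec( $v, 9, 8 ) = 1;    # store one 8-bit value at index 9

# perl had to grow the string to 10 bytes, zeroing bytes 0..8 first:
print length $v, "\n";                                    # 10
print $v eq "\0" x 9 . chr 1 ? "zero-filled\n" : "?\n";
```

Storing a single element at index U - 1 therefore writes U bytes, which is exactly the Θ(U) cost mentioned above.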
I thought it was too easy.
So the whole challenge comes down to: is there any way to allocate/extend a piece of memory without initialising it, using pure Perl?
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Re: Data structure challenge
by BrowserUk (Patriarch) on Mar 17, 2004 at 21:53 UTC
use Benchmark::Timer;
my $T = Benchmark::Timer->new;
my( @auto, @pre );

$T->start( 'autoviv' );
$auto[ $_ ] = 1 for map { 10**$_ } 0 .. 7;
$T->stop( 'autoviv' );

$#pre = 10**7;    # preallocate ten million slots
$T->start( 'prealloc' );
$pre[ $_ ] = 1 for map { 10**$_ } 0 .. 7;
$T->stop( 'prealloc' );

$T->report;
1 trial of autoviv (93.750ms total)
1 trial of prealloc (15.625ms total)
That still doesn't achieve O(1), as the number of elements perl has to set up is still N, though it is somewhat quicker than autovivification.
That said, when initialising 16 MB of scalar to spaces only takes 36 milliseconds, this is one of those cases where big-O notation hides reality, I think.
use Benchmark::Timer;
my $T = Benchmark::Timer->new;
for ( 1 .. 10 ) {
    $T->start( 'scalar' );
    my $x = ' ' x 2**24;    # initialise 16 MB of spaces
    $T->stop( 'scalar' );
}
$T->report;
10 trials of scalar (359.375ms total), 35.938ms/trial
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Re: Data structure challenge
by BrowserUk (Patriarch) on Mar 18, 2004 at 14:36 UTC
There is an off-by-one error in the logic somewhere that I cannot find right now, but otherwise I think this complies with the rules of the challenge.
#! perl -slw
use strict;
local $, = ' ';
my( $A, $C, @B ) = ( undef, 0 );
open SPARSE, '+> :raw', \$A;
sub readA{
seek SPARSE, 4 * $_[ 0 ], 0;
sysread SPARSE, my $n, 4;
unpack 'N', $n;
}
sub writeA{
seek SPARSE, 4 * $_[ 0 ], 0;
syswrite SPARSE, pack( 'N', $_[ 1 ] ), 4;
}
sub query{
my $i = shift;
my $a = readA( $i ) || -1;
return 0 unless 0 <= $a and $a < $C;
my $b = $B[ $a ];
return $b == $i ? 1 : 0;
}
sub insert{
my $i = shift;
if( !query( $i ) ) {
$B[ $C ] = $i;
writeA( $i, $C );
$C++
}
}
sub del{
my $i = shift;
if( query( $i ) ) {
my $a = readA( $i );
my $b = $B[ --$C ];
$B[ $a ] = $b;
writeA( $b, $a );
}
}
sub clear{
$C = 0;
}
my @numbers = map{ int rand( 2**16 ) } 1 .. 10;
print 'Query before insert';
printf "%5d : %s\n", $_, query( $_ ) for @numbers;
insert $_ for @numbers;
print "\nQuery after insert";
printf "%5d : %s\n", $_, query( $_ ) for @numbers;
del $_ for @numbers[ 0..4 ]; ## Delete half
print "\nQuery after deleting half";
printf "%5d : %s\n", $_, query( $_ ) for @numbers;
del $_ for @numbers; ## Delete them all, regardless of whether they still exist
print "\nQuery after deleting the rest";
printf "%5d : %s\n", $_, query( $_ ) for @numbers;
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Update: this is stupid; see BrowserUk's reply for why.
I would not think this is constant time. I know nothing about how sparse files are implemented, but I would not think they let you access an arbitrary byte in constant time.
(If the filesystem does not have sparse files, the situation is even worse, as the OS must fill the file with zero bytes when seeking past the end.)
Okay. Forget the bareword SPARSE; it is just a filehandle, opened (R/W) to an in-memory file, per perlfunc:open (5.8.x):
File handles can be opened to ``in memory'' files held in Perl scalars via:
open($fh, '>', \$variable) || ..
My inspiration was that if you seek on an in-memory file, the string is extended without being initialised. Of course it mustn't be initialised if it is going to emulate a file: if you seek to a position in a random-access file, you don't want everything preceding it to be overwritten.
So: a pure-Perl way of allocating a buffer without initialisation.
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Re: Data structure challenge
by dragonchild (Archbishop) on Mar 17, 2004 at 21:22 UTC
What about using soft references? I used BrowserUk's harness and changed the engine around a little.
Now, I think doing this is cheating a little, because I'm using the symbol table, which is a hash. But I don't think this is possible in pure Perl otherwise, not with the constraints you've placed on it.
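dragonchild's engine isn't quoted here, but the symbol-table trick presumably looks something like this (my reconstruction; the `A::` namespace and the sub names are made up):

```perl
use strict;
use warnings;
no strict 'refs';    # soft references: the symbol table acts as our sparse array

sub insert_elem { ${ "A::elem_$_[0]" } = 1 }
sub query_elem  { defined ${ "A::elem_$_[0]" } ? 1 : 0 }
sub delete_elem { undef ${ "A::elem_$_[0]" } }

insert_elem( 12345678 );
print query_elem( 12345678 ), "\n";   # 1
print query_elem( 42 ), "\n";         # 0
delete_elem( 12345678 );
print query_elem( 12345678 ), "\n";   # 0
```

As the post itself concedes, the symbol table is a hash under the hood, so this sidesteps the challenge rather than solving it.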
------
We are the carpenters and bricklayers of the Information Age.
Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.
I came up with something similar using a hash to emulate a sparse array, but then realised that I might as well just use the hash to start with and save all the bother. Besides, the hash scales better: the overhead of allocating 4 GB of RAM in order to store 3 unique 32-bit integers (for example) is far higher than hashing those 3 integers.
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Re: Data structure challenge
by toma (Vicar) on Mar 18, 2004 at 06:26 UTC
I am not offering this as a solution to your challenge, but I suggest looking at the thread Compressing a set of integers. I think that I had a similar problem, but I had a few more opportunities for optimization than you have presented.
With help from the others in the thread, I was able to use pack to make a really cool data structure that is similar to a sparse bit vector. My problem had the potential optimization that some bits were more likely than others, and I could reorder.
For the problem I was trying to solve, Table::ParentChild ended up getting written along the way. For the deployed application, I ended up using a relational database, which performs well enough today in production.
I also suggest, for practical solutions, that you consider algorithms which are O(log(n)), especially those with small constant factors.
It should work perfectly the first time! - toma
Re: Data structure challenge
by iburrell (Chaplain) on Mar 17, 2004 at 19:33 UTC
With Perl, you can make initializing and clearing cheap by relying on autovivification with arrays. Allocate @A as an empty array; setting any value will expand the array, filling the intervening values with undef. That expansion is O(N), but it is fast. Clearing is just emptying out the entire array. It also has the advantage that the size of the array is bounded by the largest index used. Your data structure uses autovivification for the A array, so it has the same performance but more overhead.
It is also important to realize how much the constant factors matter to the running time. Preallocating the array pays an O(U) cost in initialization but makes each subsequent operation more efficient. When inserting N items with N close to U, this can be significant.
Setting any value will expand the array and set the other values to undef.
Well, yes, but that's exactly what the challenge wants you to avoid. You don't want to spend time creating undefined values.
Your data structure uses autovivication for the A array so it has the same performance but more overhead.
No. It's important to realize I did not describe the A array in Perl terms. Think C, or even better, assembler. Just declare an array as a chunk of memory where integers are stored. No initialization of the elements allowed; just accept whatever values happen to be in memory.
Abigail
Re: Data structure challenge
by ambrus (Abbot) on Mar 18, 2004 at 07:59 UTC
This is an interesting question.
If I read you correctly, it is like this: you could get an only slightly slower solution by using a hash, or better, a binary tree, instead of the uninitialized array A (with a binary tree, you get O(log n) instead of O(1) for each operation). In fact, this is true no matter how you use an uninitialized array; that is, you can always use a hash or a binary tree instead of one.
Notice, however, that if you use such a data structure only once, and you use the efficient C implementation, the OS still has to initialize the array when the process gets the memory by calling brk or mmap.
with a binary tree, you get O(log n) instead of O(1) for each operation
Well, only if the binary tree is balanced (and ordered). I am very well aware of the performance of a binary tree; if I wanted to settle for O(log n) performance, I wouldn't have posted the challenge. The challenge is to do all operations in O(1); doing one or more in O(log n) is not a challenge.
Notice, however, that if you use such a data structure only once, and you use the efficient C implementation, the OS still has to initialize the array when the process gets the memory by calling brk or mmap.
While some C libraries might initialize the memory claimed with mmap or brk, there's no reason they have to. The OS and/or the C libraries can always be patched so that one can claim a segment of memory and take it "as is". For the scope of the challenge, you may assume that we run on a platform where memory is given out "as is".
Abigail
When I said binary tree, I actually thought of a trie. I know it's still O(log n) and so does not satisfy the conditions, but a binary trie's code is not as complicated as that of a balanced binary tree.
As for initializing memory: the C library certainly won't initialize the whole array when calling malloc. However, on a multi-user OS, the system has to initialize all the pages allocated to a process, for security reasons. If it did not do that, you could get at sensitive information by allocating a lot of memory and searching it for data that other processes had freed but not zeroed.
This does not take much time, because: 1. malloc tries to reuse the memory freed by the process, so it doesn't have to ask for a new chunk every time; 2. the OS can do the zeroing in idle time, so you don't have to wait for it when you allocate the memory; 3. zeroing a chunk of memory is fast.
The first reason may not count with very large arrays, as malloc gets those with mmap, and when they're freed they're munmapped.
The question you asked still holds, however, because such a structure is useful even on such OSes, as you may use it many times, and you don't have to zero it every time the same process clears and reuses it. I still don't know if it can be done in Perl.
No, the OS doesn't necessarily have to initialize the array. Many compilers will do that, but all malloc() should be doing is saying "I will be using this space for a while". What will be there is not guaranteed at all.
------
We are the carpenters and bricklayers of the Information Age.
Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.
Re: Data structure challenge
by Anonymous Monk on Mar 17, 2004 at 20:50 UTC
Is this not impossible using the data structures available to us within Perl? If we do not initialize before using a structure, then perl will autovivify, which is nearly as bad. So to make this work with any kind of array system, you would need to bypass perl's autovivification. Of course, you could skip any type of complicated data structure and stick with manipulating scalars. Fixed-length fields, perhaps? Or something silly like that ;)
Re: Data structure challenge
by dragonchild (Archbishop) on Mar 18, 2004 at 13:31 UTC
I've been thinking about this. Obviously, this is do-able in C (and similar languages), as you have C-like pseudo-code in your challenge. But, other than sparse arrays1, what would the practical benefit of this data structure be? If you add in a hashing function to map strings to your set of i, you still have the collision issue.
Are you considering this as a possible implementation of the arrays Perl's hashes use? It would seem kinda wasteful to add an indexing array just to avoid clearing and initializing in O(n)...
1 I say sparse arrays because in the 2-array method, U doesn't have to be the upper limit of all i. It simply has to be the upper limit of how many i you can have at one time. So, if you know you'll only ever have a max of 10 things, they could be drawn from the set of 2**32, but you would only need two 10-element arrays and a short as your C.
------
We are the carpenters and bricklayers of the Information Age.
Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.
But, other than sparse arrays1, what would be practical benefit of this data structure be?
Worst-case constant time operations, regardless of how many elements there are in the data structure.
If you add in a hashing function to map strings to your set of i, you still have the collision issue.
Well, yes. Don't do that then. ;-) As I said, it only 'works' if you know certain things about your set; in this case, you know you are only going to store non-negative integers not exceeding U.
Are you considering this as a possible implementation of the arrays Perl's hashes use?
No.
I say sparse arrays because in the 2-array method, U doesn't have to be the upper limit of all i. It simply has to be the upper limit of how many i you can have at one time. So, if you know you'll only ever have a max of 10 things, they could be in the set of 2**32, but you would only need 2 10-element arrays and a short as your C.
In the 2-array method I described, it's essential that U is the upper limit of all i, because i is used as an index into the array A.
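For reference, the two-array trick itself fits in a few lines. The sketch below uses plain Perl arrays, so the growth of @A is not O(1) here (that lazy initialisation is the whole sticking point of the thread); the interesting part is the invariant: i is a member iff 0 <= A[i] < C and B[A[i]] == i, which is what makes garbage in A harmless.

```perl
use strict;
use warnings;

my ( @A, @B, $C );
$C = 0;

sub member {
    my $i = shift;
    my $a = $A[ $i ];
    # A[i] may be garbage (here: undef); cross-check it against B.
    return defined $a && $a < $C && $B[ $a ] == $i ? 1 : 0;
}

sub insert {
    my $i = shift;
    return if member( $i );
    $A[ $i ]   = $C;
    $B[ $C++ ] = $i;
}

sub remove {
    my $i = shift;
    return unless member( $i );
    my $a    = $A[ $i ];
    my $last = $B[ --$C ];   # move the last member into the vacated slot
    $B[ $a ]    = $last;
    $A[ $last ] = $a;
}

sub clear { $C = 0 }         # O(1): stale entries in @A and @B become garbage

insert( 7 ); insert( 3 );
print member( 7 ), member( 3 ), member( 5 ), "\n";   # 110
remove( 7 );
print member( 7 ), member( 3 ), "\n";                # 01
clear();
print member( 3 ), "\n";                             # 0
```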
Abigail
Re: Data structure challenge
by bsb (Priest) on Mar 22, 2004 at 23:34 UTC
Does $#A = $#B = U initialize the arrays correctly?
Brad
Yes, Perl has to initialize all the array elements. Internally, it stores pointers (to SVs), and if Perl didn't initialize the array elements, it would end up with random memory addresses, with no way of finding out whether an element had been initialized or not.
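You can see the effect at the Perl level (a small demo; the array length is arbitrary):

```perl
use strict;
use warnings;

my @A;
$#A = 9;    # preallocate: perl must set up all 10 slots

print scalar @A, "\n";                           # 10
print defined $A[5] ? "defined\n" : "undef\n";   # undef: each slot was initialised
```

Every slot reads back as a clean undef rather than garbage, which is only possible because perl wrote a pointer for each of them during the `$#A = 9` assignment.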
Abigail
As I thought about it, I began to suspect that. XS of course doesn't count.
Does this cover virtually programming in C via syscall(&SYS_mmap,..) and pack-ing data structures?
Otherwise, I give up. In fact, I give up anyway; I'm not going near syscall.