Re: Re: Algorithms, Datastructures and syntax

by BrowserUk (Patriarch)
on Apr 05, 2003 at 02:03 UTC


in reply to Re: Algorithms, Datastructures and syntax
in thread Algorithms, Datastructures and syntax

What a bunch of baloney.

I'll come back to this.

And how do you think Perl allows you to do a foreach?

The point is, I don't need to know.

Just because you don't see the bookkeeping at a language level doesn't mean it doesn't exist.

Of course it doesn't. What its absence at the language level does mean is that, as a perl programmer, I don't have to think about it.

Besides, with a foreach loop, you get an additional variable as well - the iterator. Even if there's none mentioned in your program source, you still get a new $_.

Yes, but that additional variable is managed by perl, not by me. I don't have to name it--though I can--but more importantly, I don't have to manage it. I don't need to use sizeof to determine how much to increment it by, or dereference it to read or write the array element. I don't have to concern myself with the type of the data in the element that $_ is aliasing; I just go about my business of using it and allow perl to take care of managing the array.
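
For illustration, a minimal sketch of the two forms described above (the array contents are, of course, arbitrary):

    my @values = ( 1, 'two', 3.14 );

    # perl manages the iterator; $_ aliases each element in turn
    for (@values) {
        print "$_\n";
    }

    # or name the loop variable explicitly
    for my $value (@values) {
        print "$value\n";
    }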

You say the number of iterations can be affected by changing the value of the iterator (I don't see how you can change the stop condition, C doesn't have an eval, and even Perl doesn't allow you to change code that was already compiled), and call that a downside.

In C, I can increment, decrement or zero the iterator and thereby change the number of iterations over the array. Used deliberately, this can be useful, but done accidentally it produces code that runs off the end of the array and corrupts other data. Show me an algorithm that modifies the iterator within the body of the loop to good effect, and (in most cases) it will be possible to re-code the algorithm so that it does not modify the iterator, resulting in more reliable and more transparent code.

With regard to changing the stop condition: often this is coded in terms of for (int i=0; i < somevar; i++) .... If somevar is modified, the stop condition changes. In perl, the stop condition is defined by the size of the list, which cannot be accidentally modified within the loop:

$n = 10; for ( 1 .. $n ) { $n = 0 if $_ == 5; print $_, $/; }

Even though the list is generated at runtime using lazy evaluation, attempts to modify that list within the bounds of the loop fail. The above loop prints 1 through 10 despite the value of $n changing. I agree that this reduces the flexibility of the loop, but if the algorithm necessitates the change, then it can be written explicitly (eg. last if $_ == 5), rather than by modifying the stop condition or the iterator. Whilst this can also be done (using break;) in C, it isn't the explicit modification that catches people out; it's the accidental modification that causes problems. This doesn't happen in perl.

I would call that a feature, making a C-style for more flexible than a Perl-style foreach.

Agreed, it's more flexible, but it is also more error-prone.

For instance, try to translate:

for (my $i = 0; $i < @array; $i += 2) { ... }

to a foreach style. It's not easy.

One way: for ( @array[ map { $_ * 2 } 0 .. $#array/2 ] ) { ... }

I'm not saying that this is better in every case, nor even that it is more intuitive, but if the algorithm requires the manipulation of every other array element, this achieves that. There are other ways of doing this, depending on the needs of the algorithm; see Re: Fold a list using map splices for another that is useful when you want to manipulate the array elements in pairs (or 3's or 4's etc.), which is often the case when this type of loop is used. Or Re: Starting foreach at an arbitrary position and Re: Non-destructive array processing for some iterator solutions using closures.
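
For completeness, a minimal sketch of the closure-based iterator approach mentioned above; the make_iterator() name and interface here are illustrative, not taken from the linked nodes:

    sub make_iterator {
        my ( $step, @data ) = @_;
        my $i = 0;
        return sub {
            return if $i > $#data;      # exhausted
            my $element = $data[$i];
            $i += $step;                # advance by the chosen stride
            return $element;
        };
    }

    my @array = ( 'a' .. 'f' );
    my $next  = make_iterator( 2, @array );
    while ( defined( my $element = $next->() ) ) {
        print "$element\n";             # prints a, c, e
    }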

The inability of perl's foreach to iterate in steps is, I believe, a known limitation that is overdue for attention, and one that I believe is due to be addressed in Perl 6.

And of course, perl gives you the C-style for as well as the foreach for those occasions when it is required. The point I was trying to make is that, for those occasions when the algorithm requires iteration over every element of an array, the foreach-style loop removes many of the gotchas that plague achieving the same purpose in C.

I never for a moment tried to imply that C cannot do anything that perl can do, only that perl hides many of the details that make it easy to make mistakes--starting the loop from 1 instead of 0; using <= instead of < in the terminating condition; etc.

What algorithm are we talking about? An algorithm might say "iterate over the elements of the set". A for or a foreach might be the implementation of that algorithm in a specific computer language, but that's not the algorithm itself.

I disagree. If the algorithm calls for "iterating over the elements of a set", that is exactly what the for/foreach loop does. I don't need to know how many or what type of elements are in the set; I just give the set to the for and it gives them back to me, one at a time, until I'm done. That's it. Said that way it may sound like a simple algorithm, but I would like to see a C implementation of an equivalent generic loop. Of course it can be done in C--that's how perl does it--but to achieve it in a way that the application programmer can use in his own algorithms, at the point of use, with the same level of simplicity, you are basically going to have to re-write perl arrays. I have never seen a C library that comes even close.
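
To illustrate what "give the set to the for" means in practice, a minimal sketch (the mixed contents are arbitrary):

    my @set = ( 42, 'a string', 3.14, [ 1, 2, 3 ], { key => 'value' } );

    # one generic loop, regardless of how many elements there are
    # or what types they hold
    for my $element (@set) {
        my $type = ref($element) || 'scalar';
        print "$type\n";
    }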

Oh, come on, get real. Just because something is written in C doesn't mean it's using stupid algorithms. One can use hashes in C as well (how do you think Perl hashes are implemented in perl?), and any programmer worth their salt will do so. It's an elementary programming technique. No one in their right mind is going to employ the algorithm you are proposing.

Yes, of course you can implement hashes in C. I recently did exactly that, or at least converted a C implementation into Perl, if you recall. And yes, that would be a stupid implementation, but that was the point. The precursor to the example was:

It's instructive to try and do this task in some language that doesn't have either hashes or dynamically sized & dynamically grown arrays.

I described that method to emphasise the benefits of having hashes available as a part of the language, but even implementing the hashes is not the complete solution; you still need the dynamic arrays.
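
As a minimal sketch of the point, assuming a simple frequency count over some arbitrary data, this is all the perl the technique requires:

    my @data = qw( apple pear apple plum pear apple );
    my %count;

    $count{$_}++ for @data;    # the hash grows on demand

    for my $key ( sort keys %count ) {
        print "$key: $count{$key}\n";
    }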

And indeed, you don't know how many elements the hash is going to need (although there's a trivial upper bound!).

True. A trivial upper bound is available, but if the dataset is not held in memory, or is too large to be held in memory, as might be the case with Tie::File for example, then allocating an array of the size of that trivial upper bound may not be practical. Even when it is, it would be extremely costly in memory to allocate a 100_000 element integer array only to discover that there were only a handful of unique elements in the dataset. If the dataset is too big to fit in memory, then at least a second pass is required to discover how big to make the array of counts, unless the elements are conveniently all of a fixed size that would allow a mathematical determination of the upper bound.

But then, you don't know that in Perl either, do you? And just as perl can grow a hash on demand, you can grow a hash in C as well. Amazing, isn't it? (Oh, could it just be that perl is written in C?)

What's amazing is that you can read the same words I wrote and misinterpret their intention so emphatically. The intent was to emphasise the benefits of having these features available in perl, the major ones being that you don't have to implement them yourself, and that you get a proven implementation out-of-the-box. Essentially the same benefits that come from any code re-use.

What high level data structures? Perl has arrays and hashes. That's it.

Perl's arrays are infinitely higher level than C's arrays. They are type-polymorphic, dynamic and extensible. You, better than most, know that.

Yes, you can realloc and memcpy array space in C, but that doesn't even come close to achieving the same result, as each element is still of a fixed size.

Yes, you can make the array an array of pointers, and the pointers can point to any type of element, but you then have to manage not only the allocation, deallocation, growth and shrinkage of the array of pointers, but also the allocation/deallocation of the things pointed at; and you still haven't got there, as you also need to remember the type of data pointed at by each pointer. Even with this ability added, now try splicing a new set of elements into or out of that array of pointers, and you have a whole new set of problems to address.

So now you implement the array of pointers as a singly-linked list of pointers to structures that contain type information and the data (or a pointer to it), and you have come close to re-implementing the Perl array.

Now add the structures and algorithms required to allow this data structure to be linked (tied) to a file of CSV or XML data, or a DB, or....

Now try and write the generic loop to iterate over each element of that implementation with all the pointer chasing, dereferencing, type and bounds checking that involves.

Once you have done that, you will have what I think can rightly be categorised as a "high-level data structure", and what perl gives me out-of-the-box.
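
For comparison, a minimal sketch of those same operations as perl provides them out-of-the-box:

    my @array = ( 1, 'a string', 3.14 );    # type-polymorphic elements
    push @array, [ 'nested', 'array' ];     # grows dynamically
    splice @array, 1, 1, qw( x y z );       # replace one element with three
    print scalar @array, " elements\n";     # prints "6 elements"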

Nothing that would give you elements in some order. The only efficient searches you can do are exact matches using hashes. But no range queries, nearest neighbour, next, not even minimum.

Iterating an array could be said to give you the elements in 'some order', as could iterating the sorted keys of a hash, but I suspect that you are alluding to the absence from perl of a generic tree data-structure, and I concur that this provision would be a real benefit to perl. Whilst it is possible to implement trees of various kinds in perl, the overhead of switching in and out of the low-level and high-level representations incurs a heavy penalty.

It would be nice if the language could provide the programmer with high-level building blocks for constructing various kinds of trees, whilst allowing the greater majority of the pointer chasing and node traversal to be performed by efficient (and tested, accurately coded) low-level code within perl itself. However, I (without having spent a lot of time thinking about it) cannot see how this might easily be done. Almost every action that can be performed upon a tree of any flavour requires an intimate knowledge of that flavour. It seems to me that a generic set of tree building blocks would require the programmer to supply a callback of Perl code to be invoked for each node during every operation--insertions, deletions, traversals, rebalancing etc.--and these callbacks would severely constrain the performance and render the benefits of the provision minimal.
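
A minimal sketch of the callback-driven insertion described above; the node layout and the comparator interface are hypothetical, chosen only to illustrate the per-node callback cost:

    sub insert {
        my ( $node, $value, $cmp ) = @_;
        return { value => $value, left => undef, right => undef }
            unless defined $node;

        # the Perl-level callback is invoked at every node visited
        my $side = $cmp->( $value, $node->{value} ) < 0 ? 'left' : 'right';
        $node->{$side} = insert( $node->{$side}, $value, $cmp );
        return $node;
    }

    my $root;
    $root = insert( $root, $_, sub { $_[0] <=> $_[1] } ) for 5, 3, 8, 1;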

If you can see how this could be done, or how any other generic high-level datastructure that perl does not currently have could be added, I'd be interested to hear about it.

Of course you can build all this stuff, but then, so you can with C.

The entire point of my original post was that Perl's arrays and hashes, in combination with its polymorphic scalars, mean that many algorithms that benefit from encapsulating (much of) the algorithm in the data rather than in code are simple to implement in perl. This is because it provides a simple set of (what I class as) high-level datastructures that can be used directly, or easily adapted, in the implementation of many different algorithms.
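
A dispatch table is perhaps the simplest illustration of encapsulating the algorithm in the data; a minimal sketch:

    my %op = (
        add => sub { $_[0] + $_[1] },
        sub => sub { $_[0] - $_[1] },
        mul => sub { $_[0] * $_[1] },
    );

    # the hash itself selects the behaviour; no if/else chain needed
    print $op{add}->( 2, 3 ), "\n";    # prints 5
    print $op{mul}->( 2, 3 ), "\n";    # prints 6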

Java and C++ provide extensive libraries of datatypes, but because these datatypes are extensions to the language rather than a part of it, it becomes necessary to also add code to the library to manipulate them, rather than being able to manipulate them with a small number of generic built-in functions. This means that they become very specific to the purpose for which they were designed, which makes it difficult to use them in ways that the implementors didn't think to provide methods for. You can subclass them to add functionality, but this often comes at the expense of adding complexity to the code that uses them, and overhead that would be avoided if the functionality were added at the lower levels. It also makes it even harder to use them in a generic fashion or mix-n-match between them.

As each datatype becomes more and more specialised, the problems of API growth, producing good reference documentation, and programmer overload become more acute. I've already mentioned the myriad collections classes in the C++ STL; my other favourite example is the Stream classes in Java, with BufferedInputStreams, BufferedOutputStreams, BufferedReaders, BufferedWriters, DataInputStreams, DataOutputStreams, CharArrayReaders, CharArrayWriters, ByteArrayInputStreams, ByteArrayOutputStreams, ObjectInputStreams, ObjectOutputStreams, FileReaders, FileWriters and so on ad nauseam.

My experience of using these is that I start out reading and reading, trying to decide which is the correct one for my purpose. I eventually settle upon one and start coding. I then hit an impasse where the class I have chosen doesn't allow me to do what I need, so I have to either switch all the code to a different class or coerce one type into another for a particular part of the algorithm. Then, as often as not, the requirements change, or I find a limitation or omission in my (or someone else's) original specification, and find myself having to go back and do some or all of the code over to accommodate the change or belated discovery.

In my still less than one year of using perl, I have yet to encounter this scenario. I am constantly amazed by the flexibility of perl's few but oh-so-useful datatypes and the way they lend themselves to use in so many algorithms. That's not to say I haven't recognised some limitations--tree structures as described above being one--but I am still impressed as hell with what I can achieve with the basic types before I need to resort to constructing my own. I'm also impressed by the simplicity of the mechanisms provided for extension when I do need to resort to that.

Like you, I wish that the OO mechanisms in perl were a little less ad-hoc, although your own inside-out technique addresses many of those concerns. I also wish that the runtime costs of tieing, overloading and OO were lower. I wish that the tieing mechanisms for aggregate data types allowed me to process subset parameters en-masse rather than individually as now; and I still baulk at splitting a scalar into a linked list of pointers to structures, each holding a single 8-bit value, just to get at the individual chars. Using substr, /(.)/g or chop just obscures the algorithm with detail, and unpack is only a slightly more efficient way of splitting to an array--I really miss the syntactic simplicity and efficiency of accessing the bytes or chars of a string directly using an array-style syntax.

I have also encountered situations where the overhead of these datastructures means that the flexibility they provide is unneeded and is therefore pure overhead. This in turn can lead to the situation where the lack of control over the size and type of memory allocated to hold the data means that algorithms that would perform very well if the whole dataset could fit in memory become slow and inefficient because of the need to page the dataset to and from disc, whether that paging is done through the OS, a perl mechanism like Tie::File or DBI, or via an RDBMS. I had started devising my own modules to alleviate this, but discovered that tieing imposes its own penalties. I also saw that Perl 6 is to address these issues with attributes, and so have ceased to develop the ideas further. I have modules for my own use in the interim.
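
For reference, the approaches mentioned above, as a minimal sketch; each works, but each carries the detail or overhead the paragraph laments:

    my $string = 'example';

    my $third = substr $string, 2, 1;    # one char, direct but verbose
    my @chars = $string =~ /(.)/g;       # a regex capture per char
    my @bytes = unpack 'C*', $string;    # numeric byte values
    my @split = split //, $string;       # split to an array of chars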

All of that said, for the most part I find that perl allows me to concentrate on the bigger picture, rather than getting bogged down in the nitty-gritty of data storage and representation, more of the time than any other language, and I attribute this to the power and flexibility of Perl's few, but high-level, data types. You are completely at liberty to call this "baloney" if you wish, but cogent argument is more impressive than such dismissal.


Examine what is said, not who speaks.
1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
3) Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke.
