halfcountplus has asked for the wisdom of the Perl Monks concerning the following question:

Update: just found out from perlfaq5 that you can open a filehandle on a string ref...sheepish grin...so mebbe this is all moot.
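For reference, here is a minimal sketch of the perlfaq5 trick. Opening a filehandle on a reference to a scalar creates an "in-memory" filehandle (Perl 5.8+ with PerlIO), and `tell` and `seek` then work on a plain string exactly as they do on a file. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);    # provides SEEK_SET

my $html = "<li>one</li>\n<li>two</li>\n<li>three</li>\n";

# Open a read-only filehandle on the string itself (in-memory filehandle).
open my $fh, '<', \$html or die "Cannot open in-memory filehandle: $!";

my $first  = <$fh>;          # "<li>one</li>\n"
my $mark   = tell $fh;       # offset where line 2 starts (13)
my $second = <$fh>;          # "<li>two</li>\n"
seek $fh, $mark, SEEK_SET;   # rewind to the remembered offset
my $again  = <$fh>;          # "<li>two</li>\n" again

print $first, $second, $again;
close $fh;
```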

I've wondered this idly before: is there a way to use string pointers in perl?

I've been working on a scraper that inherits from HTML::Parser, which is essentially an event driven state machine. Initially, I wanted this to work on a string, so the class could be used with a script that works stdin->stdout. But it is much easier to do with a file, because I can use the filehandle as a pointer (via tell and seek).

Reason being: I want the input processed in stages. So the class is instantiated with a filename, and a handle opened to it. These are html files which contain a repeated segment -- they are literally arranged in a list, so let's call one a "listing". There is a function, "nextListing", which pulls these out one at a time.

With a filehandle, this is easy -- I feed the parser one line at a time, keeping our place at the beginning of each line with tell(); this way, when the listing is done, we can rewind slightly to catch the beginning of the next one. This works fine due to the nature of the material.
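Combined with the update at the top, the same tell/seek rewind works on a string. Below is a minimal sketch of a `nextListing`-style reader. It assumes, hypothetically, that every listing begins on a line matching `<div class="listing">`; `next_listing` is an illustrative name, and instead of feeding each line to HTML::Parser it simply collects the listing's text.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);    # provides SEEK_SET

my $html = <<'HTML';
<div class="listing">A</div>
<p>detail a</p>
<div class="listing">B</div>
<p>detail b</p>
HTML

my $START = qr/^<div class="listing">/;    # hypothetical listing marker

open my $in, '<', \$html or die "Cannot open in-memory filehandle: $!";

# Read lines until the next listing starts, then seek back to that line.
sub next_listing {
    my ($fh) = @_;
    my $listing = '';
    while (1) {
        my $start_of_line = tell $fh;       # remember where this line begins
        my $line = <$fh>;
        last unless defined $line;          # end of input
        if ($line =~ $START && length $listing) {
            seek $fh, $start_of_line, SEEK_SET;    # "rewind slightly"
            last;
        }
        $listing .= $line;                  # (the real class feeds HTML::Parser here)
    }
    return length $listing ? $listing : undef;
}

my @listings;
while (defined(my $l = next_listing($in))) {
    push @listings, $l;
}
print "--- listing ---\n$_" for @listings;
```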

I did do a version which will take a string, by turning it into an array and shifting the lines off one at a time, then unshifting one back on once the listing ends. This works too, but it still seems very odd to me that there is no simpler way to keep my place in a string for a process like this.
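A minimal sketch of that array version, under the same hypothetical assumption that each listing starts on a line matching `<div class="listing">`. The string is split into lines (newlines kept), lines are shifted off, and the first line of the next listing is unshifted back for the following call.

```perl
use strict;
use warnings;

my $html = <<'HTML';
<div class="listing">A</div>
<p>detail a</p>
<div class="listing">B</div>
<p>detail b</p>
HTML

my $START = qr/^<div class="listing">/;    # hypothetical listing marker
my @lines = split /^/m, $html;             # one element per line, newlines kept

# Shift lines off until the next listing starts, then put that line back.
sub next_listing {
    my ($lines) = @_;
    my @listing;
    while (@$lines) {
        my $line = shift @$lines;
        if ($line =~ $START && @listing) {
            unshift @$lines, $line;        # leave it for the next call
            last;
        }
        push @listing, $line;
    }
    return @listing ? join('', @listing) : undef;
}

my @listings;
while (defined(my $l = next_listing(\@lines))) {
    push @listings, $l;
}
print "--- listing ---\n$_" for @listings;
```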

I could do something with substr(), but that would be very inefficient since it means copying all the data repeatedly. So in general: does anyone know of a method here? Have I missed something obvious all this time?

Related question: I have not done any real XS programming yet, but I am aware of it and fluent in C. This is a dead simple task with pointers, so it probably would be a dead simple module to write. And in fact, I notice there is a module Pointer on CPAN which seems like it might do this, but the author says:

Pointers are tricky beasts, and there are myriad platform issues. At this point, Pointer.pm is but a naive attempt at a novel idea. Hopefully it can be fleshed out into a robust and serious module.

Has anyone used this? Any caveats? One thing I could not find is a module which deals specifically with strings and avoids the actual pointer nomenclature (e.g., one that exports functions to read a string, return the current position, seek to a different one, etc.). The CPAN String module does not seem to go that far. Who likes this idea?
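The interface described here (read, report the position, seek) does not strictly need XS or raw pointers. An object holding a reference to the string plus an integer offset can do all three, copying only the line it returns. A minimal pure-Perl sketch follows; `StringCursor` and its methods `read_line`, `position` and `seek_to` are hypothetical names, not an existing CPAN module.

```perl
use strict;
use warnings;

package StringCursor;    # hypothetical class, not a CPAN module

# Keep a reference to the caller's string, so the text itself is never copied.
sub new {
    my ($class, $string_ref) = @_;
    return bless { s => $string_ref, pos => 0 }, $class;
}

sub position { return $_[0]{pos} }            # like tell()

sub seek_to {                                  # like seek(..., SEEK_SET)
    my ($self, $pos) = @_;
    $self->{pos} = $pos;
    return;
}

# Return the next line (including its newline), or undef at end of string.
sub read_line {
    my ($self) = @_;
    my $s   = $self->{s};
    my $pos = $self->{pos};
    return if $pos >= length $$s;
    my $nl  = index $$s, "\n", $pos;
    my $end = $nl < 0 ? length $$s : $nl + 1;
    $self->{pos} = $end;
    return substr $$s, $pos, $end - $pos;      # copies only this one line
}

package main;

my $text   = "alpha\nbeta\ngamma";
my $cursor = StringCursor->new(\$text);

my $first  = $cursor->read_line;    # "alpha\n"
my $mark   = $cursor->position;     # 6
my $second = $cursor->read_line;    # "beta\n"
$cursor->seek_to($mark);
my $again  = $cursor->read_line;    # "beta\n" again
my $last   = $cursor->read_line;    # "gamma" (no trailing newline)
my $done   = $cursor->read_line;    # undef
print $first, $again, $last, "\n";
```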

Re: string pointers in perl?
by jethro (Monsignor) on Nov 24, 2010 at 16:22 UTC

    I don't see why unshifting a string onto an array is less simple or more odd than fiddling around with seek and tell.

    If you just want to avoid unshift, you could leave your array intact instead of shifting off it and just remember the line you are on (i.e., the array index). Then going back to the previous line is simply $pointer--
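    A minimal sketch of that index-based variant, again assuming a hypothetical listing marker `<div class="listing">`. The array is never modified, and `$pointer` plays the role of `tell()`. Wrapping the state in a closure (the `make_reader` name is made up) keeps the array and index private to the reader.

```perl
use strict;
use warnings;

my $START = qr/^<div class="listing">/;    # hypothetical listing marker

# Return a closure that yields one listing per call, or undef when done.
sub make_reader {
    my @lines   = split /^/m, shift;       # the array is never modified
    my $pointer = 0;                       # index of the next unread line
    return sub {
        my $listing = '';
        while ($pointer < @lines) {
            my $line = $lines[$pointer++];
            if ($line =~ $START && length $listing) {
                $pointer--;                # step back: the next call starts here
                last;
            }
            $listing .= $line;
        }
        return length $listing ? $listing : undef;
    };
}

my $next_listing = make_reader(qq{<div class="listing">A</div>\n<p>detail a</p>\n}
                             . qq{<div class="listing">B</div>\n<p>detail b</p>\n});
my @listings;
while (defined(my $l = $next_listing->())) {
    push @listings, $l;
}
print "--- listing ---\n$_" for @listings;
```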

    If it is about seeking into the middle of a line then you could slurp the whole file into one single string and use substr or a regex (which is still a lot faster than any file access on files of normal length). Or split lines at "interesting" points (use splice to add the additional line into the array) to make sure that the point you want to go to is always at the beginning of a line.
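    When the whole file is slurped into one string, Perl already keeps a position per string: `pos()`. A `\G`-anchored match with `/gc` starts at that position and advances it, and assigning to `pos()` acts as a seek. A minimal sketch, assuming the same hypothetical listing marker:

```perl
use strict;
use warnings;

my $html = qq{<div class="listing">A</div>\n<p>detail a</p>\n}
         . qq{<div class="listing">B</div>\n<p>detail b</p>\n};

my @listings;
# \G anchors each match at pos($html); /g advances pos, /c keeps it on failure.
while ($html =~ /\G(<div class="listing">.*?)(?=<div class="listing">|\z)/gcs) {
    push @listings, $1;
}

my $end = pos $html;    # offset just past the last listing
pos($html) = 0;         # assigning to pos() is the "seek"
print scalar(@listings), " listings, stopped at offset $end\n";
```

    Nothing is copied except the captured listing itself, and the string is never modified.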

Re: string pointers in perl?
by Anonymous Monk on Nov 24, 2010 at 15:45 UTC
Re: string pointers in perl? (pos)
by LanX (Saint) on Nov 24, 2010 at 23:55 UTC
Re: string pointers in perl?
by locked_user sundialsvc4 (Abbot) on Nov 24, 2010 at 19:31 UTC

    imho($it->stinks(true));

    Yeah, there is a perldoc perlguts, but I ain’t gonna write code that way. Seriously, if the code isn’t running fast enough for ya, either find a better algorithm or buy a faster box. Pointers Are Evil. (At least in this context.) “The land of diminishing returns” under the best of conditions, and “mysterious error-messages leading to pagers going off at two-thirty” more typically.

    Don’t “diddle” code to make it faster: find a better algorithm.
    – Kernighan & Plauger; The Elements of Programming Style.

Re: string pointers in perl?
by JavaFan (Canon) on Nov 24, 2010 at 17:34 UTC
    I could do something with substr(), but that would be very inefficient since it means copying all the data repeatedly
    Really? You may be surprised. Sure, if you remove random data from the string, there will be copying, but then you're doing something you couldn't do with a string pointer either. But internally, a Perl string is just a C string with some additional data. Removing things from the front of a string means only a pointer is moved. No data is copied.

    I bet you didn't benchmark your bold claim, did you?
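    Taking up the benchmarking suggestion, here is a minimal sketch using the core Benchmark module. It compares walking the string with `pos()` against chopping each listing off the front with 4-arg `substr`. The generated input and the sub names are made up, and the timings will depend on the Perl version and machine, so no results are claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 5_000 small listings in one string.
my $marker = '<div class="listing">';
my $html   = '';
$html .= qq{$marker$_</div>\n<p>detail $_</p>\n} for 1 .. 5_000;

# Walk the string with pos() and \G; the string itself is never modified.
sub by_pos {
    my $s = $html;      # private copy so both subs start from the same input
    my $n = 0;
    $n++ while $s =~ /\G\Q$marker\E.*?(?=\Q$marker\E|\z)/gcs;
    return $n;
}

# Chop each listing off the front with 4-arg substr.
sub by_chop {
    my $s = $html;
    my $n = 0;
    while (length $s) {
        my $next = index $s, $marker, 1;    # start of the following listing
        substr $s, 0, ($next < 0 ? length $s : $next), '';
        $n++;
    }
    return $n;
}

die "methods disagree" unless by_pos() == by_chop();
cmpthese(-1, { by_pos => \&by_pos, by_chop => \&by_chop });
```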