Update: just found out from perlfaq5 that you can open a filehandle on a string ref...sheepish grin...so mebbe this is all moot.
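For anyone landing here later, a minimal sketch of what perlfaq5 describes: since Perl 5.8, open() accepts a reference to a scalar in place of a filename, and the resulting handle supports tell and seek like any other. The variable names and sample data are made up for illustration.

```perl
use strict;
use warnings;

# Sample data; stands in for whatever string the script read from stdin.
my $html = "line one\nline two\nline three\n";

# Open a read-only filehandle on the string itself (note the reference).
open(my $fh, '<', \$html) or die "Cannot open in-memory handle: $!";

my $first  = <$fh>;        # first line
my $mark   = tell($fh);    # byte offset just past the first line
my $second = <$fh>;        # second line

seek($fh, $mark, 0);       # rewind to the saved offset (0 means SEEK_SET)
my $again = <$fh>;         # second line, read again

close($fh);
```

After this runs, $second and $again hold the same line, which is the "keep place in a string" behaviour asked about below.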

I've wondered this idly before: is there a way to use string pointers in perl?

I've been working on a scraper that inherits from HTML::Parser, which is essentially an event driven state machine. Initially, I wanted this to work on a string, so the class could be used with a script that works stdin->stdout. But it is much easier to do with a file, because I can use the filehandle as a pointer (via tell and seek).

Reason being: I want the input processed in stages. So the class is instantiated with a filename, and a handle opened to it. These are html files which contain a repeated segment -- they are literally arranged in a list, so let's call one a "listing". There is a function, "nextListing", which pulls these out one at a time.

With a filehandle, this is easy -- I feed the parser one line at a time, keeping our place at the beginning of each line with tell(); this way, when the listing is done, we can rewind slightly to catch the beginning of the next one. This works fine due to the nature of the material.
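A rough sketch of that tell/seek loop, under stated assumptions: the marker regex `$START`, the function name `next_listing`, and the sample document are all hypothetical, and the lines are simply collected where the real class would hand each one to the parser. An in-memory handle stands in for the real file so the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical marker: assume every listing starts on a line matching this.
my $START = qr/^<li\b/;

# Return the lines of the next listing, or an empty list at end of input.
# On return the handle sits at the start of the following listing.
sub next_listing {
    my ($fh) = @_;
    my @lines;
    while (1) {
        my $pos  = tell($fh);        # remember where this line begins
        my $line = <$fh>;
        last unless defined $line;   # end of input
        if ($line =~ $START && @lines) {
            seek($fh, $pos, 0);      # this line opens the next listing: rewind
            last;
        }
        # Skip anything before the first marker; keep everything after it.
        push @lines, $line if @lines || $line =~ $START;
    }
    return @lines;
}

my $doc = "<ul>\n<li>first\nmore\n<li>second\n</ul>\n";
open(my $in, '<', \$doc) or die "Cannot open in-memory handle: $!";
my @one   = next_listing($in);   # lines of the first listing
my @two   = next_listing($in);   # lines of the second listing
my @three = next_listing($in);   # empty: nothing left
close($in);
```

Because the position is saved before every read, the rewind never has to guess how long the offending line was.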

I did do a version which will take a string, by turning it into an array and shifting the lines off one at a time, then unshifting one back on once the listing ends. This works too, but it still seems very odd to me that there is no simpler way to keep place in a string for a process like this.
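The array version can be sketched the same way. Again, `$START`, `next_listing_from_lines`, and the sample data are assumptions made for illustration, not the actual code.

```perl
use strict;
use warnings;

# Hypothetical marker: assume every listing starts on a line matching this.
my $START = qr/^<li\b/;

# Take an array ref of lines, consume one listing from the front,
# and return its lines. The array is modified in place.
sub next_listing_from_lines {
    my ($lines) = @_;
    my @listing;
    while (defined(my $line = shift @$lines)) {
        if ($line =~ $START && @listing) {
            unshift @$lines, $line;   # put it back for the next call
            last;
        }
        # Skip anything before the first marker; keep everything after it.
        push @listing, $line if @listing || $line =~ $START;
    }
    return @listing;
}

my $doc    = "<ul>\n<li>first\nmore\n<li>second\n</ul>\n";
my @lines  = split /^/, $doc;    # split into lines, keeping the newlines
my @first  = next_listing_from_lines(\@lines);
my @second = next_listing_from_lines(\@lines);
```

The cost here is the up-front split, which copies the whole string once into a list of lines.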

I could do something with substr(), but that would be very inefficient since it means copying all the data repeatedly. So in general: does anyone know of a method here? Have I missed something obvious all this time?
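One possibility that avoids substr(), offered as a sketch rather than a definitive answer: a scalar carries its own match position, readable and settable through pos(), and a regex anchored with \G under the /gc flags continues from that position. Only the matched line is copied into $1. The sample data and variable names are made up.

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\ngamma\n";

# Each scalar-context /g match starts at pos($text) and advances it.
# \G pins the match to that position; /c keeps pos intact on failure.
$text =~ /\G(.*\n)/gc or die "no first line";
my $first = $1;
my $mark  = pos($text);    # offset just past the first line

$text =~ /\G(.*\n)/gc or die "no second line";
my $second = $1;

pos($text) = $mark;        # "seek" back to the saved offset
$text =~ /\G(.*\n)/gc or die "no line after rewind";
my $again = $1;            # the second line, read again
```

Here pos() plays the role of tell, and assigning to pos() plays the role of seek.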

Related question: I have not done any real XS programming yet, but I am aware of it and fluent in C. This is a dead simple task with pointers, so it probably would be a dead simple module to write. And in fact, I notice there is a module Pointer on CPAN which seems like it might do this, but the author says:

Pointers are tricky beasts, and there are myriad platform issues. At this point, Pointer.pm is but a naive attempt at a novel idea. Hopefully it can be fleshed out into a robust and serious module.

Has anyone used this? Any caveats? One thing I could not find is a module which deals specifically with strings and avoids the actual pointer nomenclature (e.g., one that exports functions to read a string, return the current position, seek to a different one, etc.). The CPAN String module does not seem to go that far. Who likes this idea?


In reply to string pointers in perl? by halfcountplus
