Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I need code that, given a byte offset in a multiline string, returns the line number and the offset within that line (e.g. imagine that the string holds the entire content of some text file). The byte offset will advance on each call, so the code can exploit that for optimization. The input string won't be larger than 200 KB.

I can code such a beast in half an hour, but maybe a module for this already exists on CPAN?

Thank you for your answer in advance!


Replies are listed 'Best First'.
Re: Find line number and offset inside line given a byte offset in string
by Limbic~Region (Chancellor) on Nov 28, 2005 at 14:29 UTC
    Anonymous Monk,
    I don't know of a module on CPAN that does exactly what you want. Here is how I would do it:
    1. Index the offsets of the newlines in the string, much as Tie::File does with files.
    2. Binary search that index to find the line the offset falls on.
    3. Subtract the offset at which that line starts from the given offset to get the offset within the line.
    4. Since the offsets only advance, use the line found last as the lower bound of the next search, which makes subsequent binary searches faster.
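
The four steps above can be sketched roughly as follows. This is only an illustration, not code from the thread: `make_locator` and `@starts` are made-up names, line numbers and in-line offsets are both 0-based, and the string is assumed to hold raw bytes (for a decoded character string, `pos` and `length` would count characters instead).

```perl
use strict;
use warnings;

# Returns a closure mapping a byte offset to (line number, offset in line).
# Both values are 0-based. make_locator is a hypothetical name.
sub make_locator {
    my ($string) = @_;

    # Step 1: record the offset at which each line starts.
    my @starts = (0);
    while ($string =~ /\n/g) {
        push @starts, pos($string);
    }

    my $last = 0;    # Step 4: line found by the previous call

    return sub {
        my ($offset) = @_;
        return if $offset < 0 || $offset >= length $string;

        # Offsets normally advance, so start from the last line found;
        # fall back to the first line if the caller moved backwards.
        my $lo = $offset >= $starts[$last] ? $last : 0;
        my $hi = $#starts;

        # Step 2: binary search for the last line start <= $offset.
        while ($lo < $hi) {
            my $mid = int(($lo + $hi + 1) / 2);
            if ($starts[$mid] <= $offset) { $lo = $mid }
            else                          { $hi = $mid - 1 }
        }

        $last = $lo;
        # Step 3: subtract the line's start offset.
        return ($lo, $offset - $starts[$lo]);
    };
}

my $locate = make_locator("ab\ncde\n\nf");
my ($line, $col) = $locate->(4);    # byte 4 is the "d" in "cde"
print "line $line, offset $col\n";
```

An out-of-range offset returns an empty list here; whether that or an exception suits the caller better is a design choice the thread leaves open.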

    Cheers - L~R

Re: Find line number and offset inside line given a byte offset in string
by Zaxo (Archbishop) on Nov 28, 2005 at 15:54 UTC

    You can get the byte offset past each newline into an array like this:

    our @ofs = (0);
    # pos() with no argument looks at $_, so name $string explicitly
    push @ofs, pos($string) while $string =~ /\n/g;
    Then you can find the line number and line offset of a given byte offset by calling:
    sub ofs2line {
        my $offset = shift;
        for (0 .. $#ofs) {
            # $ofs[$_] is the first offset past line $_ (counting from 1)
            return ($_, $offset - $ofs[$_-1]) if $offset < $ofs[$_];
        }
        return ();
    }
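    That expects offsets that start at zero and returns a 1-based line number; an offset on a final line with no trailing newline falls through to the empty list. Untested, beware fencepost errors.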

    After Compline,
    Zaxo

Re: Find line number and offset inside line given a byte offset in string
by QM (Parson) on Nov 28, 2005 at 16:34 UTC
    For the sake of diversity, how about splitting and creating an array of offsets? Something like this:
    my @strings = split "\n", $string;
    my @lookup;
    $lookup[0] = 0;
    foreach my $i ( 1..$#strings ) {
        # byte offset of 1st char on this line; the +1 counts the
        # newline that split removed from the previous line
        $lookup[$i] = length($strings[$i-1]) + 1 + $lookup[$i-1];
    }
    Here's a complete program, including the binary search routine. (Not completely tested, but seems to work, and gives the general idea.)

    I'm sure there are numerous improvements that can be made (including error checking), so hack away!

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of