I have been thinking about this idea, and I can make it work by doing something like this:
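use strict;
use warnings;

open INFILE, '<', 'infile.txt' or die "Cannot open infile.txt: $!";
open OFILE, '>', 'outfile.txt' or die "Cannot open outfile.txt: $!";

my $total_os = 0;    # characters consumed on previous lines
while (<INFILE>) {
    my $str = $_;
    # $-[0] is the start offset of the current match within this line;
    # pos() is only documented for m//g, so it is not used here
    $str =~ s/(\s+)/osmarker($-[0], $1)/eg;
    # a bunch of regular expressions
    $total_os += length($_);
    print OFILE $str;
}

sub osmarker {
    my ($os, $spaces) = @_;
    $os += length($spaces) + $total_os;    # offset just past the whitespace
    return $spaces . "<OS=$os>";
}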
The problem with inserting these markers is not the data mining tool but the regular expressions that munge the text. Some of them look for "WORD\s+WORD" and would be broken by a marker sitting between the two words. I could fix this by defining a variable like this:
my $space=qr/(?:<OS=\d+>|\s)/;
and replacing all instances of "\s" with "$space". Is there an easier way of doing this? Is there a way to overload "\s"?
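For what it is worth, here is a minimal sketch of how the $space substitution might look in practice. The sample string and the variable names $marked and $plain are made up for illustration, and the markers are assumed to have exactly the "<OS=digits>" form produced by osmarker above; this is not a tested drop-in for the real munging regexes.

```perl
use strict;
use warnings;

# Matches one whitespace character or one inserted offset marker.
my $space = qr/(?:<OS=\d+>|\s)/;

# Hypothetical sample: "foo bar  baz" after the markers were inserted.
my $marked = "foo <OS=4>bar  <OS=9>baz";

# A pattern originally written as /(\w+)\s+(\w+)/, with \s swapped for $space.
my ($first, $second) = $marked =~ /(\w+)$space+(\w+)/;
print "$first $second\n";

# Stripping the markers again recovers the original text.
(my $plain = $marked) =~ s/<OS=\d+>//g;
print "$plain\n";
```

Because $space is a single non-capturing group, quantifiers such as + and * apply to it the same way they applied to \s, so the capture numbering in the existing patterns should not shift.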

In reply to Re: •Re: Finding and hightlight information by fletcher_the_dog
in thread Finding and hightlight information by fletcher_the_dog
