in reply to •Re: Finding and hightlight information
in thread Finding and hightlight information

I have been thinking about this idea, and I can make it work by doing something like this:
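use strict;
use warnings;

open my $in,  '<', 'infile.txt'  or die "Cannot open infile.txt: $!";
open my $out, '>', 'outfile.txt' or die "Cannot open outfile.txt: $!";

my $total_os = 0;    # number of characters on the lines already read
while (my $line = <$in>) {
    my $str = $line;
    # $-[0] is the offset where the current match starts within $str
    $str =~ s/(\s+)/osmarker($-[0], $1)/eg;
    # a bunch of regular expressions
    $total_os += length($line);
    print {$out} $str;
}

sub osmarker {
    my ($os, $spaces) = @_;
    # offset, in the original file, of the first character after the whitespace
    $os += length($spaces) + $total_os;
    return $spaces . "<OS=$os>";
}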
The problem with inserting these markers is not in the data mining tool, but in the regular expressions that munge the text. Some of them look for "WORD\s+WORD" and would be broken by a marker sitting between the two words. I could fix this by defining a variable like this:
my $space=qr/(?:<OS=\d+>|\s)/;
and replacing all instances of "\s" with "$space". Is there an easier way of doing this? Is there a way to overload "\s"?
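
To show what I mean, here is a minimal sketch of the substitution. The sample line and the words "foo" and "bar" are made up for illustration; only $space comes from the code above.

```perl
use strict;
use warnings;

# One unit of "whitespace": a real whitespace character or an offset marker.
my $space = qr/(?:<OS=\d+>|\s)/;

# Hypothetical sample line that has already been tagged with offset markers.
my $tagged = "foo <OS=4>bar  <OS=9>baz";

# The original pattern fails, because the marker sits between the two words.
my $plain_hit = $tagged =~ /foo\s+bar/ ? 1 : 0;

# The same pattern with \s swapped for $space matches again.
my $space_hit = $tagged =~ /foo$space+bar/ ? 1 : 0;

print "plain: $plain_hit, with \$space: $space_hit\n";
```

Since $space is a single non-capturing group, quantifiers such as "+" and "*" apply to it the same way they did to "\s", so the edit is a mechanical search and replace as long as "\s" does not appear inside a character class.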