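This is the code that inserts the offset markers:

```perl
use strict;
use warnings;

open my $in,  '<', 'infile.txt'  or die "Can't open infile.txt: $!";
open my $out, '>', 'outfile.txt' or die "Can't open outfile.txt: $!";

my $total_os = 0;    # offset of the current line's first character in the original file
while (<$in>) {
    my $str = $_;
    # After every run of whitespace, insert a marker holding the original file
    # offset of the next character. $-[0] is the start of the current match.
    $str =~ s/(\s+)/osmarker($-[0], $1)/eg;
    # a bunch of regular expressions
    $total_os += length($_);
    print $out $str;
}
close $out or die "Can't close outfile.txt: $!";

sub osmarker {
    my ( $os, $spaces ) = @_;
    $os += length($spaces) + $total_os;
    return $spaces . "<OS=$os>";
}
```

The problem with inserting these markers is not in the data mining tool but in the regular expressions that munge the text. Some of them look for "WORD\s+WORD", and this marker would screw those up. I could fix this by defining a variable like this:

```perl
my $space = qr/(?:<OS=\d+>|\s)/;
```

and replacing all instances of "\s" with "$space". Is there an easier way of doing this? Is there a way to overload "\s"?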
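To make the trade-off concrete, here is a small self-contained check of the `$space` idea. It applies the same marker substitution to a single line, then compares a naive `WORD\s+WORD` pattern against the same pattern with `\s` replaced by `$space`. The sample text and the words `big`/`red` are made up for illustration; the real munging regexes are not shown in the post.

```perl
use strict;
use warnings;

# Offset of this line's start in the original file (0 for a single test line)
my $total_os = 0;

# Same marker logic as above: offset of the character after the whitespace
sub osmarker {
    my ( $os, $spaces ) = @_;
    $os += length($spaces) + $total_os;
    return $spaces . "<OS=$os>";
}

# Hypothetical sample text
my $str = "big red dog";
$str =~ s/(\s+)/osmarker($-[0], $1)/eg;    # now "big <OS=4>red <OS=8>dog"

# One whitespace character or one offset marker
my $space = qr/(?:<OS=\d+>|\s)/;

# A naive WORD\s+WORD pattern stops at the "<" of the marker
my $naive_hit = $str =~ /\bbig\s+red\b/ ? 1 : 0;

# The same pattern with \s replaced by $space; the interpolated qr// keeps its
# own (?:...) grouping, so the + quantifies the whole alternation
my $fixed_hit = $str =~ /\bbig$space+red\b/ ? 1 : 0;

print "naive: $naive_hit, with \$space: $fixed_hit\n";    # naive: 0, with $space: 1
```

One catch with a plain search-and-replace of `\s`: character classes such as `[\s,]` cannot simply become `[$space,]`, because a pattern cannot be interpolated into a character class as a unit. Those would need rewriting as alternations, e.g. `(?:$space|,)`.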
In reply to Re: •Re: Finding and hightlight information by fletcher_the_dog, in thread Finding and hightlight information by fletcher_the_dog