One thing I did notice in your code (after running it) was that the word count seems to be just 1. I changed it back to what I originally had so that it correctly reports ~78,000.
my @word_count = ();
@word_count = split(/\s/, $file);    # Find out how many words are in abstracts.
my $word_number = scalar(@word_count);
I now have another problem: some terms are still not picked up. I think this is because they contain special characters and a mix of upper- and lowercase letters, but I may be wrong.
These terms include:
adenosine-5'-triphosphate levels 0
h(2)O(2) 0
MPP+ 0
-dichlorophenyl)-1,1-dimethylurea 0
adenosine-5'-triphosphate synthesis 0
photosynthesis, the antioxidant enzyme activities of superoxide dismutase (superoxide dismuase) (EC 0
bcl-X(L) 0
ca2+ 0
adenosine-5'-triphosphate production 0
ca(2+) 0
mitochondrial phospholipid hydroperoxide glutathione photosynthesis, the antioxidant enzyme activities of SOD (superoxide dismuase) (EC 0
bcl-x(L) 0
deltapsi(m) 0
pirin(Sm) 0
rho(0) 0
Any ideas as to how to resolve this? I thought of maybe using some escape character, but I have no idea how to integrate that into my original regex.
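The nearest I can get is a standalone sketch like the one below, which is not my original regex. It assumes each term is dropped into the pattern as a literal string and that case-insensitive matching is acceptable. `count_term` is a hypothetical helper made up for illustration. `\Q...\E` escapes characters such as `(`, `)` and `+`, and the lookarounds stand in for `\b`, which as far as I can tell cannot match after a term ending in a non-word character (for example `MPP+` followed by a space).

```perl
use strict;
use warnings;

# Hypothetical helper (not from my original script): count how many
# times $term occurs in $text as a literal string, ignoring case.
sub count_term {
    my ($text, $term) = @_;

    # \Q...\E applies quotemeta, so ( ) + ' and friends are matched
    # literally instead of being treated as regex metacharacters.
    # (?<!\w) and (?!\w) replace \b: they only require that the term is
    # not glued to a word character on either side.
    my $count = () = $text =~ /(?<!\w)\Q$term\E(?!\w)/gi;

    return $count;
}

# Made-up sample text, only to exercise the helper.
my $text = "Levels of h(2)O(2) and MPP+ rose; Ca(2+) and bcl-X(L) did not.";

for my $term ('h(2)O(2)', 'MPP+', 'ca(2+)', 'bcl-x(L)') {
    printf "%-10s %d\n", $term, count_term($text, $term);
}
```

On that sample text each of the four terms comes back with a count of 1. I don't know whether this is the right way to fold the escaping into my original regex, and a term that starts mid-word (like the truncated `-dichlorophenyl)-1,1-dimethylurea`) would probably still be missed by the lookbehind.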
MonkPaul
In reply to Re^2: Regex word boundries by MonkPaul, in thread Regex word boundries by MonkPaul