in reply to Re: Regex word boundries
in thread Regex word boundries
One thing I did notice in your code (after running it) was that the word count seems to be just 1. I changed it back to what I originally had so that it properly reports ~78,000.
my @word_count = ();
@word_count = split(/\s/, $file);    # Find out how many words are in abstracts.
my $word_number = scalar(@word_count);
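As a side note on the count itself, here is a minimal sketch of how the split pattern affects it. The sample string is hypothetical and only stands in for the contents of $file. split(/\s/) breaks on every single whitespace character, so runs of whitespace produce empty fields that get counted as words, which means the ~78,000 figure may be slightly inflated if the abstracts contain double spaces or blank lines.

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for the contents of $file.
my $file = "ATP  levels\nfell in\tcells ";

# split(/\s/) splits on each single whitespace character, so the double
# space after "ATP" yields one empty field that is counted as a word.
my @by_single = split(/\s/, $file);

# split(' ') skips leading whitespace and splits on runs of whitespace.
my @by_run = split(' ', $file);

printf "split(/\\s/): %d fields\n", scalar(@by_single);
printf "split(' '):  %d fields\n", scalar(@by_run);
```

If the two numbers differ on the real data, split(' ', $file) is probably the count that is wanted.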
I now have another problem in that some terms are still not picked up. I think this is because they contain special characters and a combination of upper and lowercase letters. I may be wrong.
These terms include:
adenosine-5'-triphosphate levels 0
h(2)O(2) 0
MPP+ 0
-dichlorophenyl)-1,1-dimethylurea 0
adenosine-5'-triphosphate synthesis 0
photosynthesis, the antioxidant enzyme activities of superoxide dismutase (superoxide dismuase) (EC 0
bcl-X(L) 0
ca2+ 0
adenosine-5'-triphosphate production 0
ca(2+) 0
mitochondrial phospholipid hydroperoxide glutathione photosynthesis, the antioxidant enzyme activities of SOD (superoxide dismuase) (EC 0
bcl-x(L) 0
deltapsi(m) 0
pirin(Sm) 0
rho(0) 0
Any ideas as to how to resolve this? I thought of maybe using some escape character, but I have no idea how to integrate that into my original regex.
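A minimal sketch of the escaping idea, assuming the original regex wraps each term in \b...\b (that regex is not shown in this post, so this is a guess). The sample text, the @terms list and the %count hash are hypothetical. \Q...\E escapes characters such as (, ) and +, which would otherwise be read as grouping and quantifiers. The lookarounds replace \b, which cannot match between two non-word characters, for example between ")" and a following space. The /i flag covers the mixed upper and lowercase.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real abstracts and term list are not shown here.
my $text  = "Levels of Ca(2+) and H(2)O(2) rose, MPP+ fell, and bcl-x(L) was unchanged.";
my @terms = ('ca(2+)', 'h(2)O(2)', 'MPP+', 'bcl-X(L)', 'rho(0)');

my %count;
for my $term (@terms) {
    # \Q...\E quotes regex metacharacters inside the interpolated term.
    # (?<!\w) and (?!\w) require that no word character sits directly
    # before or after the term, which works even when the term itself
    # starts or ends with punctuation.
    # /g returns every match in list context; /i ignores case.
    my @hits = $text =~ /(?<!\w)\Q$term\E(?!\w)/gi;
    $count{$term} = scalar @hits;
}

printf "%-10s %d\n", $_, $count{$_} for @terms;
```

On this sample the first four terms are each found once and rho(0) is not found. Whether this drops into the original regex unchanged depends on how that regex builds its pattern.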
MonkPaul
Re^3: Regex word boundries
by ikegami (Patriarch) on Oct 19, 2007 at 13:25 UTC
by MonkPaul (Friar) on Oct 29, 2007 at 15:24 UTC
by ikegami (Patriarch) on Oct 29, 2007 at 15:55 UTC