To grab things matching both of these formats (and to catch any future style variations), you can more or less ignore the style information. You know that you definitely want to match <td\sclass="coauthor" as well as ><a\shref="([^"]+)">([^>]+)<\/a>, but you almost don't care about what is in between, right?
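The reason I say "almost don't care" is that you want to match everything EXCEPT a closing '>' to make sure your regex doesn't match too much. [^>]* matches 0 or more characters that are not the character '>', so putting it between the two pieces lets the match skip over any extra attributes without running past the end of the <td> tag.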
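To make that concrete, here is a minimal sketch in Python that glues the two fixed pieces together with [^>]* in the middle. The sample markup, the URLs, and the names COAUTHOR_RE, SAMPLE_HTML and extract_coauthors are made up for illustration; your real page will differ, so treat this as a starting point rather than a drop-in fix.

```python
import re

# Fixed start, "anything but '>'" in the middle, fixed end.
COAUTHOR_RE = re.compile(
    r'<td\sclass="coauthor"'              # start of the cell
    r'[^>]*'                              # any extra attributes, never past '>'
    r'><a\shref="([^"]+)">([^>]+)<\/a>'   # link URL and author name
)

# Hypothetical markup: one cell with extra style information, one without.
SAMPLE_HTML = (
    '<tr><td class="coauthor" style="color:red">'
    '<a href="/a/1">Norm O\'Neill</a></td></tr>\n'
    '<tr><td class="coauthor">'
    '<a href="/a/2">Norris Milton II</a></td></tr>\n'
)


def extract_coauthors(html):
    """Return (href, name) pairs for every coauthor cell found in html."""
    return COAUTHOR_RE.findall(html)


if __name__ == "__main__":
    for href, name in extract_coauthors(SAMPLE_HTML):
        print(href, name)
```

Both cells match with the same pattern, whether or not the style attribute is there, because [^>]* is happy to match nothing at all.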
Comment on Re^5: fix the problem of the web crawler
I solved it, a thousand thanks to you. Now it works as before. But there is another problem, if you can suggest what I can do: authors whose names have more than 3 parts are not crawled, and neither are authors whose names contain things like an apostrophe ('), Jr., II, or III. For example, these authors:
Norie De La Cruz
Norm O'Neill
Norman L. Guinasso Jr.
Norris Milton II
Northrup Fowler III
Noor Asna Fazli Abdul Samad
N. S. S. S. N. Usha Devi
Niels H. M. Aan de Brugh