Interesting problem. I looked at the source for the parser, and have a couple of (completely untested) suggestions:
1) Pass the whole citation block to the parser as a single line. In most instances it probably either ignores everything after the end of the best match, or lumps the rest of the string into some tag. In the former case, you could match the returned tag values against the input line to find where the parser stopped, and start again from there. In the latter case, you could find the tag holding the lumped data and work with that.
2) Create your own small parser to find citation ends. Don't almost all of them end in either a page reference or a publication reference, and ultimately in a period?
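A minimal sketch of what suggestion (2) might look like, written in Python purely for illustration and just as untested against real bibliographies as the suggestions above. The pattern is an assumption, not something taken from the parser: it treats a page range (optionally prefixed by "p." or "pp.") or a bare four-digit year, followed by a period, as the end of a citation. The name `split_citations` is hypothetical. Styles that put the year right after the author ("Smith 1999. Title ...") would split too early and need a smarter rule.

```python
import re

# Assumed heuristic: a citation ends in a page range or a four-digit year,
# followed by a period and then whitespace or the end of the text.
CITATION_END = re.compile(
    r"""
    \b
    (?:
        (?:pp?\.\s*)?\d+\s*[-\u2013]\s*\d+   # page range, optional p./pp. prefix
      | \d{4}                                # bare four-digit year
    )
    \.                                       # terminating period
    (?=\s|$)                                 # then whitespace or end of text
    """,
    re.VERBOSE,
)

def split_citations(block):
    """Split a block of run-together citations at each heuristic end point."""
    citations = []
    start = 0
    for match in CITATION_END.finditer(block):
        citations.append(block[start:match.end()].strip())
        start = match.end()
    tail = block[start:].strip()
    if tail:
        # Text the heuristic could not terminate; the caller has to handle it.
        citations.append(tail)
    return citations
```

Anything left in the final element without a recognised ending is where the heuristic punted, which is the case the fallback below is meant for.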
If (1) isn't feasible and performance is not an issue (probably not, given the way the parser works), your (2) could probably be written to catch 80% or more of the cases. When it punts, move a few tokens past the point where it punted, then make repeated parse calls, dropping one token from the end each time and keeping track of the reliability. It will most likely peak at the correct place. You can then continue from that point in the string.
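The back-off loop described above could be sketched roughly as follows, again in Python for illustration only. Here `parse` is a hypothetical stand-in for a call into the real parser that returns a reliability score (higher meaning better). Whether the actual parser exposes such a score, and in what form, is an assumption. The names `find_citation_end`, `lookahead`, and `window` are likewise made up for this example.

```python
def find_citation_end(tokens, guess, parse, lookahead=3, window=6):
    """Return the token count whose prefix gets the highest reliability score.

    tokens    -- tokens of the text, starting at the start of the citation
    guess     -- token index where the heuristic splitter gave up
    parse     -- hypothetical callable: text -> reliability score
    lookahead -- how many tokens past the guess to start from
    window    -- maximum number of prefixes to try while backing off
    """
    upper = min(len(tokens), guess + lookahead)
    lower = max(upper - window, 0)
    best_end, best_score = upper, float("-inf")
    # Drop one token at a time from the end and remember the best-scoring prefix.
    for end in range(upper, lower, -1):
        score = parse(" ".join(tokens[:end]))
        if score > best_score:
            best_end, best_score = end, score
    return best_end
```

On a tie the longer prefix wins, since it is tried first. Parsing would then resume at `tokens[best_end:]`.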
Hope you hadn't already thought of all this, and that it was helpful.

In reply to Re: Extracting Bibliography Citations by Illuminatus
in thread Extracting Bibliography Citations by Limbic~Region
