in reply to Re^2: Extracting Bibliography Citations
in thread Extracting Bibliography Citations

This isn't far off from my approach.

Oh, no. That isn't fair. Saying "mine is better, but I won't show you" just isn't fair :-(

And keep in mind that only you have the full variety of input at hand, so this is my last suggestion:

{ local $/; $_= <DATA> }
print "<<$1>>\n" while /(.*?(?:[\dZ]\)\.|pp\..*?))\r?\n?(?=[A-z]{2})/gs;
__DATA__
...(data as per OP)...
__END__
<<Biggs, S. F. & Mock, T. J., An Investigation of Auditor Decision Processes in the Evaluation of Internal Controls and Audit Scope Decisions, Journal ofAccounting Research (Spring 1983) pp. 234255.>>
<<Ericsson, K_A. & Simon, H. A., Verbal Reports as Data, Psychological Revieu' (May 1980), pp. 2 15-25 1.>>
<<Feinstcin, A. R., An Analysis of Diagnostic Reasoning: the Construction of Clinical Algorithms, YuleJournal of Biology andMedicine ( 1974), pp. 5-32.>>
<<Kennedy, R. E. & Wilson, M. H.. The Corporate Information Investors Really Want, Business (January- February 1980) pp. 42-46.>>
<<Larcker. D. F. & Lcssig, V. P.. An Examination of the Linear and Retrospective Process TracingApproaches to Judgment Modeling, TheAccounting Review (January 1983) pp. 58-77.>>
<<Lcr. B., Financial StutementAnaIysis: a NewApproach (Inglewood Cliffs, NJ: Prentice-Hall, 1978).>>
<<Lindsay, R K, Buchanan, B. G., Feigenbaum, E. A. & Lederberg, J.AppIications ofArtificialIntelIigencefor Organic Chemis~q~: the DENDRAL Project (New York: McGraw-Hill. 1980).>>
<<Miller, R. A., Pople, H. E.. and Meyers, J. I).. INTERNIST-I, an Experimental Computer-Based Diagnostic Consultant for General Internal Medicine, :Veul England Journal of Medicine (August 1982), pp. 46% 4'6>>
<<Newell. A. 8r Simon, 11.A, Human Problem Solzv'ng (Englcwood Cliffs, NJ, Prentice-Hall. 1972).>>
<<Payne, J. W., BKUUBtein, hl. 1.. bi Carroll, J. S., Exploring Predecisional Behavior: an Alternative Approach to Decision Kc-search, Orgunizutional Behuzjior and Human Performance (February 1978) pp. 17-44.>>
<<Porcano. `I`.M., A Comparison of Information Needs and Sources of the Investment Community, Akron Business andEconomic Kezjieuv(Fall 1981) pp. 43-52.>>
<<Reilly. F K, Im%Thnents (New York: Dryden Press, 19&Z).>>
<<Ricchiutc, D. N.. An Empirical Assessment of the Impact of AlterIXItive Task Presentation Modes on Decision-Making Research in Auditing, Journal of Accounting Reseurcb (Spring 19&i), pp. 34 I-350.>>
<<Rich, E., Art~ficiul Intelligence (New York: McGraw-llill, 1983).>>
<<Shields. X1.D Some Effects of Information Load on Search Patterns Used to Analyze Performance Reports, Accounling, Organizahons and Socie[y ( 1980) pp. 429-442.>>
<<HOW DO FINANCIAL ANALYSTS MAKE DECISIONS? 29 Shields, M. D., Effects of Information Supply and Demand on Judgment Accuracy: Evidence from Corporate Managers, TheAccountingReview(April 1983) pp. 284-303.>>
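
For readers who find the one-liner dense, here is a rough sketch of the same pattern written with the /x flag so each piece can carry a comment. The sample text is invented for illustration (it is not the OP's data), and the trailing END line is a hypothetical sentinel: the lookahead wants two letters after every citation, so the last citation needs something following it.

```perl
use strict;
use warnings;

# Hypothetical sample, not the OP's data. The last line is a sentinel:
# the lookahead needs two letters after every citation, including the last.
my $text = join "\n",
    'Smith, A. B., Some Title, Journal of',
    'Things (May 1980), pp. 215-251.',
    'Jones, C. D., A Book (New York:',
    'McGraw-Hill, 1983).',
    'END';

my @citations;
while ( $text =~ m{
        (                   # capture one citation
            .*?             # as little as possible, newlines included (s flag)
            (?:
                [\dZ]\)\.   # ends in a digit or Z, a closing paren and a period
              |
                pp\..*?     # or in a page range introduced by pp.
            )
        )
        \r?\n?              # optional line ending, left out of the capture
        (?=[A-z]{2})        # next citation starts with two characters from A-z
    }gsx )
{
    push @citations, $1;
}

print "<<$_>>\n" for @citations;
```

On this sample the loop collects two citations, each keeping its embedded line break. Note that the range A-z also covers a few punctuation characters that sit between Z and a in ASCII, underscore among them, so it is slightly wider than "two letters".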

Re^4: Extracting Bibliography Citations
by Limbic~Region (Chancellor) on Sep 02, 2008 at 16:40 UTC
    shmem,
    Oh, no. That isn't fair. Saying "mine is better, but I won't show you" just isn't fair :-(

    Hrmm. I certainly wasn't complaining, nor was I trying to tell you it wasn't good enough. I purposely avoided sharing what I had come up with so as not to influence solutions. The intent of my comment was more along the lines of "The basic strategy is sound with minor tweaking". When I said "but I am looking to improve if possible", I apologize if that came across as "try harder shmem but do so blind". I wanted to indicate that a different strategy altogether might be better, since minor refinements on the existing one are going to lead to diminishing returns.

    Thank you for your contributions. They are valued and appreciated.

    Cheers - L~R

      Thank you for your contributions. They are valued and appreciated.

      Fair enough, thank you. Still, I'm blind as to whether a single global match is enough for all cases, since I have not seen other sample data. The regexp says no more than 'citations end with [\dZ]). or pp.<something> and are followed by two letters', but as long as I see no contradictory data it is good enough... I can't code for cases I haven't seen. So, saying

      Since it doesn't have to be perfect this is fine but I am looking to improve if possible.

      without giving more clues as to what needs improvement is, well... but of course, my regexp breaks on a citation beginning with e.g. "J.Morgan", where the second character is a period rather than a letter...
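
To make the "J.Morgan" case concrete, here is a small sketch on invented data (again, not the OP's; END is a hypothetical sentinel so the last citation is followed by two letters). The helper split_citations and the relaxed lookahead are assumptions added for illustration, not something proposed in the thread: the relaxed form simply also accepts a capital letter followed by a period as the start of the next citation.

```perl
use strict;
use warnings;

# Hypothetical sample, not the OP's data; END is a sentinel line.
my $text = join "\n",
    'Rich, E., A Book (New York: McGraw-Hill, 1983).',
    'J.Morgan, Another Book (London: Press, 1984).',
    'Smith, A., Third Book (Boston: House, 1985).',
    'END';

# Run the thread's pattern with a caller-supplied lookahead
# (hypothetical helper, for comparison only).
sub split_citations {
    my ($input, $next_start) = @_;
    my @found;
    push @found, $1
        while $input =~ /(.*?(?:[\dZ]\)\.|pp\..*?))\r?\n?(?=$next_start)/gs;
    return @found;
}

# Lookahead as posted: two characters from A-z.
my @original = split_citations($text, qr/[A-z]{2}/);

# One possible tweak: also allow an initial such as "J." to start a citation.
my @relaxed = split_citations($text, qr/[A-z]{2}|[A-Z]\./);

printf "original: %d citations\n", scalar @original;
printf "relaxed:  %d citations\n", scalar @relaxed;
```

With the posted lookahead the Rich and J.Morgan entries come back merged into one capture, because "J." fails the two-letter test and the lazy match runs on to the next ending. The relaxed lookahead separates them on this sample, but whether it holds up on the real input is untested.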