in reply to Matching indented paragraph numbering with regexps

Given the lack of constraints on the text to be parsed, this may be impossible to get right every time. You already pointed out the difficulty (or rather, the impossibility) of determining whether i is a Roman numeral or a Latin letter. To complicate things further, both i and a are ordinary English words. vi is the name of an editor, and xi is an uncommon, but not impossible, English word. So is li.
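To make the ambiguity concrete, here is a minimal sketch (in Python rather than Perl, purely for illustration; the names `ROMAN`, `is_roman` and `ambiguous_markers` are made up for this example) that checks which tokens are well-formed lowercase Roman numerals. Every word it accepts could equally be plain text, which is exactly the problem described above.

```python
import re

# Strict lowercase Roman numerals (1..3999). The lookahead rejects the
# empty string, which the rest of the pattern would otherwise accept.
ROMAN = re.compile(
    r"(?=[ivxlcdm])m{0,3}(cm|cd|d?c{0,3})(xc|xl|l?x{0,3})(ix|iv|v?i{0,3})"
)

def is_roman(token):
    """True if the whole token is a well-formed lowercase Roman numeral."""
    return ROMAN.fullmatch(token) is not None

def ambiguous_markers(tokens):
    """Tokens that parse as Roman numerals and so might be list markers."""
    return [t for t in tokens if is_roman(t)]

print(ambiguous_markers(["a", "i", "vi", "xi", "li", "emacs"]))
# prints ['i', 'vi', 'xi', 'li']
```

Note that "a" is not flagged here only because it is not a Roman numeral; it is still ambiguous between a Latin-letter marker and an English word.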

Abigail

Re: Re: Matching indented paragraph numbering with regexps
by Crian (Curate) on Apr 01, 2004 at 13:43 UTC
    You could require that a numeral be the first non-whitespace text on a line and that it end with a period.
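A rough sketch of that restriction, written in Python for illustration (the regex syntax used here carries over to Perl unchanged; `MARKER` and `leading_marker` are hypothetical names). It only checks position and the trailing period, and it accepts any run of Roman-numeral letters, so it is deliberately loose.

```python
import re

# Optional leading whitespace, a run of Roman-numeral letters, a period,
# then whitespace or end of line. Assumption: markers are lowercase.
# This is loose: it does not check that the letters form a valid numeral.
MARKER = re.compile(r"^\s*([ivxlcdm]+)\.(?:\s|$)")

def leading_marker(line):
    """Return the candidate numeral at the start of the line, or None."""
    m = MARKER.match(line)
    return m.group(1) if m else None

print(leading_marker("    iv. ...emacs ...."))   # prints iv
print(leading_marker("the editor vi. is nice"))  # prints None
```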

    Even so, you could still get it wrong, for instance if the fifth sub-point ends with the word "vi." on a line by itself:

    iv. ...emacs ....
    v. ......... ends with the word
    vi.
    vi. ....
    vii. ....

    With these restrictions, though, it is at least harder to fail. Additionally, you could try to step back whenever you find the same numeral twice in a row. But even that does not help if point v. in the example above was the last one.
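The step-back idea could be sketched roughly as follows (Python, for illustration only; all names are hypothetical). The assumption, which is mine and not a rule from the thread, is that when the same numeral appears twice in a row, the earlier occurrence was a wrapped word and the later one is the real marker.

```python
import re

MARKER = re.compile(r"^\s*([ivxlcdm]+)\.(?:\s|$)")
VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}

def roman_to_int(s):
    """Convert a lowercase Roman numeral to an int (no validity check)."""
    total = 0
    for ch, nxt in zip(s, s[1:] + " "):
        v = VALUES[ch]
        # A smaller value before a larger one is subtracted (iv = 4).
        total += -v if VALUES.get(nxt, 0) > v else v
    return total

def item_line_indices(lines):
    """Indices of lines accepted as numbered items, stepping back on duplicates."""
    accepted = []  # (line index, numeral value)
    for idx, line in enumerate(lines):
        m = MARKER.match(line)
        if not m:
            continue
        value = roman_to_int(m.group(1))
        if not accepted or value == accepted[-1][1] + 1:
            accepted.append((idx, value))
        elif value == accepted[-1][1]:
            # Duplicate: assume the earlier line was ordinary text.
            accepted[-1] = (idx, value)
    return [idx for idx, _ in accepted]

lines = ["iv. ...emacs ....", "v. ... ends with the word", "vi.", "vi. ....", "vii. ...."]
print(item_line_indices(lines))      # prints [0, 1, 3, 4]
print(item_line_indices(lines[:3]))  # prints [0, 1, 2]
```

The second call shows the remaining failure: with no later "vi." to trigger the step back, the stray word is still taken as a marker.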

    I think the "right" solution depends heavily on your data and on the demands placed on your program.