LexPl has asked for the wisdom of the Perl Monks concerning the following question:

I have lists with two types of list markers: numeric ("1.", "2.", "3.", etc.) and alphabetic ("a)", "b)", "c)", etc., or "aa)", "bb)", "cc)", etc.). I want to check whether the items of each list are numbered consecutively according to their marker type.

In numeric lists, this means that the next item should be numbered n+1 followed by a dot, where n is the number in the current item's term. Seen from the current <term>, n+1 would be the number in parent::item/following-sibling::*[1]/self::item/term.
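The numeric rule can be expressed as a small function that computes the term the next item should carry. This is a minimal sketch, assuming a numeric term is always digits followed by a single dot; the helper name `expected_numeric_successor` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: for a numeric term such as "3.", return the term the
# following item should carry ("4."). Returns nothing if the term is not
# digits followed by a dot.
sub expected_numeric_successor {
    my ($term) = @_;
    return unless $term =~ /^(\d+)\.$/;
    return sprintf '%d.', $1 + 1;
}

# Compare a few (current, next) pairs against the rule.
for my $pair ( [ '1.', '2.' ], [ '1.', '3.' ], [ '9.', '10.' ] ) {
    my ( $current, $next ) = @$pair;
    my $want = expected_numeric_successor($current);
    my $ok   = defined $want && $want eq $next;
    printf "%s -> %s: %s\n", $current, $next, $ok ? 'ok' : "expected $want";
}
```

Comparing the expected string with `eq` (rather than comparing numbers) also catches a term that has the right number but a wrong or missing dot.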

In alphabetic lists, parent::item/following-sibling::*[1]/self::item/term should contain the next letter(s) in alphabetical order after the current item/term, followed by a closing parenthesis. For example, "b)" should be followed by "c)", and "cc)" by "dd)".
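The alphabetic rule can be sketched the same way: take the repeated letter, advance it by one, and repeat it as often as in the current term. This assumes markers consist of one lowercase ASCII letter repeated one or more times plus ")"; the helper name `expected_alpha_successor` is hypothetical. The question does not say what follows "z)" or "zz)", so this sketch returns nothing there.

```perl
use strict;
use warnings;

# Hypothetical helper: for a term made of one repeated lowercase letter
# plus ")", such as "b)" or "cc)", return the term the following item
# should carry ("c)" or "dd)"). Returns nothing for other terms and for
# "z)", "zz)", ..., because the question does not say what follows them.
sub expected_alpha_successor {
    my ($term) = @_;
    return unless $term =~ /^(([a-z])\2*)\)$/;    # \2* = same letter repeated
    my ( $letters, $letter ) = ( $1, $2 );
    return if $letter eq 'z';
    my $next = chr( ord($letter) + 1 );           # relies on ASCII order a..z
    return ( $next x length($letters) ) . ')';
}

for my $term (qw/ a) b) cc) z) ab) /) {
    my $next = expected_alpha_successor($term);
    printf "%s -> %s\n", $term, defined $next ? $next : '(no successor defined)';
}
```

If your documents continue after "z)" (for example with "aa)"), that transition would need its own rule.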

I do not want to change the numbering; I just want to check it and report any errors found, ideally with their locations.
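One way to get locations is to read the file line by line and use Perl's `$.` variable, which holds the current input line number. This is a minimal sketch, assuming one `<term>` per line and flat (non-nested) lists; the inline sample and the helper name `expected_next` are hypothetical. For real documents an XML parser such as XML::LibXML would be more robust than line-based regexes.

```perl
use strict;
use warnings;

# Hypothetical sample: the first list from the question, kept inline so the
# sketch runs as-is. A real script would open the XML file instead.
my $xml = <<'XML';
<list>
  <item>
    <term>1.</term>
    <p>jhjhjh hjkjkjkj</p>
  </item>
  <item>
    <term>3.</term>
    <p>jhjhjh hjkjkjkj</p>
  </item>
  <item>
    <term>4.</term>
    <p>jhjhjh hjkjkjkj</p>
  </item>
</list>
XML

# Expected successor of a term: "3." -> "4.", "cc)" -> "dd)".
# Returns nothing for unknown markers and for "z)", "zz)", ...
sub expected_next {
    my ($term) = @_;
    return sprintf( '%d.', $1 + 1 ) if $term =~ /^(\d+)\.$/;
    return chr( ord($2) + 1 ) x length($1) . ')'
        if $term =~ /^(([a-y])\2*)\)$/;
    return;
}

my @errors;
my ( $prev_term, $prev_line );
open my $fh, '<', \$xml or die "Cannot open in-memory string: $!";
while ( my $line = <$fh> ) {
    if ( $line =~ /<list\b/ ) {    # a new list restarts the sequence
        undef $prev_term;
        next;
    }
    next unless $line =~ m{<term>(.*?)</term>};
    my $term = $1;
    if ( defined $prev_term ) {
        my $want = expected_next($prev_term);
        if ( !defined $want || $want ne $term ) {
            push @errors, sprintf "line %d: '%s' follows '%s' (line %d), expected '%s'\n",
                $., $term, $prev_term, $prev_line, $want // '?';
        }
    }
    ( $prev_term, $prev_line ) = ( $term, $. );    # $. = current line number
}
close $fh;
print @errors ? @errors : "No numbering errors found.\n";
```

For the sample above, the one reported error points at line 7, where "3." follows "1." from line 3.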

The input format would look like this:

<list>
  <item>
    <term>1.</term>
    <para>jhjhjh hjkjkjkj</para>
  </item>
  <item>
    <term>3.</term>
    <para>jhjhjh hjkjkjkj</para>
  </item>
  <item>
    <term>4.</term>
    <para>jhjhjh hjkjkjkj</para>
  </item>
</list>

Or:

<list>
  <item>
    <term>a)</term>
    <para>jhjhjh hjkjkjkj</para>
  </item>
  <item>
    <term>b)</term>
    <para>jhjhjh hjkjkjkj</para>
  </item>
  <item>
    <term>d)</term>
    <para>jhjhjh hjkjkjkj</para>
  </item>
</list>

Or:

<list>
  <item>
    <term>aa)</term>
    <para>jhjhjh hjkjkjkj</para>
  </item>
  <item>
    <term>cc)</term>
    <para>jhjhjh hjkjkjkj</para>
  </item>
  <item>
    <term>dd)</term>
    <para>jhjhjh hjkjkjkj</para>
  </item>
</list>

These input samples deliberately contain errors that should be detected and located.

Please note that I had to use <para> instead of <p> in the samples; in the Perl script, the elements should be <p>.

Re: Check numbering of list items
by Corion (Patriarch) on Dec 09, 2024 at 13:02 UTC

    So, what code have you written and where does it fail?

      As a newbie, I'm afraid I don't yet know how to tackle this issue.

        So, how would you tackle this issue without a computer?

        Write that down, and then translate that into a computer program.

        In the end, it always comes down to checking whether a list of terms is sequential.

        You could split your approach in two parts:

        1. Extract a list of terms from a document
        2. Check whether a list of terms is sequential

        ... and implement each of these, and in the end put them together.
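The two-part split described above might look like this in Perl. This is a minimal sketch, assuming flat (non-nested) lists and plain `<term>...</term>` markup; all function names are hypothetical. Instead of computing each successor string, it maps every marker to a kind and an ordinal ("3." becomes num/3, "cc)" becomes alpha2/3) and checks that the ordinals increase by one within the same kind. A side effect of that choice is that a change of kind, such as "zz)" to "aaa)", is reported as an error.

```perl
use strict;
use warnings;

# Part 1: extract the terms of each <list> as an array ref.
sub extract_terms {
    my ($xml) = @_;
    my @lists;
    while ( $xml =~ m{<list\b[^>]*>(.*?)</list>}gs ) {
        my $body = $1;
        push @lists, [ $body =~ m{<term>\s*(.*?)\s*</term>}g ];
    }
    return @lists;
}

# Map a marker to (kind, ordinal): "3." -> ('num', 3), "cc)" -> ('alpha2', 3).
# Returns an empty list for markers this sketch does not recognise.
sub ordinal {
    my ($term) = @_;
    return ( 'num', $1 ) if $term =~ /^(\d+)\.$/;
    return ( 'alpha' . length($1), ord($2) - ord('a') + 1 )
        if $term =~ /^(([a-z])\2*)\)$/;
    return;
}

# Part 2: check that one list of terms is consecutive; returns messages
# naming the 1-based position of each offending item.
sub check_sequence {
    my (@terms) = @_;
    my @problems;
    for my $i ( 1 .. $#terms ) {
        my ( $kind_a, $n_a ) = ordinal( $terms[ $i - 1 ] );
        my ( $kind_b, $n_b ) = ordinal( $terms[$i] );
        next if defined $kind_a && defined $kind_b
             && $kind_a eq $kind_b && $n_b == $n_a + 1;
        push @problems, sprintf "item %d: '%s' does not follow '%s'",
            $i + 1, $terms[$i], $terms[ $i - 1 ];
    }
    return @problems;
}

# Putting the parts together on two of the sample lists.
my $xml = <<'XML';
<list>
  <item><term>a)</term><p>x</p></item>
  <item><term>b)</term><p>x</p></item>
  <item><term>d)</term><p>x</p></item>
</list>
<list>
  <item><term>aa)</term><p>x</p></item>
  <item><term>cc)</term><p>x</p></item>
  <item><term>dd)</term><p>x</p></item>
</list>
XML

my @lists = extract_terms($xml);
for my $n ( 0 .. $#lists ) {
    print "list ", $n + 1, ": $_\n" for check_sequence( @{ $lists[$n] } );
}
```

Keeping extraction and checking separate means `check_sequence` can be tested on plain Perl lists without any XML at all.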

Re: Check numbering of list items
by tybalt89 (Monsignor) on Dec 09, 2024 at 16:35 UTC

    This may have some idea where to start, or then again it might not :)

    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11163097
    use warnings;
    use List::Util qw( reduce );

    local $/ = ''; # read paragraphs, i.e. one blank-line-separated <list> at a time
    while( <DATA> ) {
        my @terms = /<term>(.*?)</g;
        print "Input: @terms\n";
        my @fails;
        # reduce walks adjacent pairs: $a is the previous term, $b the current one
        reduce {
            if( $a =~ /^\d/ ) {
                # numeric marker: the next one must be n+1
                $a + 1 == $b or push @fails, "fail at $a -> $b\n";
            }
            else {
                # alphabetic marker: shift every letter by one (z wraps to a),
                # then a leading "a" gets an extra letter, so zz) -> aaa)
                my $left = $a =~ tr/a-z/b-za/r =~ s/^a/aa/r;
                $left eq $b or push @fails, "fail at $a -> $b\n";
            }
            $b
        } @terms;
        print @fails;
        print "\n";
    }

    __DATA__
    <list>
      <item>
        <term>1.</term>
        <para>jhjhjh hjkjkjkj</para>
      </item>
      <item>
        <term>3.</term>
        <para>jhjhjh hjkjkjkj</para>
      </item>
      <item>
        <term>4.</term>
        <para>jhjhjh hjkjkjkj</para>
      </item>
    </list>

    <list>
      <item>
        <term>a)</term>
        <para>jhjhjh hjkjkjkj</para>
      </item>
      <item>
        <term>b)</term>
        <para>jhjhjh hjkjkjkj</para>
      </item>
      <item>
        <term>d)</term>
        <para>jhjhjh hjkjkjkj</para>
      </item>
    </list>

    <list>
      <item>
        <term>aa)</term>
        <para>jhjhjh hjkjkjkj</para>
      </item>
      <item>
        <term>cc)</term>
        <para>jhjhjh hjkjkjkj</para>
      </item>
      <item>
        <term>dd)</term>
        <para>jhjhjh hjkjkjkj</para>
      </item>
    </list>

    <list>
      <item>
        <term>xx)</term>
        <para>jhjhjh hjkjkjkj</para>
      </item>
      <item>
        <term>yy)</term>
        <para>jhjhjh hjkjkjkj</para>
      </item>
      <item>
        <term>zz)</term>
        <para>jhjhjh hjkjkjkj</para>
      </item>
      <item>
        <term>aaa)</term>
        <para>jhjhjh hjkjkjkj</para>
      </item>
      <item>
        <term>bbb)</term>
        <para>jhjhjh hjkjkjkj</para>
      </item>
      <item>
        <term>yyy)</term>
        <para>jhjhjh hjkjkjkj</para>
      </item>
      <item>
        <term>zzz)</term>
        <para>jhjhjh hjkjkjkj</para>
      </item>
      <item>
        <term>aaa)</term>
        <para>jhjhjh hjkjkjkj</para>
      </item>
    </list>

    Outputs:

    Input: 1. 3. 4.
    fail at 1. -> 3.

    Input: a) b) d)
    fail at b) -> d)

    Input: aa) cc) dd)
    fail at aa) -> cc)

    Input: xx) yy) zz) aaa) bbb) yyy) zzz) aaa)
    fail at bbb) -> yyy)
    fail at zzz) -> aaa)