Mjpaddy has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I'm new to Perl and am trying to write a regex for the following:

Table of content

1)Introduction

a)perl?

I)It's Free

II)What Is Perl Used For?

b)Windows, UNIX, and Other Oper (3) ating Systems

I)The Prompt

c)What Do I Need To Use This (X) Book?

I)How Do I Get Perl?

II)How To Get Help

III)Perl Resources (a) ksdjfg

IV)hdjsa

X)jdfse

2)Chapter 1: First Steps In Perl

a)Programming Languages

I)Interpreted vs. Compiled Source Code

II)Libraries, Modules and Packages

b)Why is Perl Such A Great Language?

I)It's Really Easy

II)Flexibility Is Our Watchword

III)Perl on the Web

I want to extract each group of similar index markers separately: all of 1), 2); all of a), b), c); and all of I), II), III).

Can anyone help me write a regex for this?

Replies are listed 'Best First'.
Re: regular expression
by Ratazong (Monsignor) on Jun 02, 2014 at 07:33 UTC

    Hi

    •  /^\d+\)/ matches a number followed by a closing parenthesis
    •  /^\w\)/ matches a letter followed by a closing parenthesis
    •  /^I+\)/ matches one or more Is followed by a closing parenthesis (note: this will not cover Roman numerals such as IV, V or X)
    See this great collection for more information on how to create regexes.

    HTH, Rata

    Update: it is better to use /^[a-zA-Z]\)/ instead of /^\w\)/, because \w also matches digits and the underscore (thanks, JohnGG)
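A minimal sketch of how the three patterns above could be used together to sort the lines into groups. The bucket names, the capture groups, and the choice of [a-z] for the letter markers are assumptions made for this illustration (the sample data only uses lower-case letters there), not part of the original reply. The I+ test runs before the letter test so that a lone "I" is not treated as a letter.

```perl
use strict;
use warnings;

# A few lines from the table of contents in the question.
my @lines = (
    '1)Introduction',
    'a)perl?',
    "I)It's Free",
    'II)What Is Perl Used For?',
    'b)Windows, UNIX, and Other Oper (3) ating Systems',
    '2)Chapter 1: First Steps In Perl',
);

# One bucket per kind of marker (bucket names are illustrative only).
my %found = ( number => [], letter => [], roman => [] );

for my $line (@lines) {
    # The ^ anchor means only a marker at the start of the line counts,
    # so the "(3)" in the middle of a title is ignored.
    if    ( $line =~ /^(\d+)\)/ )   { push @{ $found{number} }, $1 }
    elsif ( $line =~ /^(I+)\)/ )    { push @{ $found{roman} },  $1 }
    elsif ( $line =~ /^([a-z])\)/ ) { push @{ $found{letter} }, $1 }
}

print "$_: @{ $found{$_} }\n" for qw(number letter roman);
```

As the note above says, I+ only covers I, II and III, so lines such as "IV)hdjsa" and "X)jdfse" fall through all three tests here.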

      /^\w+\)/ extracts a letter followed by a closing bracket

      \w also matches digits.

      $ perl -E 'say $1 if q{ab1d2f} =~ m{^(\w+)$};'
      ab1d2f
      $

      Cheers,

      JohnGG

      Possibly even better than [a-zA-Z], which makes a limiting assumption about the alphabet being used, one could use \p{Alpha}, which accepts 102159 different code points, all of which may be considered part of the alphabet of some language somewhere at some time, and none of which include numeric digits.

      Instead of I+, how about using Regexp::Common's $RE{num}{roman} pattern, which will correctly match roman numerals, case insensitively. Here's the pattern it uses to do so:

      (?xi)(?=[MDCLXVI]) (?:M{0,3} (D?C{0,3}|CD|CM)? (L?X{0,3}|XL|XC)? (V?I{0,3}|IV|IX)?)

      Dave
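A hedged sketch of how a pattern of this shape might be applied to the question's data. To stay self-contained it does not load Regexp::Common; instead it restates the pattern quoted above with two deliberate changes that are assumptions of this example: the i flag is dropped (otherwise a marker such as "c)" would be read as the Roman numeral for 100), and the inner groups are made non-capturing so that $1 is the whole numeral.

```perl
use strict;
use warnings;

# Same structure as the quoted pattern, but case-sensitive and with
# non-capturing inner groups (both changes are made for this sketch).
my $roman = qr/
    (?=[MDCLXVI])
    M{0,3}
    (?:D?C{0,3}|CD|CM)?
    (?:L?X{0,3}|XL|XC)?
    (?:V?I{0,3}|IV|IX)?
/x;

# Lines taken from the table of contents in the question.
my @lines = (
    "I)It's Free",
    'IV)hdjsa',
    'X)jdfse',
    'III)Perl Resources (a) ksdjfg',
    'c)What Do I Need To Use This (X) Book?',
    '1)Introduction',
);

my @numerals;
for my $line (@lines) {
    # Anchor at the start of the line and require the closing parenthesis,
    # so the "(X)" inside a title does not count as a marker.
    push @numerals, $1 if $line =~ /^($roman)\)/;
}

print "Roman markers: @numerals\n";
```

Unlike I+, this also picks up IV and X. A single upper-case letter such as "C)" or "D)" would still be read as a numeral, so the order of the tests matters if upper-case letter markers can occur.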

Re: regular expression
by Anonymous Monk on Jun 02, 2014 at 07:29 UTC
Re: regular expression
by locked_user sundialsvc4 (Abbot) on Jun 02, 2014 at 18:55 UTC

    The regex you need appears to consist mostly of:   /^([A-Za-z0-9]+)\s*\)\s*(.*)$/

    ... because this appears to be what characterizes the source lines of interest, viz.:

    • Beginning at start-of-line, one-or-more alphanumeric characters, which is “group #1.”
    • A right-parenthesis character, optionally surrounded by whitespace.
    • Everything else to the end of such a line, which is “group #2.”
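The three parts listed above can be sketched as follows. The variable names and the sample lines chosen from the question are assumptions of this example; the regex itself is the one given in the reply.

```perl
use strict;
use warnings;

# Group 1: the marker; group 2: the rest of the line.
my $entry = qr/^([A-Za-z0-9]+)\s*\)\s*(.*)$/;

my @lines = (
    'Table of content',
    '1)Introduction',
    'b)Windows, UNIX, and Other Oper (3) ating Systems',
    'III)Perl Resources (a) ksdjfg',
);

my @parsed;
for my $line (@lines) {
    # Lines without a leading "marker)" are skipped.
    next unless $line =~ $entry;
    push @parsed, [ $1, $2 ];
}

printf "%-4s %s\n", @$_ for @parsed;
```

This single pattern finds every marked line but does not say which kind of marker it found; group 1 would still have to be tested against a digit, letter or Roman numeral pattern to separate the three levels.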