I am trying to write a regexp match which will pick up a string of 6 to 9 digits in the middle of a longer string.

Here is a typical example. It is a line from an online dog show catalog. The top line is the string; the two comment lines below it are there to provide an explanation.

A ||118|AVIANN GILDED WILD HONEY. HM 75081701. 02-04-97
#|--1---|---2--------------------| |---3-----| |--4----|
#explanation of fields above

The areas of interest within the line are:

(1) stuff to be discarded

(2) the dog's name

(3) the dog's registration number - usually it is 2 alpha characters, a space, and 8 digits, but it could be 2 alpha characters and 6 digits

(4) the date of birth - 2 digit year

The original catalog entry has some unneeded information associated with 3 sets of tabs at the beginning, but I go through the line with a substitution and change the tabs to pipes (|) because they are easier to get rid of in a regexp (since I can see them).

The dog's name may contain non-alpha characters such as hyphens, single quotes, ampersands and periods, so just looking for \w does not work.

The registration number typically follows the 2 letter, space, 8 digit formula, but sometimes there are typos and there are more or fewer than 8 digits, or it is a foreign registration number. Another typo is having two hyphens (--) between the alphabetic and the numeric part (I am not as worried about this last case).

The date of birth is pretty constant in form.
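
To make the four fields concrete, they could be pulled apart with something like the following. This is only a minimal sketch under assumptions, not a solution tested against the whole catalog: the sub name parse_entry is made up for illustration, the date separators are assumed to be hyphens, slashes, or periods, and it assumes a dog's name never itself contains two letters followed directly by 6 to 9 digits.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the catalog code): returns
# (name, two-letter prefix, digits, date of birth), or an empty list.
sub parse_entry {
    my ($line) = @_;
    return $line =~ m/
        ^ .* \|                          # (1) discard everything up to the last pipe
        (.+?) \.? \s+                    # (2) name, minus an optional trailing period
        ([A-Za-z]{2}) [\s-]*             # (3) two letters, then spaces or hyphens,
        (\d{6,9}) \.? \s+                #     then 6 to 9 digits
        (\d\d [-\/.] \d\d [-\/.] \d\d)   # (4) date of birth with a 2-digit year
    /x;
}

my @fields = parse_entry('A ||118|AVIANN GILDED WILD HONEY. HM 75081701. 02-04-97');
print join(' / ', @fields), "\n";
```

Splitting the letters and the digits into separate groups makes it easy to check the digit count afterwards, and the [\s-]* between them tolerates both the usual space and the double-hyphen typo.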

Here is a regexp that works fine if the only error is having 7 digits rather than 8 digits in the registration number. It also accepts some of the non-standard foreign registration numbers and the double-hyphen (--) typoed reg number that looks like this:

HP--09090901.

if( $line=~m/(\|.*)(\|)(.*)(\w{2}.*\d{7,8}\w*).(\s\d\d[\W|-]{1}\d\d[\W|-]{1}\d\d)/ ){

I would like it to accept a range of 6 to 9 digits but when I try to substitute {6,9} it does not recognize the input line.
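
One way to check whether the line really fails to match, or whether the captures merely shift, is to print every group. The sketch below assumes the pattern is exactly the one above with only {7,8} changed to {6,9}, and with both date quantifiers written as {1}; the variable names are just for illustration.

```perl
use strict;
use warnings;

my $line = 'A ||118|AVIANN GILDED WILD HONEY. HM 75081701. 02-04-97';

# The posted pattern with only the digit count changed to {6,9}.
my $re = qr/(\|.*)(\|)(.*)(\w{2}.*\d{6,9}\w*).(\s\d\d[\W|-]{1}\d\d[\W|-]{1}\d\d)/;

# In list context a successful match returns the capture groups.
my @groups = $line =~ $re;
if (@groups) {
    printf "group %d: [%s]\n", $_ + 1, $groups[$_] for 0 .. $#groups;
} else {
    print "no match\n";
}
```

Tracing this by hand on the sample line, the match itself seems to succeed, but the greedy (.*) in group 3 can now absorb "HM " and leave \w{2} to match the first two digits, so group 4 may come back without its two letters. Code further down that expects the letters in group 4 could then look as if the line had not been recognized.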

This web page:
http://www.grymoire.com/Unix/Regular.html
suggested:
There is a special pattern you can use to specify the minimum and maximum number of repeats. This is done by putting those two numbers between "\{" and "\}". The backslashes deserve a special discussion. Normally a backslash turns off the special meaning for a character. A period is matched by a "\." and an asterisk is matched by a "\*".

but \d\{6,9\} does not work for me. I suspect that it may not be implemented in Perl.
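
A small check along these lines can show how Perl treats the two spellings. The quoted page describes the sed/grep style of regular expression, where the braces are escaped; in Perl the quantifier braces are written unescaped, and \{ stands for a literal brace character. The sample strings and variable names below are only illustrative.

```perl
use strict;
use warnings;

my $reg_number = 'HM 75081701.';

# Unescaped braces form the {min,max} quantifier in Perl.
my $plain   = $reg_number =~ /\d{6,9}/   ? 'match' : 'no match';

# Escaped braces are literal characters, so this pattern looks for
# one digit followed by the text "{6,9}".
my $escaped = $reg_number =~ /\d\{6,9\}/ ? 'match' : 'no match';
my $literal = '7{6,9}'    =~ /\d\{6,9\}/ ? 'match' : 'no match';

print "plain:   $plain\n";     # match
print "escaped: $escaped\n";   # no match
print "literal: $literal\n";   # match
```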

I am running perl v5.10.0 built for i486-linux-gnu-thread-multi under Ubuntu Karmic. This is the perl that is a standard installation on Ubuntu Karmic.


In reply to regexp identify variable number of digits within a sentence by bdalzell
