I have put together some code to parse a colon-delimited data file. The regular expression I have built captures only some of the colon-delimited values, so I would like some pointers on the pattern. Ideally it would match whatever lies between the colons in a general way, and would also cope with lines that contain fewer data values, as can be seen below:
foreach (<DATA>) {
    if($_ =~ m/(\d{1,2})?\:?(\w\d)?\:?(\b\w..*\b)?\:?(.*|N\/A)?\:?(\d{1,2}.+)?\:?(\d{2}\s?GREEN|RED|XX)?\:?(.*)?\:?(.*)?\:?(\bsquare\b)?/) {
    #if($_ =~ $_ =~ m/(\d{1,2})\:?(\w\d)?\:?(\b\w..*\b)?\:?(.*|N\/A)?\|\|?(\d{1,2}.+)?\:?(\d{2}\s?GREEN|RED|XX)?\:?(.*)?\:?(.*)?\:?(\bYELLOW\b)?/) {
        if (defined $1) {
            $count=$1;
        }
        else {
            $count="nothing";
        }
        if (defined $2) {
            #code
            $grade=$2;
        }
        else {
            $grade="nothing";
        }
        if (defined $3) {
            #code
            $pos=$3;
        }
        else {
            $pos="nothing";
        }
        if (defined $4) {
            #code
            $name=$4;
        }
        else {
            $name="nothing";
        }
        if (defined $5) {
            #code
            $country=$5;
        }
        else {
            $country="nothing";
        }
        if (defined $6) {
            #code
            $date=$6;
        }
        else {
            $date="nothing";
        }
        if (defined $7) {
            #code
            $age=$7;
        }
        else {
            $age="nothing";
        }
        if (defined $8) {
            #code
            $vacant=$8;
        }
        else {
            $vacant="nothing";
        }
        if (defined $9) {
            #code
            $square=$9;
        }
        else {
            $count="nothing";
        }
        #print "We have a match!\n";
        print join " ",$count,$grade,$pos,$name,$date,$country,$age,$vacant,"\n";
    }
}
__DATA__
1:D2:DIRECTOR:D. Green:4/15/1953:61 XX:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND::::
1:D1:DEPUTY DIRECTOR:D. Green::6/20/1964:50:TUNISIA REPUBLIC OF::::
1:P5:SENIOR POLICY OFFICER:D. Green::7/7/1954:60 GREEN:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND::::
9:P5:SENIOR ECONOMIST:D. Green::7/23/1958:56:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND::::
D. Green::10/29/1953:60 GREEN:PERU REPUBLIC OF:*:::
D. Green::10/26/1955:58:SPAIN KINGDOM OF:*:::
D. Green::5/15/1967:47:FRENCH REPUBLIC::::
D. Green:g:12/6/1954:59:FIJI REPUBLIC OF::::
D. Green::6/8/1967:47:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND::::
D. Green::9/16/1960:54:UNITED STATES OF AMERICA::::
N/A::Vacant:UNASSIGNED::YELLOW::
Output from above:
nothing D2 DIRECTOR:D. Green:4/15/1953:61 XX:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND ::: nothing nothing
nothing D1 DEPUTY DIRECTOR:D. Green::6/20/1964:50:TUNISIA REPUBLIC OF ::: nothing nothing
nothing P5 SENIOR POLICY OFFICER:D. Green::7/7/1954:60 GREEN:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND ::: nothing nothing
nothing P5 SENIOR ECONOMIST:D. Green::7/23/1958:56:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND ::: nothing nothing
nothing nothing D. Green::10/29/1953:60 GREEN:PERU REPUBLIC OF *::: nothing nothing
nothing nothing D. Green::10/26/1955:58:SPAIN KINGDOM OF *::: nothing nothing
nothing nothing D. Green::5/15/1967:47:FRENCH REPUBLIC ::: nothing nothing
nothing nothing D. Green:g:12/6/1954:59:FIJI REPUBLIC OF ::: nothing nothing
nothing nothing D. Green::6/8/1967:47:UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND ::: nothing nothing
nothing nothing D. Green::9/16/1960:54:UNITED STATES OF AMERICA ::: nothing nothing
nothing nothing N/A::Vacant:UNASSIGNED::YELLOW : nothing nothing
nothing nothing nothing nothing nothing
Many thanks
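
For comparison, here is a minimal sketch of the "anything between the colons" idea using split rather than one large pattern. It assumes that no field ever contains a literal colon, and the helper name parse_fields is hypothetical. It returns the values by position only, because the short records in the sample data omit the leading fields, so mapping positions to names such as $count or $grade is left open. Separately, note that the final else branch in the posted code assigns $count rather than $square, which would explain why the first output column always reads "nothing".

```perl
use strict;
use warnings;

# Hypothetical helper: split one record on colons and return every field.
# Assumption: a field never contains a literal colon.
sub parse_fields {
    my ($line) = @_;
    chomp $line;

    # A limit of -1 keeps trailing empty fields, which split drops by default.
    my @values = split /:/, $line, -1;

    # Mirror the post's convention of reporting missing values as "nothing".
    return map { length($_) ? $_ : 'nothing' } @values;
}

# Two sample records taken from the __DATA__ section of the post.
my @samples = (
    "1:D1:DEPUTY DIRECTOR:D. Green::6/20/1964:50:TUNISIA REPUBLIC OF::::\n",
    "D. Green::10/29/1953:60 GREEN:PERU REPUBLIC OF:*:::\n",
);

for my $line (@samples) {
    my @fields = parse_fields($line);
    print scalar(@fields), " fields: ", join(' | ', @fields), "\n";
}
```

Because every record is split the same way, full and short lines need no separate patterns; any per-field validation (dates, ages, colours) could then be applied to individual elements with small, separate regular expressions.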

In reply to Regular Expression to Extract Anything from Colon Delimited String by GuiPerl
