Dear Monks

In an attempt to improve my understanding of regexes, I have the following data from an HTML dump:
1 adriaanf Europe Local _Default Different Owner For Target Machine HA050069 OYWVM1237 LN-CS/06 Technology XP DESKTOP Dell OptiPlex GX270 (0151) Pentium 4 (1 x 2793) 1023 38146 11147 N/A N/A OYWVM1237 Technology LN-OY/02 VIRTUAL (OYWVH161) VMWare VMWare For Desktop Not Defined (1 x 3065) 767 20473 8209 23/11/2004 11:10:02 N/A N/A N/A N/A 2 adriaanf Europe Local _Default Different Owner For Target Machine HA050069 OYWVM1262 LN-CS/06 Technology XP DESKTOP Dell OptiPlex GX270 (0151) Pentium 4 (1 x 2793) 1023 38146 11147 N/A N/A OYWVM1262 Technology LN-OY/06 VIRTUAL (OYWVH159) VMWare VMWare For Desktop Not Defined (1 x 3064) 767 20473 7800 07/12/2004 10:50:32 N/A N/A N/A N/A 5 adrianst Europe Local ER_LN_WAR Different Owner For Target Machine CW041698 OYWVM1263 LN-CW/04 Research XP DESKTOP Compaq Evo D510 (07E8h) Small Form Factor Pentium 4 (1 x 2259) 511 38154 10740 N/A N/A OYWVM1263 Technology LN-OY/02 VIRTUAL (OYWVH138) VMWare VMWare For Desktop Not Defined (1 x 3065) 767 20473 7788 06/12/2004 18:24:34 N/A N/A N/A N/A 6 adrianst Europe Local ER_LN_WAR Different Owner For Target Machine CW041698 OYWVM1230 LN-CW/04 Research XP DESKTOP Compaq Evo D510 (07E8h) Small Form Factor Pentium 4 (1 x 2259) 511 38154 10740 N/A N/A OYWVM1230 Technology LN-OY/06 VIRTUAL (OYWVH133) VMWare VMWare For Desktop Not Defined (1 x 3065) 767 20473 6921 06/12/2004 17:48:37 N/A N/A N/A N/A
From that data, I need to extract each record as a unit. Every record starts with a record number: 1, 2, 3, ..., 50000, and so on.
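One possible way to get records instead of lines is to read the whole file into a single string and `split` it just before every line that starts with a number. This is a minimal sketch, assuming (like the `/^\d+\s+/` test in the loop) that a line beginning with digits and whitespace always starts a new record and that no continuation line does. The two-record sample is a shortened, hypothetical excerpt of the dump, with line breaks chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shortened, hypothetical sample: each record spans several lines.
my $text = <<'END';
1 adriaanf Europe Local _Default Different Owner For Target Machine
HA050069 OYWVM1237 LN-CS/06 Technology XP DESKTOP
23/11/2004 11:10:02 N/A N/A N/A N/A
2 adriaanf Europe Local _Default Different Owner For Target Machine
HA050069 OYWVM1262 LN-CS/06 Technology XP DESKTOP
07/12/2004 10:50:32 N/A N/A N/A N/A
END

# /m lets ^ match at the start of every line; the zero-width lookahead
# splits *before* each record number, so the number stays in its record.
my @records = split /^(?=\d+\s)/m, $text;

for my $rec (@records) {
    my @lines = split /\n/, $rec;    # the lines belonging to this record
    print "Record with ", scalar @lines, " lines: $lines[0]\n";
}
```

In the real script `$text` would come from slurping the file (for example with `local $/;` around the read), which is fine as long as the whole dump fits in memory.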

I am not sure how to do it, so I started with this code:
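#! c:/perl/bin/perl.exe -slw
$|++;
use strict;
use vars qw/%data/;    # declared for storing records later; not used yet

# Read the whole dump, one array element per line
open (my $lst, '<', $ARGV[0]) or die "\n$0 Error => $^E\n";
chomp (my @unclean = <$lst>);
close $lst;
print "size : ", scalar @unclean;    # line count ($#unclean is the last index)

for (@unclean) {
    # A line starting with a number (the record number) begins a new record
    if (/^\d+\s+/) {
        print "First 1 : $_\n";
        #print "Line 2
        #print "Line 3
        #print "Line 4
        print "____________________________________\n";
    }
}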
It does grab the first line of each record, but I am not sure how to write the regex so that it also grabs all the following lines up to the next record number, where a new record begins.
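With tens of thousands of records, it may be preferable not to hold the whole file in memory. The regex then only has to recognise where a record starts; "all lines until the next record number" can be handled by keeping a buffer and flushing it whenever a new record line appears. A minimal sketch under the same assumption (a line starting with digits and whitespace begins a record); `process_record`, `@seen` and `@input` are hypothetical names, and `@input` stands in for reading lines from the real file handle.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @seen;    # collected records, kept only so the result can be inspected

# Hypothetical handler: replace with whatever each record needs.
sub process_record {
    my @lines = @_;
    push @seen, join ' ', @lines;
}

# Stand-in for `while (my $line = <$fh>) { chomp $line; ... }`.
my @input = (
    "1 adriaanf Europe Local _Default Different Owner For Target Machine",
    "HA050069 OYWVM1237 LN-CS/06 Technology XP DESKTOP",
    "23/11/2004 11:10:02 N/A N/A N/A N/A",
    "2 adriaanf Europe Local _Default Different Owner For Target Machine",
    "HA050069 OYWVM1262 LN-CS/06 Technology XP DESKTOP",
);

my @current;    # lines of the record being collected
for my $line (@input) {
    if ($line =~ /^\d+\s+/) {
        # A new record number: the previous record (if any) is complete.
        process_record(@current) if @current;
        @current = ($line);
    }
    elsif (@current) {
        push @current, $line;    # continuation line of the current record
    }
    # else: text before the first record number is ignored
}
process_record(@current) if @current;    # flush the last record
```

Only one record is held at a time, so memory use stays flat however long the dump is.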

I have spent some time trying all sorts of regex combinations, to no avail. I would appreciate it if any of you divine beings could inspire me and guide me through this.

Thanks

UPDATE: The N/As are empty field values reserved for dates; records do not always end with N/A.
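Because N/A is not a reliable end marker, a single regex can instead treat "the start of the next record, or the end of the text" as the end of the current one. This sketch loops over a `/msg` match with a lazy body and a lookahead, storing each record in a `%data` hash keyed by record number (the hash the original script declares could serve this purpose). It assumes the same record-start rule as above; the sample is hypothetical, and the second record's trailing N/A fields were removed on purpose to mimic a record that does not end with N/A.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = <<'END';
5 adrianst Europe Local ER_LN_WAR Different Owner For Target Machine
CW041698 OYWVM1263 LN-CW/04 Research XP DESKTOP
06/12/2004 18:24:34 N/A N/A N/A N/A
6 adrianst Europe Local ER_LN_WAR Different Owner For Target Machine
CW041698 OYWVM1230 LN-CW/04 Research XP DESKTOP
06/12/2004 17:48:37
END

my %data;
# /m: ^ matches at each line start; /s: . also matches newlines;
# /g: keep finding records. The body (.*?) is taken lazily up to the
# next line starting with a record number, or the end of the text (\z),
# so it does not matter what the last field of a record is.
while ($text =~ /^(\d+)\s+(.*?)(?=^\d+\s|\z)/msg) {
    my ($num, $body) = ($1, $2);
    $body =~ s/\s+\z//;    # drop the trailing newline(s)
    $body =~ s/\s+/ /g;    # fold the record's lines into one line
    $data{$num} = $body;
}

print "$_: $data{$_}\n" for sort { $a <=> $b } keys %data;
```

The lookahead is what makes this work: it checks for the next record number without consuming it, so the following iteration of the loop can start its match right there.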
Blackadder

In reply to Capturing Multiple lined data with regex. by blackadder
