
Dear Monks

While trying to improve my understanding of regexes, I have been working with the following data from an HTML dump:

1  adriaanf Europe Local    _Default Different Owner For Target Machine HA050069  OYWVM1237  LN-CS/06 Technology XP DESKTOP Dell OptiPlex GX270 (0151) Pentium 4 (1 x 2793) 1023 38146 11147 N/A  N/A
 OYWVM1237  Technology LN-OY/02 VIRTUAL (OYWVH161) VMWare VMWare For Desktop Not Defined (1 x 3065) 767 20473 8209 23/11/2004 11:10:02  N/A  N/A
 N/A N/A 
2  adriaanf Europe Local    _Default Different Owner For Target Machine HA050069  OYWVM1262  LN-CS/06 Technology XP DESKTOP Dell OptiPlex GX270 (0151) Pentium 4 (1 x 2793) 1023 38146 11147 N/A  N/A
 OYWVM1262  Technology LN-OY/06 VIRTUAL (OYWVH159) VMWare VMWare For Desktop Not Defined (1 x 3064) 767 20473 7800 07/12/2004 10:50:32  N/A  N/A
 N/A N/A 
5  adrianst Europe Local    ER_LN_WAR Different Owner For Target Machine CW041698  OYWVM1263  LN-CW/04 Research XP DESKTOP Compaq Evo D510 (07E8h) Small Form Factor Pentium 4 (1 x 2259) 511 38154 10740 N/A  N/A
 OYWVM1263  Technology LN-OY/02 VIRTUAL (OYWVH138) VMWare VMWare For Desktop Not Defined (1 x 3065) 767 20473 7788 06/12/2004 18:24:34  N/A  N/A
 N/A N/A 
6  adrianst Europe Local    ER_LN_WAR Different Owner For Target Machine CW041698  OYWVM1230  LN-CW/04 Research XP DESKTOP Compaq Evo D510 (07E8h) Small Form Factor Pentium 4 (1 x 2259) 511 38154 10740 N/A  N/A
 OYWVM1230  Technology LN-OY/06 VIRTUAL (OYWVH133) VMWare VMWare For Desktop Not Defined (1 x 3065) 767 20473 6921 06/12/2004 17:48:37  N/A  N/A
 N/A N/A

I need to extract that data record by record. Each record starts with a record number (1, 2, 3, ... 50000, etc.).

I am not sure how to do it, so I started with this code:

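#! c:/perl/bin/perl.exe -slw
$|++;
use strict;
use vars qw/%data/;

# Read the whole dump named on the command line, one element per line.
open (my $lst, '<', $ARGV[0]) or die "\n$0 Error => $^E\n";
chomp (my @unclean = <$lst>);
close $lst;
# Number of lines read ($#unclean would be the last index, one less).
print "size : ", scalar @unclean;
for (@unclean)
{
    # A record starts on a line that begins with its record number.
    if (/^\d+\s+/)
    {
        print "First 1 : $_\n";
        #print "Line 2
        #print "Line 3
        #print "Line 4
        print "____________________________________\n";
    }
}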

It does grab the first line of each record, but I am not sure how to write the regex so that it also grabs all the following lines up to the next record number, where a new record begins.

I have spent some time trying all sorts of regex combinations, to no avail. I would appreciate it if any of you divine beings could inspire and guide me through this.

Thanks

UPDATE: The 'N/A's are empty field values reserved for dates. Records do not always end with N/A.
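
For illustration, the grouping described above (every line belongs to the most recent line that starts with a record number) could be sketched as follows. This is only a sketch, not a tested solution for the full dump: split_records and split_records_regex are hypothetical names, the sample lines are shortened stand-ins rather than the real data, and it assumes that continuation lines never begin with a digit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: group lines into records. A line that starts with
# digits followed by whitespace opens a new record; every other line is
# appended to the record that is currently open.
sub split_records {
    my @lines = @_;
    my @records;
    for my $line (@lines) {
        if ($line =~ /^\d+\s+/) {
            push @records, [$line];          # start of a new record
        }
        elsif (@records) {
            push @{ $records[-1] }, $line;   # continuation line
        }
        # lines before the first record number are ignored
    }
    return @records;
}

# Hypothetical alternative: one regex over the whole text. /m lets ^ match
# at every line start, /s lets . cross newlines, and the lookahead stops
# the lazy match just before the next record number or at end of string.
sub split_records_regex {
    my ($text) = @_;
    return $text =~ /^(\d+\s.*?)(?=^\d+\s|\z)/msg;
}

# Shortened stand-in lines shaped like the dump (assumption, not real data).
my @sample = (
    '1  adriaanf Europe Local _Default ... N/A  N/A',
    ' OYWVM1237  Technology LN-OY/02 VIRTUAL ... N/A  N/A',
    ' N/A N/A',
    '2  adriaanf Europe Local _Default ... N/A  N/A',
    ' OYWVM1262  Technology LN-OY/06 VIRTUAL ... 10:50:32',
);

for my $rec (split_records(@sample)) {
    my ($number) = $rec->[0] =~ /^(\d+)/;
    print "Record $number has ", scalar @$rec, " line(s)\n";
    print "    $_\n" for @$rec;
    print "____________________________________\n";
}
```

Because both versions key off the leading record number rather than the trailing N/A fields, they should not care how a record ends. The line-by-line version may also suit a 50000-record file better, since it could be adapted to read from the filehandle directly instead of holding the whole dump in memory.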

Blackadder

In reply to Capturing Multiple lined data with regex. by blackadder
