Regular Expressions

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

First, I would like to say thanks for the help that I have received so far. I have two questions on how to use regular expressions. Here is an example of the data that I am dealing with:

Header Line One ***-*** 0 0 ***-MBO 0 0 2TO-T/V 0 0 2TO-T/O 0 0 POC-CNU 1285 0 POC-A/M 0 15567 Header Line Two ***-*** 0 0 ***-MBO 0 0 2TO-T/V 0 0 2TO-T/O 0 0 POC-CNU 1285 0 POC-A/M 0 15567

1) I am looking for a way to read a line in and check its first 7 characters. Each character can be a letter (a-z), a digit (0-9), * or /, and the fourth character will always be a -. This continues until the line that says "Header Line Two" is read, which starts the next file.

2) Also, how would I do an error check for duplication? What I mean is: if for some reason "Header Line Two" is skipped over and the next line is the same as one under "Header Line One", then I need to produce an error. How would I go about doing this?

Re: Regular Expressions by ZZamboni (Curate) on Aug 14, 2000 at 20:00 UTC
Please use <code> tags around your data and code so that it's properly formatted. Here's the sample data that you posted (I removed empty lines for space):

    Header Line One
    ***-*** 0 0
    ***-MBO 0 0
    2TO-T/V 0 0
    2TO-T/O 0 0
    POC-CNU 1285 0
    POC-A/M 0 15567
    Header Line Two
    ***-*** 0 0
    ***-MBO 0 0
    2TO-T/V 0 0
    2TO-T/O 0 0
    POC-CNU 1285 0
    POC-A/M 0 15567

Here's sample code (untested) that does what you explained, storing each line in a hash using the first 7 characters as the key, and checking for duplicates:

    my $data = {};
    my $file;
    while (<>) {
        chomp;
        # Skip blank lines
        next if /^\s*$/;
        if (/^Header/) {
            $data->{$_} = {};    # Create a new first-level hash.
            $file = $_;
            next;
        }
        # Key: 3 chars from a-z, 0-9, * or /, then a hyphen, then 3 non-space chars.
        if (m{^([a-zA-Z0-9*/]{3}-\S{3})\s+}) {
            my $key = $1;
            if ($file) {
                # Check for duplicates.
                if (exists($data->{$file}->{$key})) {
                    warn "Duplicate key $key in $file: $_\n";
                    next;
                }
                $data->{$file}->{$key} = $_;
            }
            else {
                warn "Line found before a header line: $_\n";
            }
        }
        else {
            # Reject improper lines
            warn "Badly formatted line found, ignoring: $_\n";
        }
    }

This stores the data in a structure like this:

    $data->{Header Line One}->
               {***-***} -> "***-*** 0 0 ..etc"
               {2TO-T/V} -> "..."
               ...
         ->{Header Line Two}->
               ....

This may not be precisely what you want, but it should give you an idea of one way of doing it.

--ZZamboni
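The key test in the reply above can also be tried on its own before wiring it into a loop. What follows is only a minimal sketch: `record_key` is a hypothetical helper name, the character class includes `*` because the question says a key may contain it, and `m{...}` is used as the delimiter so that `/` needs no escaping inside the class.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 7-character key at the start of a record
# line, or undef if the line does not look like a record.
sub record_key {
    my ($line) = @_;
    # Three characters from a-z, A-Z, 0-9, * or /, then a hyphen, then three
    # non-space characters, followed by whitespace.
    return $line =~ m{^([a-zA-Z0-9*/]{3}-\S{3})\s+} ? $1 : undef;
}

# Try the helper on two record lines and one header line.
for my $line ('***-MBO 0 0', 'POC-A/M 0 15567', 'Header Line One') {
    my $key = record_key($line);
    print defined $key ? "key: $key\n" : "not a record: $line\n";
}
```

Header lines fail the test because their fourth character is not a hyphen, so the same check can tell records and headers apart.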
Re: Regular Expressions by Shendal (Hermit) on Aug 14, 2000 at 20:06 UTC
First, surround any data or code with CODE tags. If I understand you correctly, your data looks something like this:

    Header Line One
    ***-*** 0 0 ***-MBO 0 0 2TO-T/V 0 0 2TO-T/O 0 0 POC-CNU 1285 0 POC-A/M 0 15567
    Header Line Two
    ***-*** 0 0 ***-MBO 0 0 2TO-T/V 0 0 2TO-T/O 0 0 POC-CNU 1285 0 POC-A/M 0 15567

Although you do not specify, I'll also assume that the header lines alternate between 'One' and 'Two', and you just want to make sure that these don't repeat (that is, they continue to alternate). Here's what I'd try:

    #!/usr/bin/perl -w
    use strict;

    # variable to hold the previous header
    my ($header);

    # read line by line so that $. holds the current line number
    while (<>) {
        if (/^Header Line (\S+)$/) {
            die "Error - header repeated in line $.\n"
                if ($header && $header eq $1);
            $header = $1;
            next;
        }
        # 3 chars (word char, * or /), a hyphen, then 3 more of the same
        if (m{^[\w*/]{3}-[\w*/]{3}}) {
            # do whatever processing on the line
            print "Got a matching line...\n";
        }
    }

Hope that helps,
Shendal

Update: Darn. Looks like I got your data wrong - or zzamboni did (grin). All the more reason to use CODE tags.
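For the second question (a skipped "Header Line Two"), the ideas from both replies can be combined: remember the keys seen since the last header, and treat a key that shows up again before the next header as a sign that a header went missing. This is only a sketch under assumptions: `find_problems` is a hypothetical name, it takes a list of lines instead of reading a file, and it assumes one record per line.

```perl
use strict;
use warnings;

# Hypothetical helper: scan a list of lines and return problem descriptions.
sub find_problems {
    my @lines = @_;
    my (@problems, %seen, $header);
    my $n = 0;
    for my $line (@lines) {
        $n++;
        next if $line =~ /^\s*$/;            # skip blank lines
        if ($line =~ /^Header Line (\S+)\s*$/) {
            push @problems, "line $n: header '$1' repeated"
                if defined $header && $header eq $1;
            $header = $1;
            %seen   = ();                    # new block, forget the old keys
            next;
        }
        if ($line =~ m{^([\w*/]{3}-[\w*/]{3})\s}) {
            # A key seen twice in one block suggests a header was skipped.
            push @problems, "line $n: key '$1' repeated, missing header?"
                if $seen{$1}++;
        }
    }
    return @problems;
}

my @sample = (
    'Header Line One', '***-*** 0 0', 'POC-CNU 1285 0',
    '***-*** 0 0', 'POC-CNU 1285 0',    # "Header Line Two" left out on purpose
);
print "$_\n" for find_problems(@sample);
```

Collecting the problems in a list, instead of calling die on the first one, lets the caller decide whether a single repeated key is fatal.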
RE: Re: Regular Expressions by Adam (Vicar) on Aug 14, 2000 at 21:09 UTC
You can see the intended format of a post by viewing the HTML source that your browser is attempting to display.
RE: Regular Expressions by Anonymous Monk on Aug 15, 2000 at 17:31 UTC
Thanks, guys, for the help and also the advice. ZZamboni, what you had was the correct format.


	PerlMonks