Deleting paragraphs based on match in a hash

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

Wondering if I could get some assistance.

Deleting paragraphs based on match in a hash.

Where I'm lost is getting the 'key' (PORNUM: *key*) from the paragraph in $_.

I'm very new to Perl, just a few long days/nights ... whatever they are!

#!/usr/bin/perl

my @a, %hash;
my $file = shift;
open(list, "< $file") or die;
chomp( @a=<list> );
close(list);
@hash{@a}=@a;

$/ = "";

# Example data
#
# $ cat hash
#  PP22x43@.5
# $ cat data
#  \n
#  Random lines
#  PORNUM: PP22x43@.5
#  Random lines
#  \n
#  Random lines
#  PORNUM: PC12x120/25

while (<>) {

## manual test #  print if !m/PORNUM: PP22x43@.5/ms;

  print unless exists $hash{  }
}
exit(0);

Example data.

Example exclude file (No blank lines)

PORNUM: PP22x43@.5
PORNUM: PC12x120/25

Example data file (Records have a leading blank line, first line is blank)

random lines of data
PORNUM: PC21x21!2
random lines of data

random lines of data
random lines of data
PORNUM: PP22x43@.5
random lines of data

PORNUM: PP12x60@1
random lines of data
random lines of data

random lines of data
PORNUM: PC12x120/25

Thanks Gary


Re: Deleting paragraphs based on match in a hash by bart (Canon) on Sep 02, 2009 at 08:42 UTC
What I think you are trying to do is these steps:

- extract the key from the paragraph with a regex
- look up this key in your hash and respond to it

So, what this means in code:

    while(<>) {
        if(/^PORNUM:\s(\S+)/m) {
            if(exists $skip{$1}) {
                # It matches, and ought to be skipped
                next;
            }
        }
        print;
    }

That doesn't look too bad, does it? (And it can be reduced a lot more, at the cost of readability for beginners. But IMHO, it's worth it.)

All you still have to do is initialize the %skip hash first, for example, like this:

    my %skip = map { $_ => 1 } qw(PP22x43@.5 PC12x120/25);

Alternatively, you can read data from a string or from the `DATA` section (or even from an external file), and split on whitespace.

p.s. The reduced code that I talked about can be something like this:

    while(<>) {
        next if /^PORNUM:\s(\S+)/m and exists $skip{$1};
        print;
    }

Update: Oh, now I see: you put in the format for an "exclude" file. Well, you'll have to read that first.

    while(<EXCLUDE>) {
        if(/^PORNUM:\s*(\S+)/) {
            $skip{$1} = 1;
        }
    }
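For anyone who wants to see the two pieces above joined up, here is a minimal sketch. It assumes the exclude list uses the "PORNUM: key" line format and that paragraph mode (`$/ = ""`) is in effect while the data is read, as in the original question. The sub names `load_skip` and `filter_paragraphs` and the in-memory sample strings are made up for illustration; they are not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the lookup hash from an exclude list whose lines look like
# "PORNUM: <key>" (the exclude-file format shown in the question).
sub load_skip {
    my ($fh) = @_;
    my %skip;
    while (my $line = <$fh>) {
        $skip{$1} = 1 if $line =~ /^PORNUM:\s*(\S+)/;
    }
    return \%skip;
}

# Return the paragraphs whose PORNUM key is NOT in the skip hash.
sub filter_paragraphs {
    my ($fh, $skip) = @_;
    local $/ = "";    # paragraph mode: records end at blank lines
    my @kept;
    while (my $para = <$fh>) {
        next if $para =~ /^PORNUM:\s(\S+)/m and exists $skip->{$1};
        push @kept, $para;
    }
    return @kept;
}

# In-memory filehandles stand in for the real exclude and data files
# (sample strings are illustrative only).
my $exclude = "PORNUM: PP22x43\@.5\nPORNUM: PC12x120/25\n";
my $data    = "\nrandom lines of data\nPORNUM: PC21x21!2\nrandom lines of data\n\n"
            . "random lines of data\nPORNUM: PP22x43\@.5\nrandom lines of data\n";

open(my $ex_fh, '<', \$exclude) or die "cannot open exclude list: $!";
my $skip = load_skip($ex_fh);
close($ex_fh);

open(my $data_fh, '<', \$data) or die "cannot open data: $!";
print for filter_paragraphs($data_fh, $skip);
close($data_fh);
```

With the sample strings above, only the PC21x21!2 record should be printed. To use real files, open them with the 3-argument `open` and pass the handles in the same way.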
Re: Deleting paragraphs based on match in a hash by spazm (Monk) on Sep 02, 2009 at 09:21 UTC
Your code is pretty close to working.

Problems and changes:

- You have two definitions for the contents of your hash file. Will the hash file have "PORNUM: " prefixed on each line?
- Turn on strictures: `use strict; use warnings`. They'll catch problems like your use of a possible reserved word 'list'.
- Some clean-up: let's use the 3-argument form of open, with a lexically scoped filehandle. Get into this habit early.
- Let's skip blank lines in the hash file, to make it easier for your users to be correct.
- If you're going to modify global special variables like `$/`, you'll want to make a local copy with `local`, and keep it in as small a scope as possible. I've added a brace block around the local to make it clear to future readers of the code that this is the planned scope of the change to `$/`.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %skip;
    my $file = shift;
    open(my $list, '<', $file) or die;
    while(<$list>) {
        chomp;
        next if /^$/;   #skip blank lines in hash file
        #assumes that hash file is just the ID, without PORNUM prefix
        $skip{ $_ }=1;
    }
    close($list);

    {
        local($/)="";
        while (<DATA>) {
            next if ( m/ PORNUM: \s+ (.*) $ /mx && $skip{ $1 } );
            print "-$_-";
        }
    }
    __DATA__

    random lines of data
    PORNUM: PC21x21!2
    random lines of data

    random lines of data
    random lines of data
    PORNUM: PP22x43@.5
    random lines of data

    PORNUM: PP12x60@1
    random lines of data
    random lines of data

    random lines of data
    PORNUM: PC12x120/25
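To illustrate the point about keeping the change to `$/` in a small scope, here is a small sketch. The sub names `count_paragraphs` and `count_lines` and the sample text are hypothetical. The idea is that `local` restores `$/` when the enclosing block or sub exits, so code that runs afterwards still reads line by line.

```perl
use strict;
use warnings;

# Count records in paragraph mode. The change to $/ is undone
# automatically when this sub returns.
sub count_paragraphs {
    my ($fh) = @_;
    local $/ = "";
    my $n = 0;
    while (defined(my $rec = <$fh>)) { $n++ }
    return $n;
}

# Count records with whatever $/ is currently in effect
# (the default "\n" unless something changed it globally).
sub count_lines {
    my ($fh) = @_;
    my $n = 0;
    while (defined(my $rec = <$fh>)) { $n++ }
    return $n;
}

# Illustrative sample: two paragraphs separated by one blank line.
my $text = "a\nb\n\nc\nd\n";

open(my $fh1, '<', \$text) or die "cannot open in-memory file: $!";
my $paras = count_paragraphs($fh1);
close($fh1);

open(my $fh2, '<', \$text) or die "cannot open in-memory file: $!";
my $lines = count_lines($fh2);
close($fh2);

print "paragraphs: $paras, lines: $lines\n";   # paragraphs: 2, lines: 5
```

Had `count_paragraphs` assigned to `$/` without `local`, the second count would also have run in paragraph mode.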
Re^2: Deleting paragraphs based on match in a hash by Anonymous Monk on Sep 03, 2009 at 23:57 UTC
Sorry it's been so long getting back... Rough week!

Wow... Love the responses! Pleasant! Informative! Educational! The way it should be. Thank you!

Got it ... Understood it and I Thank you! Thank you! Thank you!

Just to make it clear... PORNUM = Part Number = materialSIZExSIZE@PoreSize = Filtration materials!

Long post ... Sorry! The important part is said!

I was close. My regex is where _ I _ was broke ... My last attempt was "m/^PORNUM: .*/xsm", however I was all over the place. bart's regex did not allow spaces in the key value, so the final regex is a slightly modified version of spazm's.

Following are two methods of my end results for the next in need! The better one is ... ???

Method 1, based on my original code. Read more... (681 Bytes)

Method 2, based on spazm's example. Read more... (721 Bytes)

Example excludefile & datafile. Read more... (714 Bytes)

And my newly acquired understanding of hashes allowed me to do the following. This was run on a data file of ~26M with 80,743 records and took less than 15 seconds to weed out 20,271 obsolete records, leaving 60,472 ... !!!

Sorry for the spill, I just think that is incredible speed! Especially on a FBSD7.1R PIII-866! It took over 3 hours to print it out of the existing system! A newer system running M$lop Explode! (think it's a Pentium dual core)

YES ... I was impressed! And for that I thank you all again and again!

My purge tool! Read more... (640 Bytes)

Thanks!
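Since the final regex is not shown in the post, the following is only a sketch of the difference being described: a `\S+` capture stops at the first space, while capturing the rest of the line keeps keys that contain spaces. The second pattern is an illustrative variant, not the poster's actual regex, and the sample key with a space in it is hypothetical.

```perl
use strict;
use warnings;

# Extract the key with the \S+ capture from bart's reply:
# the capture ends at the first whitespace character.
sub key_nonspace {
    my ($para) = @_;
    return $para =~ /^PORNUM:\s(\S+)/m ? $1 : undef;
}

# Capture the rest of the PORNUM line, trimming trailing blanks
# (illustrative variant, an assumption rather than the thread's regex).
sub key_whole_line {
    my ($para) = @_;
    return $para =~ /^PORNUM:[ \t]+(.*?)[ \t]*$/m ? $1 : undef;
}

# Hypothetical record whose key contains a space and trailing blanks.
my $para = "random lines of data\nPORNUM: PP 22x43\@.5  \nrandom lines of data\n";

printf "%-16s '%s'\n", 'key_nonspace:',   key_nonspace($para);
printf "%-16s '%s'\n", 'key_whole_line:', key_whole_line($para);
```

Here `key_nonspace` yields only `PP`, so the hash lookup would miss, while `key_whole_line` yields the full `PP 22x43@.5`. Trimming trailing blanks matters because a stray space at the end of a line would otherwise become part of the key.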