Remove lines from a list of files based on those in a given file

Paws_of_Iron has asked for the wisdom of the Perl Monks concerning the following question:

Hi all. I'm trying to remove lines from a file based on the lines given in another file. My code actually works, but it stops with this error:

Can't find Unicode property definition "R" at c:/remove.pl line 23, <CURRENTDAT> line 8.

Line 8 is the last line in the file test.dat (the reported line changes if I add lines; it is always the last line). Line 23 is the pattern match part of the code. If I print the lines as it goes, it is less than halfway through Bad_Shares.TXT when it stops, and that position changes as I add lines to test.dat (running out of memory?). This is my early test script before I get it to go through a list of files in a given directory (lots of DATs containing drives to map for specific users).

Any help appreciated. I also tried VBS and got close, but what a nightmare!


# Open the file containing the matches to remove
open(BADSHARES, "c:/testy/Bad_Shares.txt");
@BadShares = <BADSHARES>;

# Open the file/s to remove these matches from
open(CURRENTDAT, "c:/testy/test.dat");
@DatLines = <CURRENTDAT>;    

# For each line in the file to match from

for ($a = 0; $a < scalar(@DatLines); $a++)
{
    # Go through each line of the file to remove these matches from

    for ($i = 0; $i < scalar(@BadShares); $i++)
    {

    # If the current line in the DAT matches any line in the file to match, set that line to zero


#    print "$DatLines[$a]       $BadShares[$i]";
    if ($DatLines[$a] =~ m/($BadShares[$i])/i)    

            {    
            print "MATCH  $DatLines[$a]       $BadShares[$i]";
            # $DatLines[$a]=""    
            }

    }
}

Re: Remove lines from a list of files based on those in a given file by GrandFather (Saint) on Jan 30, 2006 at 23:32 UTC
Most likely you need to quote the interpolated BadShares string: `m/(\Q$BadShares[$i]\E)/i`. However, the code snippet you have given is not amenable to us reproducing your error. Try plugging suitable data into:

use strict;
use warnings;

open tempFile, '>', "test.dat";
print tempFile <<TEMP;
it is always the last line). Line 23 is the pattern match part of the code. If
is my early test script before I get it to go through a list of files in a
TEMP
close tempFile;

my @BadShares = <DATA>;

# Open the file/s to remove these matches from
open CURRENTDAT, '<', "test.dat";
my @DatLines = <CURRENTDAT>;

# For each line in the file to match from
for ($a = 0; $a < scalar(@DatLines); $a++)
{
    # Go through each line of the file to remove these matches from
    for (my $i = 0; $i < scalar(@BadShares); $i++)
    {
        # If the current line in the DAT matches any line in the file to match set
        # that line to zero
        if ($DatLines[$a] =~ m/(\Q$BadShares[$i]\E)/i)
        {
            print "MATCH $DatLines[$a] $BadShares[$i]";
        }
    }
}

__DATA__
Line 8 is the last line in the file test.dat (this error changes if I add lines
it is always the last line). Line 23 is the pattern match part of the code. If
I print the lines as it goes it is less than half way through Bad_Shares.TXT
and the line changes as I add lines to test.dat (running out of memory?). This
is my early test script before I get it to go through a list of files in a
given directory (lots of DATs containing drives to map for specific users).

Prints:

MATCH it is always the last line). Line 23 is the pattern match part of the code. If
 it is always the last line). Line 23 is the pattern match part of the code. If
MATCH is my early test script before I get it to go through a list of files in a
 is my early test script before I get it to go through a list of files in a

DWIM is Perl's answer to Gödel
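To illustrate the quoting advice, here is a minimal sketch. The sample strings are hypothetical UNC-style share names (an assumption about what the data file holds), and the helper name `matches_literally` is made up for the example. Interpolated unquoted, a `\P` followed by `R` is read by the regex engine as a negated Unicode property named "R", which does not exist; with `\Q...\E` the backslashes and `$` are escaped and the text is matched literally.

```perl
use strict;
use warnings;

# Hypothetical sample data: single-quoted so that '\\' yields one backslash.
my $bad_share = '\\\\HOFIL02\\PROJECT';      # the text \\HOFIL02\PROJECT
my $dat_line  = 'p:=\\\\hofil02\\project';   # the text p:=\\hofil02\project

# Illustrative helper: true if $needle occurs in $line, ignoring case,
# with every regex metacharacter in $needle escaped by \Q...\E.
sub matches_literally {
    my ($line, $needle) = @_;
    return $line =~ /\Q$needle\E/i ? 1 : 0;
}

# Unquoted: \PR is taken as "not Unicode property R", so the match dies.
# eval traps the error so the script can carry on and report it.
my $unquoted_ok = eval { my $r = $dat_line =~ /$bad_share/i; 1 };
print "unquoted pattern failed: $@" unless $unquoted_ok;

# Quoted: the share name is compared as plain text.
print "quoted pattern matched\n" if matches_literally($dat_line, $bad_share);
```

The built-in `quotemeta` function does the same escaping as `\Q...\E`, which can be handy if the pattern is prepared once before the loop.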
Re^2: Remove lines from a list of files based on those in a given file by Paws_of_Iron (Initiate) on Jan 31, 2006 at 02:24 UTC
SHORT STORY: ALL FIXED, THANKS.

Sorry for the horrible formatting, but Perl Monks makes posting a pain (all those tags I can't remember are hidden in the site, so finding how to post takes 10 minutes I'll never get back each time).

Thanks for taking a look. Here is what I've found. The file that contains the matches has loads of drives like this:

\\HOFIL01\CALDWELLS$
\\GEMINI1\CAMBRIDGE$
\\GEMINI1\ADMIN
\\GEMINI2\CONFERENCE
\\HOFIL02\PROJECT
\\HOIFL02\CURRFUNC
\\HOFIL01\CAPELLAC$
\\HOFIL01\CARKEEKF$

When the script hits a line like \\HOFIL02\PROJECT it breaks. If I change it to \\HOFAL02\PROJECT or \\HOFIL02\PROJECT THIS or \\HOFIL02\PZ (but no lower letter than Z) it will work (or stop on the next \\hofil02\p(a-y)), but if it ended in $ it was OK?!

In horror at the oddness of this I simply removed all \, $ and spaces from the variables before comparing them, which worked fine. How bizarre. What would cause only lines starting in \\hofil02\p(a-z)$ to cause an issue? Also, for a start I just removed any spaces from the variables. Then it would stop on half the lines starting in \\hofil02\p(a-z)???$ (e.g. \\hofil01\pathis$).

The DAT file that these patterns were being matched against looked like this:

[NATIONAL OFFICE]
floor=2
h:=\\moent14\username$
i:=\\aknt1\operations

The DATs had no space on the end of the drive entries but the matching file did. All very odd. My Perl was always bad and now it is rusty, so here is my current full script (no laughing please, it works nicely). I'll mod it up to output the changed files (and eventually to replace the existing ones after some more testing).

# Once more into the Perl Dear Friends    Bruce Taylor :: Datacom Consulting
#
# Search all MOE DAT files in a given directory, remove any lines found in another given file

use File::Find;             # Core module that finds files in directories (find - traverse a file tree)
                            # http://www.perldoc.com/perl5.6.1/lib/File/Find.html

$feeddir = $ARGV[0];        # @ARGV is the built-in Perl array that is fed to the script from the command line
$feeddir =~ s/\\/\//g;      # If the dir is given using \ (like c:\temp) then substitute all \ for /'s

if (!defined $feeddir or $feeddir eq "")
{
    system("cls");
    print "\n\n\n\nNo directory specified\n\n";
    print "You need to specify a directory to search down\n";
    print "EG... SearchReplace.pl c:\\TEMP\\DATS\n\n\n\n\n";
    die;
}

chdir("$feeddir");            # Change working directory to the now corrected directory given on the command line
find(\&wanted, "$feeddir");   # Use the built-in find command with the wanted subroutine and send it the $feeddir variable

sub wanted                  # The wanted subroutine is used to select only .dat files (further) down and set the filename
{                           # the { is used to start a code block
    /\.DAT$/i or return;              # Nab only DATs or go to the next file
    $filename = $File::Find::name;    # Set $filename variable to full path\name, i.e. c:\temp\dats\datfile.dat
    &Details($filename);              # So we have the file and then feed that into the Details subroutine
}

sub Details
{                           # the Details sub
    $file = shift;

    # Open the file containing the matches to remove
    open(BADSHARES, "c:/testy/Bad_Shares.txt");
    @BadShares = <BADSHARES>;
    close BADSHARES;

    # Open the file/s to remove these matches from
    open(CURRENTDAT, "$file");
    @DatLines = <CURRENTDAT>;
    close CURRENTDAT;

    # For each line of the file containing the matches to remove
    for ($a = 0; $a < scalar(@DatLines); $a++)
    {
        # Go through each line of the file you want to remove the matches from
        for ($i = 0; $i < scalar(@BadShares); $i++)
        {
            # Remove any \, $ or space from the line being worked on from each file
            # (seems to upset perl in unnatural ways)
            $BadShares[$i] =~ s/ //g;
            $BadShares[$i] =~ s/\\//g;
            $BadShares[$i] =~ s/\$//g;
            $DatLines[$a] =~ s/ //g;
            $DatLines[$a] =~ s/\\//g;
            $DatLines[$a] =~ s/\$//g;

            next if (!($DatLines[$a] =~ m/($BadShares[$i])/i));
            chomp $DatLines[$a];
            chomp $BadShares[$i];
            print "$file $DatLines[$a] $a $BadShares[$i] $i\n";
        }
    }
}
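Since the comparison wanted here is a literal, case-insensitive one, an alternative to stripping `\`, `$` and spaces is to skip the regex engine altogether. The following is a minimal sketch under stated assumptions, not the poster's script: the helper names `normalise_shares` and `filter_lines` are hypothetical, and it assumes trailing whitespace on the bad-share entries should be ignored (the post mentions the matching file had trailing spaces).

```perl
use strict;
use warnings;

# Hypothetical helper: tidy the bad-share list by dropping line endings,
# trailing whitespace and empty entries, and lower-casing for comparison.
sub normalise_shares {
    my @shares;
    for my $entry (@_) {
        (my $clean = $entry) =~ s/\s+\z//;
        push @shares, lc $clean if length $clean;
    }
    return @shares;
}

# Hypothetical helper: return the lines that contain none of the bad shares.
# index() is a plain substring search, so \ and $ need no escaping.
sub filter_lines {
    my ($lines, $shares) = @_;
    my @kept;
    LINE: for my $line (@$lines) {
        my $lc_line = lc $line;
        for my $share (@$shares) {
            next LINE if index($lc_line, $share) >= 0;
        }
        push @kept, $line;
    }
    return @kept;
}

# Small in-memory demonstration with made-up data.
my @bad  = normalise_shares("\\\\HOFIL02\\PROJECT \n", "\\\\GEMINI1\\ADMIN\n");
my @dat  = ("[NATIONAL OFFICE]\n", "h:=\\\\moent14\\username\$\n", "p:=\\\\hofil02\\project\n");
my @kept = filter_lines(\@dat, \@bad);
print @kept;
```

Like the regex version, this is a substring test, so an entry such as `\\GEMINI1\ADMIN` would also remove a line naming `\\GEMINI1\ADMIN2`; comparing only the part after `=` for equality would be stricter.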
Re^3: Remove lines from a list of files based on those in a given file by GrandFather (Saint) on Jan 31, 2006 at 02:36 UTC
The tags you complain of are HTML or PerlMonks special tags. Actually, there are only three tags that you really need to know: `<p>`, `<code>` and `<readmore>`. You can get away without closing paragraph tags (`<p>`), but you should really close code and readmore tags: `</code>` and `</readmore>`.

When you preview a node you are posting, there is a link in the fine print to Writeup Formatting Tips. As the advice says: "If something looked unlike you expected it to you might need to check out Writeup Formatting Tips".

Now you can go back and put those paragraph tags in that you missed :)

DWIM is Perl's answer to Gödel
Re^4: Remove lines from a list of files based on those in a given file by Paws_of_Iron (Initiate) on Jan 31, 2006 at 03:47 UTC