Larry has asked for the wisdom of the Perl Monks concerning the following question:

I want to remove a line (which is called from a text file) from another file which may or may not contain the line. So far I have written this below but it only seems to read the last line of the nosect.txt file and does not do any of the other lines.
\#!/usr/local/bin/perl -w %Acts = (); use Getopt::Long; $time1= time; &Actsin1; &Group; $time2 = time; $time3 = $time2 - $time1; print "$time3 seconds\n"; print "finished"; sub Actsin1 { $filetoopen1 = "c:\\Update\\nosect\.txt"; open (ACTLIST, $filetoopen1) || die "Could not open file $filetoop +en1 \n"; open (OUTPUT2,">c:\\Update\\Errors.txt") || die "Could not open ou +tput \n"; dbmopen (%Groups, "c:\\Update\\Groups",0666) || die "Could not ope +n Acts database\n"; while (<ACTLIST>) { $TheLine = $_; @WordList = "$TheLine"; } } sub Group { $filetoopen = "c:\\update\\makeme\\statutes\.fff"; open (INPUT, $filetoopen) || die "Could not open file $filetoopen \n"; open (OUTPUT,">c:\\update\\makeme\\newstat.fff") || die "Could not ope +n output \n"; while (<INPUT>) { $TheLine = $_; if ($TheLine =~ /<RD>[^\n]*Status Compendium<<.JL>/i) { $ListPos = 0; until($ListPos > $#WordList) { if ($TheLine eq $WordList[$ListPos]) { $TheLine = ""; pop(@WordList) } $ListPos +=1 } } print OUTPUT "$TheLine"; } }

Replies are listed 'Best First'.
Re: Remove a line of text
by Trimbach (Curate) on Sep 21, 2001 at 02:54 UTC
    This is (a) problem with your code:
    while (<ACTLIST>) { $TheLine = $_; @WordList = "$TheLine"; }
    What this does is read in a line from ACTLIST, assign it to $TheLine, then assigns $TheLine to the @WordList array. However, each iteration causes @WordList to get overwritten with the new value of $TheLine, which is probably not what you want.

    Try this instead:

    while (<ACTLIST>) { push @WordList, $_; }
    or, even better:
    @WordList = (<ACTLIST>);
    ...which'll do the same thing.

    Gary Blackburn
    Trained Killer

Re: Remove a line of text
by dragonchild (Archbishop) on Sep 21, 2001 at 02:53 UTC
    While having not read the script completely, I would recommend the following steps for you to figure out what's going on:
    1. put 'use strict;' and 'use warnings;' at the top of your script. This will require you to declare all your variables using my or our. (Use my unless you know you need to use our. If you don't understand, just use my.)
    2. The following works just fine:
      while (my $theLine = <INPUT>) {
      And, it will even make sure that you kick out of the while loop when the file is empty.
    3. Use consistent block formatting. That will help you catch logic errors. For example, your second while-loop could look like:
      while (my $TheLine = <INPUT>) { if ($TheLine =~ /<RD>[^\n]*Status Compendium<<.JL>/i) { my $ListPos = 0; until($ListPos > $#WordList) { if ($TheLine eq $WordList[$ListPos]) { $TheLine = ""; pop(@WordList) } $ListPos +=1 } } print OUTPUT "$TheLine"; }
    Just a note - if you want to print something or NULL, depending on a condition, make that obvious.

    Also - read up on pop. I don't think it does what you think it does.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

Re: Remove a line of text
by jryan (Vicar) on Sep 21, 2001 at 04:21 UTC

    A few more things:

    1. First off, after you use strict; like dragonchild recommended, you will notice that @WordList isn't scoped properly. Also, while you have the right idea towards good, modular code by using functions, your functions need to be... functionized. The first one, actsin1, doesn't even really need to be a function at all. In fact, if it wasn't a function, you would no longer have any scoping problems associated with @WordList. Your The second one, Group, should accept the array as an argument. Keeping a tight scope is a good programming practice.
    2. $ListPos +=1 should be $ListPos++. Its much more readable this way - the ++ operator is there for a reason, you know :)
    3. In your file paths, just use a single forward slash - perl will translate the slashes to the correct type based on the system.
    4. This block:
      if ($TheLine =~ /<RD>[^\n]*Status Compendium<<.JL>/i) { $ListPos = 0; until($ListPos > $#WordList) { if ($TheLine eq $WordList[$ListPos]) { $TheLine = ""; pop(@WordList) } $ListPos +=1 } } print OUTPUT "$TheLine";
      can be reduced to:
      if ($TheLine =~ /<RD>[^\n]*Status Compendium<<.JL>/i) { for(my $i=0; $i > @WordList; $i++) { print OUTPUT $TheLine if ($TheLine ne $WordList[$i]) } }

      This is much more easy to read and efficient, since you arent using @WordList after you are done with the program. Give it a whirl.
    5. Instead of explictly timing your program, consider using Benchmark. It is much more accurate

    Sorry for being nitpicky, but I felt the urge :) Good Luck!

      #4 can be reduced to a hash, which is shorter and more efficient yet. Besides that, I doubt the OP wants to print $TheLine for each word it doesn't match.
      Thanks for the help, when I made the changes and ran the script it hangs and returns a null file newstat.fff after about 5 mins. The Statutes.fff file is 400mb. What I am attempting to do is relpace words in the main file from a list of words in a text file (each word is seperated by a line break). I want the output to be a new file which contains all the old text minus the words not wanted.