Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I'm pretty much a n00b when it comes to perl, so forgive me if my question is, well, a stupid one.

I have written a program that takes a list of words and values as an argument plus a list of files. I have put the wordlist in a hash and the file list in an array. The files in the file list are documents that contain the same words as the wordlist, but not all words are in every document.
If the document does contain a certain word, then all occurrences need to be replaced with "ECHO", except for the first occurrence. I have checked all variables with print statements and everything seems to be going okay, but my .out files are exactly the same as the input files. So it seems something is wrong with the way I'm printing the substitutions to the .out file.
This is my program:
#! /usr/bin/perl -w

$echo="ECHO";

open (DFFILE,$ARGV[0]) || die "DF-file not found\n";
open (LIST,$ARGV[1]) || die "List not found\n";

while (<DFFILE>) {
    ($value, $key) = split(/\t/, $_);
    $lijst{$key} = $value;
}

@listfiles = <LIST>;
#print @listfiles;

foreach $key (sort keys %lijst) {
    # print "The value associated with key $key is $lijst{$key}\n";
    if ($lijst{$key}==1) {
        my $word = $key;
        for $file (@listfiles) {
            open (FILE,"$file");
            open (OUT,">$file.out");
            while ($_=<FILE>) {
                if ($_ =~ m/^$word$/) {
                    $_ =~ s/$word/$echo/g;
                    $_ =~ s/$echo/$word/;
                }
                print OUT $_;
            }
            close(FILE);
            close(OUT);
        }
    }
}
close(DFFILE);
Any help would be greatly appreciated!

Matje

Re: problem with string substitution output
by moritz (Cardinal) on May 23, 2008 at 15:21 UTC
    You iterate over all words, and in each iteration you open the original file, change something, and print the altered output to another file.

    This means that the output file is overwritten with what you print in the last iteration.

    So you have to change the nesting of the loops. You can get around using the loop over the words altogether with a little trick:

    my $all_words_regex = join '|', keys %lijst;
    for my $file (@listfiles) {
        open my $in, '<', $file
            or die "Can't open file '$file' for reading: $!";
        open my $out, '>', "$file.out"
            or die "Can't open file '$file.out' for writing: $!";
        while (<$in>){
            if (m/^($all_words_regex)/){
                my $first_word = $1;
                s/$first_word/$echo/g;
                s/$echo/$first_word/;
            }
            print $out $_;
        }
    }

    BTW all of your programs should start like this:

    use strict;
    use warnings;

    And you should declare all your variables with either my (most of the time) or our.
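One way to keep a single pass over each file and still leave the first occurrence of every word alone is to remember which words have already been seen. The sketch below is only an illustration, not code from this thread: `echo_repeats` and `%seen` are hypothetical names, it assumes one word per line (as the `^$word$` pattern in the original program suggests), and it only touches words whose value in the hash is 1.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a reference to the lines of one document and a
# reference to the word => value hash; returns the rewritten lines.
sub echo_repeats {
    my ($lines, $lijst, $echo) = @_;
    $echo = 'ECHO' unless defined $echo;
    my %seen;                                  # words already met in this document
    my @out;
    for my $line (@$lines) {
        (my $word = $line) =~ s/\r?\n\z//;     # compare without the line ending
        if (exists $lijst->{$word} && $lijst->{$word} == 1 && $seen{$word}++) {
            # second or later occurrence: replace the word, keep the line ending
            push @out, $echo . substr($line, length $word);
        }
        else {
            push @out, $line;                  # first occurrence, or word not listed
        }
    }
    return @out;
}

my %lijst = (apple => 1, pear => 2);
print echo_repeats([ "apple\n", "pear\n", "apple\n", "pear\n", "apple\n" ], \%lijst);
```

Because each document gets its own `%seen`, the output file is written once per document rather than once per word, so nothing is overwritten by a later iteration.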

      Hi Moritz,

      Thanks for your reply. However, I still can't get it to work.

      I have defined all variables, I use strict and warnings. I have changed the lines for opening the in and out files to
      open (my $in,"<$file") or die "Can't open file '$file' for reading: $!";
      open (my $out,">$file.out") or die "Can't open file '$file.out' for writing: $!";
      because otherwise it gave me this error:
      Unsuccessful open on filename containing newline at test.pl line 18

      With your alterations my program looks like this:
      use strict;
      use warnings;

      my $echo="ECHO";
      my $value=undef;
      my $key=undef;
      my %lijst=();

      open (DFFILE,$ARGV[0]) || die "DF-file not found\n";
      open (LIST,$ARGV[1]) || die "List not found\n";

      while (<DFFILE>) {
          ($value, $key) = split(/\t/, $_);
          $lijst{$key} = $value;
      }

      my @listfiles = <LIST>;
      my $all_words_regex = join '|', keys %lijst;

      for my $file (@listfiles) {
          open (my $in,"<$file") or die "Can't open file '$file' for reading: $!";
          open (my $out,">$file.out") or die "Can't open file '$file.out' for writing: $!";
          while (<$in>){
              if (m/^($all_words_regex)/){
                  my $first_word = $1;
                  s/$first_word/$echo/g;
                  s/$echo/$first_word/;
              }
              print $out $_;
          }
      }
      close(DFFILE);
      But this gives me the exact same result as it did without your alterations. Or am I doing something wrong?

      Also, I realize I left out one aspect of my program. The substitution only needs to be done when the value of the hash key is 1.

      Matje
        Unsuccessful open on filename containing newline at test.pl line 18
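        This tells you that you have to remove the newlines from the lines you read from your file list:
        chomp( my @listfiles = <LIST> );
        The lines of the DF file need the same treatment, otherwise every key ends in a newline:
        while (<DFFILE>) {
            chomp;    # remove newline at the end.
            ($value, $key) = split(/\t/, $_);
            $lijst{$key} = $value;
        }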
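To see why the newline matters, it can help to parse a couple of lines by hand. The sketch below is only an illustration under assumptions: `parse_df_lines` is a hypothetical helper, and the `number<TAB>word` layout is taken from the poster's description of the data file further down the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "count<TAB>word" lines into a word => count hash.
sub parse_df_lines {
    my @lines = @_;
    my %lijst;
    for my $line (@lines) {
        chomp $line;                            # drop the trailing newline first
        my ($value, $key) = split /\t/, $line;
        next unless defined $key;               # skip lines without a tab
        $lijst{$key} = $value;
    }
    return %lijst;
}

my %lijst = parse_df_lines("1\tapple\n", "3\tpear\n");
# Without the chomp the keys would be "apple\n" and "pear\n", so a lookup
# such as $lijst{apple} would find nothing.
print "$_ => $lijst{$_}\n" for sort keys %lijst;
```

The same reasoning applies to the file list: a filename read with `<LIST>` still carries its newline until it is chomped, which is what the "Unsuccessful open" warning is pointing at.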
Re: problem with string substitution output
by mwah (Hermit) on May 23, 2008 at 16:18 UTC

    There are some unclear parts in your code, and we don't know what your data looks like (so there may be any kind of problem).

    You should always open one file after another and do all the work on one file at once. I reorganized your code (out of the blue, without knowing your data); maybe that's a new starting point.

    ...
    my $echo = 'ECHO';
    my ($dffname, $listname) = @ARGV;
    my $fh;

    # read key/val lists
    open $fh, '<', $dffname or die "$dffname $!";
    my %lijst = map { chomp; (split /\t|\s{2,}/)[1,0] } <$fh>;
    close $fh;

    # read file list
    open $fh, '<', $listname or die "$listname $!";
    my @listfiles = <$fh>;
    chomp @listfiles;
    close $fh;

    for my $file (@listfiles) {
        # read original file
        open $fh, '<', $file or die "$file $!";
        local $/;
        my $content = <$fh>;
        close $fh;

        # modify content
        while( my ($key, $val) = each %lijst ) {
            next unless $val == 1;
            $content =~ s/^$key$/$echo/gms;
            $content =~ s/^$key$/$echo/;   # re-substitute first (?)
        }

        # write modified file
        open $fh, '>', "$file.out" or die "$file.out $!";
        print $fh $content
    }
    ...

    Regards

    mwa

      Thanks for your reply!

      This looks more like the way it needs to be, except now everything gets replaced with "ECHO"... At least it's progress ;)
      My data file (which I read into the hash) is a document frequency list, so each line has a number, a tab and a word, like this:

      1<tab>word

      The other file is just a list of filenames.
      I'm gonna toy around with it some more, but if you have any more suggestions they are very welcome.

      Matje
        Okay, I almost got it to work like this:
        use strict;
        use warnings;

        my $echo = "ECHO";
        my ($dffname, $listname) = @ARGV;
        my $fh;
        my $key=undef;

        # read key/val lists
        open $fh, '<', $dffname or die "$dffname $!";
        my %lijst = map { chomp; (split /\t/)[1,0] } <$fh>;
        close $fh;

        foreach $key (sort keys %lijst) {
            print "The value associated with key $key is $lijst{$key}\n";
        }

        # read file list
        open $fh, '<', $listname or die "$listname $!";
        my @listfiles = <$fh>;
        chomp @listfiles;
        close $fh;

        for my $file (@listfiles) {
            # read original file
            open $fh, '<', $file or die "$file $!";
            local $/;
            my $content = <$fh>;
            close $fh;

            # modify content
            while( my ($key, $val) = each %lijst ) {
                next unless $val == 1;
                $content =~ s/^$key$/$echo/gms;
                $content =~ s/$echo/$key/;
            }

            # write modified file
            open $fh, '>', "$file.out" or die "$file.out $!";
            print $fh $content
        }
        BUT... in this piece of code:
        $content =~ s/^$key$/$echo/gms;
        $content =~ s/$echo/$key/;
        the value of the second $key isn't the same as the value of the first $key. The goal of this piece is to first substitute every occurrence of $key with $echo, and then to substitute only the first occurrence of $echo with $key (thus returning it to its original value). And that's not working. :(

        Matje
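A likely reason the second substitution misbehaves: `s/$echo/$key/` restores the first ECHO found anywhere in the content, and once an earlier word has been processed, that first ECHO may belong to a different word. One way around it, sketched here under assumptions (`echo_all_but_first` is a hypothetical name, and one word per line is assumed), is a single substitution with a counter, so nothing ever has to be restored.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every whole-line occurrence of $key in $content
# with $echo, except the first one.
sub echo_all_but_first {
    my ($content, $key, $echo) = @_;
    my $count = 0;
    # /m lets ^ and $ match at line boundaries, /e evaluates the replacement
    # as code, and \Q...\E protects any regex metacharacters in the word.
    $content =~ s/^\Q$key\E$/ $count++ ? $echo : $key /gme;
    return $content;
}

my $content = "apple\npear\napple\npear\napple\n";
$content = echo_all_but_first($content, $_, 'ECHO') for qw(apple pear);
print $content;
```

Each word keeps its own counter, so the order in which `each %lijst` hands out the keys no longer affects which occurrence survives.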
Re: problem with string substitution output
by jwkrahn (Abbot) on May 23, 2008 at 20:26 UTC

    It should work like this:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $echo = 'ECHO';

    @ARGV == 2 or die "usage: $0 file1 file2\n";

    open DFFILE, '<', $ARGV[ 0 ] or die "$ARGV[ 0 ]: $!";
    my @lijst = sort map /^1\t([^\t\n]+)/, <DFFILE>;
    close DFFILE;

    open LIST, '<', $ARGV[ 1 ] or die "$ARGV[ 1 ]: $!";
    chomp( my @listfiles = <LIST> );
    close LIST;

    for my $word ( @lijst ) {
        for my $file ( @listfiles ) {
            open FILE, '<', $file or die "$file: $!";
            open OUT, '>', "$file.out" or die "$file.out: $!";
            reset;
            while ( <FILE> ) {
                # m?...? succeeds only once until the next reset
                # (the bare ?...? form was removed in Perl 5.22)
                m?^$word$? or s/$word/$echo/g;
                print OUT;
            }
        }
    }