Re: In place search and replace with a hash
by eyepopslikeamosquito (Archbishop) on Dec 28, 2014 at 06:21 UTC
|
I don't like your code to populate your hash, namely:
my ($key, $value) = split /\s/, $line;
next LINE if not $key;
$hash{$key} = $value;
chomp (%hash);
Though the chomp(%hash) kinda "works", the doco for chomp
does not mention applying it to a hash and accordingly, I feel this code
is unclear.
Update: Whoops, the chomp doco does indeed
mention applying chomp to a hash, as pointed out
below.
Thanks AnomalousMonk.
I would do it something like this:
use strict;
use warnings;
use Data::Dumper;
my %hash;
open my $fh, '<', 'hash.txt' or die "Can't open file $!";
while (my $line = <$fh>)
{
chomp $line; # remove trailing newline
$line =~ s/^\s+//; # optional: remove leading whitespa
+ce
$line =~ s/\s+$//; # optional: remove trailing whitesp
+ace
next unless length($line); # ignore blank lines
next if $line =~ /^#/; # optional: to allow comment lines
# Format: key whitespace value (value can contain whitespace)
my ($key, $value) = split ' ', $line, 2;
defined($value) or die "error: no value for '$key'";
$hash{$key} = $value;
}
close $fh;
print Dumper(\%hash);
You might further want to consider what will happen if your key
contains regex metachars (e.g. abc.xyz).
Maybe this can't happen for your data ... but if it does,
your regex may be incorrect (could use /\Q$key\E/
to escape metachars, if required by your data).
Re safely editing a file in place, see:
| [reply] [d/l] [select] |
|
|
| [reply] [d/l] [select] |
Re: In place search and replace with a hash
by LanX (Saint) on Dec 28, 2014 at 01:54 UTC
|
Looks like you are trying to use $1 , $2 from one regex within another one, which of course resets them again to undef.
Copy them to new variables before using them.
edit
Not sure why you even need two nested regexes here... One s/// should be sufficient.
Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
| [reply] |
|
|
while (<FH>)
{
if (/(\S+):(\S+).*\n/)
{
$var1 = "$1";
$var2 = "$2";
s/$var1/$hash{$var1}_$var/g;
}
}
and now I get a different Use of uninitialized value error, which seems better?: "Use of uninitialized value within %hash in concatenation (.) or string at test_perl.pl line 41" I don't understand your comment about nested regexes, could you elaborate? Thanks in advance for any help | [reply] [d/l] |
Re: In place search and replace with a hash
by AnomalousMonk (Archbishop) on Dec 28, 2014 at 02:44 UTC
|
Some other thoughts on the OPed code.
my ($key, $value) = split /\s/, $line;
next LINE if not $key;
split returns its input string ($line in this case) if it cannot split according to the given pattern, so $key will almost always have a true value no matter what happens. Better to use
my ($k, $v) = split /\s/, $line;
next LINE if not defined $v;
I use defined to avoid false negatives from '0' and '' (the empty string), both of which evaluate as boolean false. I also use the variable names $k $v because I don't like using variaable names that are the same as Perl built-in functions, etc.
$hash{$key} = $value;
The hash %hash is presumably a package global. Better to use a pre-declared my lexical. (Better in general to avoid globals.)
chomp (%hash);
The chomp is done on all the values of the hash on every iteration through the while-loop. This does no harm, but needlessly burns cycles. chomp can be done just once on the hash after it is populated, i.e., after the while-loop.
Update: This code doesn't seem to really do anything:
@file_array =<*.txt> or die $!;
foreach $file (@file_array)
{
open FH, "$file", or die $!;
while (<FH>)
{
if (/(\S+):(\S+).*\n/)
{
s/$1/$hash{'$1'}_$2/g;
}
}
close FH;
}
To be sure, it opens a bunch of files and reads and potentially alters each line of each file... but then what? The (potentially) altered line is never written out to any file. Is this some sort of tied filehandle (see, e.g., Tie::File)? There's no indication of this if so. Also, I don't understand what you mean by "... I want to change the files in place." As I understand the phrase, this can be done only if the replacement strings are guaranteed to be exactly the same length as the sub-strings that are replaced. Again, no indication this is the case.
Further Update: The process of "editing a file" (that has not been read into a tied array) on a line-by-line basis generally goes something like this (every step is assumed to be checked for success, i.e., it produces no error code):
-
open file to be edited (input file) in read mode ('<').
-
Generate temporary name of output file (usually based on input file name) and open for write ('>')
-
Until all lines are read from input file, read input file line-by-line, alter line as appropriate, write line to output file.
-
Close input file. Close output file.
-
Do whatever is necessary to be absolutely sure the newly-written output file is intact.
-
Change name of input file to some backup name, e.g., 'input.txt.bak'.
-
If the input file name change produces no errors, rename output file to original name of input file. If the output file rename is successful, the "backup" input file may be deleted; often, this file is kept in existence "just in case."
Even if it looks a lot simpler than this, this is more or less what is happening "behind the scenes." Can you present some code that resembles this process?
Give a man a fish: <%-(-(-(-<
| [reply] [d/l] [select] |
|
|
Anomalous, you're right. As I wrote in my original post, I have never actually used s/foo/bar other than in a one liner. I was looking for a way to find and replace strings in a file without writing to a new output file and when the replacement strings are not the same length as the sub-strings that are replaced. Clearly I am not doing this the correct way, and if there is a correct way, I haven't been able to find/understand it yet. What I want is to open a file and if a line is:
'nc7-52:p39|read1|1|19|296|0.0642'
and change it to:
C_fraterna_nc7-52
using a hash where $hash{nc7-52}=C_fraterna_nc7-52
Update: From this example it looks like it would be easy to print to output, but some lines in the file will not be modified and I still want to preserve them.
sorry, this thread may be a bit out of control since I had too many problems to begin with
| [reply] |
|
|
From this example it looks like it would be easy to print to output ...
As a first step, create a small test input file and just print the altered and unaltered lines to standard output, e.g.,
print $my_new_line_what_is_just_the_way_i_want_it;
When this limited, test output looks ok, start to think about the process of opening a new file and writing to that file instead of STDOUT. When that looks ok, start thinking about all the input/output file verification and renaming that has to happen. (I'm sure there are modules to facilitate the "edit in place" process, but I don't know what they are off-hand.)
Give a man a fish: <%-(-(-(-<
| [reply] [d/l] [select] |
|
|
| [reply] [d/l] |
|
|
I think I understand your editing a file instructions, and if so I was really over-complicating things trying to do an "in place" replacement. Now I am trying something like this:
open FH, "<infile.txt" or die $!;
open OUT, ">outfile.txt";
while (<FH>)
{
if (/(\S+):(\S+).*\n/)
{
$var1 = "$1";
$var2 = "$2";
print OUT "$var1_$var2\n"
}
elsif (/(.*)\n/)
{
print OUT "$1\n";
}
}
close FH;
This does what I want in terms of preserving the input files non matching lines, but I am still getting a "use of uninitialized value" error, one for every line in the input file:
Use of uninitialized value $var1_ in concatenation (.) or string at test_perl.pl line 42 | [reply] [d/l] |
|
|
print OUT "$var1_$var2\n"
The '_' (underscore) character is a valid identifier, i.e., variable name, symbol. Therefore, the two variables you're using are $var1_ (note trailing _), which is not defined, and $var2, which is. Had you been using strictures as well as warnings, Perl would not have allowed this little oversight (see strict and warnings). The proper way to write this statement (if you really need the _) is
print OUT "${var1}_$var2\n";
(Note that I've added a ; at the end; the absence of this statement terminator caused no syntactic problem in the given code, but would have eventually — trust me.)
open FH, "<infile.txt" or die $!;
open OUT, ">outfile.txt";
Update: You seem to be back-sliding. In contrast to your OPed code, this code uses global filehandles, two-parameter open, and does not check the status of the output file open. Tsk, tsk! IMHO, the OPed code used better practices.
Give a man a fish: <%-(-(-(-<
| [reply] [d/l] [select] |
|
|
|
|
|
|
|
|
if (/(\S+):(\S+).*\n/)
{
$var1 = "$1";
$var2 = "$2";
print OUT "$var1_$var2\n"
}
elsif (/(.*)\n/)
{
print OUT "$1\n";
}
}
Ok, once you get that working (i.e., fix $var1_ problem), try reducing the big if-statement to a single s/// substitution (untested):
s{ (\S) : (\S) }{${1}_$2}xmsg;
(which just replaces every : with an _), followed by a print of the (possibly altered) line (held in $_ in this code) to the output filehandle.
Update: This reply may have been superseded by your intervening post, but what the heck...
Give a man a fish: <%-(-(-(-<
| [reply] [d/l] [select] |
Re: In place search and replace with a hash
by Anonymous Monk on Dec 28, 2014 at 02:22 UTC
|
In addition to what LanX said, single quotes in $hash{'$1'} prevent variable interpolation, so you are searcing for key that is a literal string '$1'. | [reply] [d/l] |
|
|
Did you test it?
I can't at the moment°, but imho the first step on RHS is string like var interpolation. There is no eval flag set.
Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
°) mobile in a disco, ain't that I'll? ;)
| [reply] |
|
|
c:\@Work\Perl\monks>perl -wMstrict -le
"my %hash = ('$1' => 'oops', 'abc' => 'ok');
;;
'abc' =~ m{ (abc) }xms;
print qq{captured '$1'};
;;
print $hash{'$1'};
"
captured 'abc'
oops
Give a man a fish: <%-(-(-(-<
| [reply] [d/l] [select] |
|
|
|
|
|
|
|
|
|
|
|
use 5.020;
use warnings;
my %h = ( foo => 1, bar => 2 );
$_ = 'foobar';
s/(foo)/$h{'$1'}/;
say $_;
$_ = 'foobar';
s/(foo)/$h{$1}/;
say $_;
my $foo = 'foo';
say "$h{'$foo'}bar";
output:
Use of uninitialized value $h{"$1"} in substitution iterator at extr.p
+l line 7.
bar
1bar
Use of uninitialized value $h{"$foo"} in concatenation (.) or string a
+t extr.pl line 15.
bar
| [reply] [d/l] [select] |
|
|
|
|
Oops, I meant to respond "thanks for catching that" Anonymous Monk but I got the usernames mixed up. Well thanks!
| [reply] |
|
|
| [reply] |