G'day Misstre,
Welcome to the Monastery.
I see that your own tentative solution, and all the replies that follow, use regexes. Perl's string-handling functions (substr, index, rindex) are typically faster than regexes. Depending on how many "thousands of these mistakes" there are, this might make a difference. Here's a solution that doesn't use any regexes.
    #!/usr/bin/env perl

    use strict;
    use warnings;
    use autodie;

    use File::Copy;

    my $ref_file = 'ref.txt';
    my $full_file = 'full.txt';
    my $bu_file = "$full_file.BU";

    #---------------------------------------------
    # TODO - for demo only; remove for production
    copy('original_full.txt', $full_file);
    #---------------------------------------------

    copy($full_file, $bu_file);

    my %ref_paths;
    _get_ref_paths($ref_file, \%ref_paths);

    {
        open my $ifh, '<', $bu_file;
        open my $ofh, '>', $full_file;

        while (<$ifh>) {
            chomp;
            my $cmd = substr $_, 5, -1;
            my @possibles = @{_assess_full_path($cmd, \%ref_paths)};

            if (@possibles == 1) {
                $ofh->print(qq{CMD="$possibles[0]"\n});
            }
            elsif (@possibles > 1) {
                $ofh->print(qq{QRY($.)="$_"\n}) for @possibles;
            }
            else {
                $ofh->print(qq{WTF($.)="$cmd"\n});
            }
        }
    }

    #---------------------------------------------
    # TODO - for demo only; remove for production
    print "\n*** ref file: '$ref_file'\n";
    system cat => $ref_file;
    print "\n*** bu file: '$bu_file'\n";
    system cat => $bu_file;
    print "\n*** full file: '$full_file'\n";
    system cat => $full_file;
    #---------------------------------------------

    sub _assess_full_path {
        my ($cmd, $ref_paths) = @_;

        my $possibles = [];

        my $pos = 1 + rindex $cmd, '/';
        my $start = substr $cmd, 0, $pos;
        my $end = substr $cmd, $pos;
        my $max = substr $cmd, 0, rindex($cmd, '.') - 1;

        if (exists $ref_paths->{$start}) {
            for my $key (keys %{$ref_paths->{$start}}) {
                my $dir = "$start$key";

                if (0 == index $max, $dir) {
                    my $full_path = join '/', $dir, substr $cmd, length $dir;
                    $full_path =~ y{/}{/}s;
                    push @$possibles, $full_path;
                }
            }
        }

        return $possibles;
    }

    sub _get_ref_paths {
        my ($ref_file, $ref_paths) = @_;

        open my $fh, '<', $ref_file;

        while (<$fh>) {
            chomp;
            my $end = substr $_, rindex($_, '/') + 1;
            substr $_, rindex($_, '/') + 1, length($_), '';
            $ref_paths->{$_}{$end} = 1;
            $ref_paths->{"$_$end/"}{''} = 1;
        }

        return;
    }
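To make the regex-free parsing concrete, here is a minimal standalone sketch of the two string operations the script relies on: peeling the path out of a CMD="..." line with substr, and splitting it at the last slash with rindex. The sample line is one of the test inputs; the regex at the end is my own addition, shown only for comparison.

```perl
use strict;
use warnings;

my $line = 'CMD="/a/b/cd.sh"';

# Skip the 5-character prefix (CMD=") and drop the trailing quote.
my $cmd = substr $line, 5, -1;

# Split at the last slash: directory part (slash included) and file part.
my $pos   = 1 + rindex($cmd, '/');
my $start = substr $cmd, 0, $pos;
my $end   = substr $cmd, $pos;

# A regex that extracts the same path, for comparison only.
my ($re_cmd) = $line =~ /^CMD="(.*)"$/;

print "cmd:   $cmd\n";
print "start: $start\n";
print "end:   $end\n";
print "regex: $re_cmd\n";
```

Both approaches give the same path here; the substr version simply assumes every line has exactly the CMD="..." shape, so it needs no pattern matching at all.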
I dummied up some files to test this. Here's a sample run's output:
    *** ref file: 'ref.txt'
    /a
    /a/b
    /a/b/c
    /b
    /b/c
    /c
    /ab
    /abc
    /abcd

    *** bu file: 'full.txt.BU'
    CMD="/a/a.sh"
    CMD="/aa.sh"
    CMD="/ab.sh"
    CMD="/abc.sh"
    CMD="/a/bc.sh"
    CMD="/a/b/c.sh"
    CMD="/a/b/c/.sh"
    CMD="/a/b/cd.sh"
    CMD="/a/b/c/d.sh"
    CMD="/x/y.z"
    CMD="/a/xyz.sh"
    CMD="/abcd.sh"
    CMD="/a/very 'special' command.exe"

    *** full file: 'full.txt'
    CMD="/a/a.sh"
    CMD="/a/a.sh"
    CMD="/a/b.sh"
    QRY(4)="/a/bc.sh"
    QRY(4)="/ab/c.sh"
    QRY(5)="/a/b/c.sh"
    QRY(5)="/a/bc.sh"
    CMD="/a/b/c.sh"
    WTF(7)="/a/b/c/.sh"
    QRY(8)="/a/b/cd.sh"
    QRY(8)="/a/b/c/d.sh"
    CMD="/a/b/c/d.sh"
    WTF(10)="/x/y.z"
    CMD="/a/xyz.sh"
    QRY(12)="/a/bcd.sh"
    QRY(12)="/abc/d.sh"
    QRY(12)="/ab/cd.sh"
    CMD="/a/very 'special' command.exe"
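The QRY lines come from inputs where more than one reference directory could be the intended prefix. The sketch below traces that for two of them, using an in-memory copy of the ref list instead of ref.txt. The name candidates() is hypothetical and the results are sorted for stable display (the script itself emits them in hash order), but the logic is meant to follow _assess_full_path and _get_ref_paths as posted.

```perl
use strict;
use warnings;

# In-memory stand-in for ref.txt (same entries as the sample run).
my @refs = qw{/a /a/b /a/b/c /b /b/c /c /ab /abc /abcd};

# parent directory (with trailing slash) => { child name => 1 },
# plus "directory/" => { '' => 1 } so an already-correct path matches.
my %ref_paths;
for my $path (@refs) {
    my $cut    = 1 + rindex($path, '/');
    my $parent = substr $path, 0, $cut;
    my $child  = substr $path, $cut;
    $ref_paths{$parent}{$child} = 1;
    $ref_paths{"$path/"}{''}    = 1;
}

# Hypothetical helper: every way of inserting the missing slash.
sub candidates {
    my ($cmd) = @_;

    my $start = substr($cmd, 0, 1 + rindex($cmd, '/'));

    # Longest usable prefix: leave at least one character of
    # file name in front of the extension.
    my $max = substr($cmd, 0, rindex($cmd, '.') - 1);

    my @found;
    for my $key (keys %{ $ref_paths{$start} || {} }) {
        my $dir = "$start$key";
        next unless 0 == index($max, $dir);

        my $full = join '/', $dir, substr($cmd, length $dir);
        $full =~ tr{/}{/}s;    # squeeze doubled slashes (tr, not a regex)
        push @found, $full;
    }

    return sort @found;
}

print "$_\n" for candidates('/abc.sh');
print "--\n";
print "$_\n" for candidates('/abcd.sh');
```

For /abc.sh both /a and /ab are valid prefixes, so two candidates come back; /abc is rejected because it would leave no file name before ".sh". For /abcd.sh the same rule admits /a, /ab and /abc but not /abcd, which matches the three QRY(12) lines above.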
Notes:
— Ken
In reply to Re: In place replacement from reference list
by kcott
in thread In place replacement from reference list
by Misstre