in reply to In place replacement from reference list
G'day Misstre,
Welcome to the Monastery.
I see that your own tentative solution, and all those that follow, use regexes. Perl's string-handling functions are typically faster than regexes, so depending on how many of the "thousands of these mistakes" there are, this might make a noticeable difference. Here's a solution that doesn't use any regexes.
#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

use File::Copy;

my $ref_file = 'ref.txt';
my $full_file = 'full.txt';
my $bu_file = "$full_file.BU";

#---------------------------------------------
# TODO - for demo only; remove for production
copy('original_full.txt', $full_file);
#---------------------------------------------

copy($full_file, $bu_file);

my %ref_paths;
_get_ref_paths($ref_file, \%ref_paths);

{
    open my $ifh, '<', $bu_file;
    open my $ofh, '>', $full_file;

    while (<$ifh>) {
        chomp;
        my $cmd = substr $_, 5, -1;
        my @possibles = @{_assess_full_path($cmd, \%ref_paths)};

        if (@possibles == 1) {
            $ofh->print(qq{CMD="$possibles[0]"\n});
        }
        elsif (@possibles > 1) {
            $ofh->print(qq{QRY($.)="$_"\n}) for @possibles;
        }
        else {
            $ofh->print(qq{WTF($.)="$cmd"\n});
        }
    }
}

#---------------------------------------------
# TODO - for demo only; remove for production
print "\n*** ref file: '$ref_file'\n";
system cat => $ref_file;
print "\n*** bu file: '$bu_file'\n";
system cat => $bu_file;
print "\n*** full file: '$full_file'\n";
system cat => $full_file;
#---------------------------------------------

sub _assess_full_path {
    my ($cmd, $ref_paths) = @_;

    my $possibles = [];

    my $pos = 1 + rindex $cmd, '/';
    my $start = substr $cmd, 0, $pos;
    my $end = substr $cmd, $pos;
    my $max = substr $cmd, 0, rindex($cmd, '.') - 1;

    if (exists $ref_paths->{$start}) {
        for my $key (keys %{$ref_paths->{$start}}) {
            my $dir = "$start$key";
            if (0 == index $max, $dir) {
                my $full_path = join '/', $dir, substr $cmd, length $dir;
                $full_path =~ y{/}{/}s;
                push @$possibles, $full_path;
            }
        }
    }

    return $possibles;
}

sub _get_ref_paths {
    my ($ref_file, $ref_paths) = @_;

    open my $fh, '<', $ref_file;

    while (<$fh>) {
        chomp;
        my $end = substr $_, rindex($_, '/') + 1;
        substr $_, rindex($_, '/') + 1, length($_), '';
        $ref_paths->{$_}{$end} = 1;
        $ref_paths->{"$_$end/"}{''} = 1;
    }

    return;
}
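Both subroutines rely on the same basic move: find the last '/' with rindex, then cut the string on either side of it with substr. Here is a minimal standalone sketch of just that step. The helper name split_path is my own, for illustration only; it does not appear in the script.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper (not part of the script above): split a path at its
# last '/' into a directory prefix (trailing slash kept) and the remainder.
sub split_path {
    my ($path) = @_;

    # rindex returns -1 when there is no '/', so $pos is 0 in that case
    # and the whole string ends up in the second element.
    my $pos = 1 + rindex $path, '/';

    return (substr($path, 0, $pos), substr($path, $pos));
}

for my $path ('/a/b/c.sh', '/abc.sh') {
    my ($dir, $file) = split_path($path);
    print "$path => dir='$dir' file='$file'\n";
}
```

The directory part keeps its trailing slash, which matches the form of the keys tested with exists in _assess_full_path.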
I dummied up some files to test this. Here's a sample run's output:

*** ref file: 'ref.txt'
/a
/a/b
/a/b/c
/b
/b/c
/c
/ab
/abc
/abcd

*** bu file: 'full.txt.BU'
CMD="/a/a.sh"
CMD="/aa.sh"
CMD="/ab.sh"
CMD="/abc.sh"
CMD="/a/bc.sh"
CMD="/a/b/c.sh"
CMD="/a/b/c/.sh"
CMD="/a/b/cd.sh"
CMD="/a/b/c/d.sh"
CMD="/x/y.z"
CMD="/a/xyz.sh"
CMD="/abcd.sh"
CMD="/a/very 'special' command.exe"

*** full file: 'full.txt'
CMD="/a/a.sh"
CMD="/a/a.sh"
CMD="/a/b.sh"
QRY(4)="/a/bc.sh"
QRY(4)="/ab/c.sh"
QRY(5)="/a/b/c.sh"
QRY(5)="/a/bc.sh"
CMD="/a/b/c.sh"
WTF(7)="/a/b/c/.sh"
QRY(8)="/a/b/cd.sh"
QRY(8)="/a/b/c/d.sh"
CMD="/a/b/c/d.sh"
WTF(10)="/x/y.z"
CMD="/a/xyz.sh"
QRY(12)="/a/bcd.sh"
QRY(12)="/abc/d.sh"
QRY(12)="/ab/cd.sh"
CMD="/a/very 'special' command.exe"
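Whether avoiding regexes actually pays off will depend on your data and your Perl version, so it's worth measuring rather than taking my word for it. Here's a rough sketch using the core Benchmark module. It compares extracting the path from a CMD="..." line with substr against one plausible regex. The regex and the sample line are my assumptions, not taken from your code, and I'm deliberately not quoting any timings.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use Benchmark qw(cmpthese);

# Sample input line; an assumption based on the format shown above.
my $line = 'CMD="/a/b/c.sh"';

# Regex version: capture everything between the quotes.
sub by_regex {
    my ($cmd) = $_[0] =~ /^CMD="(.*)"$/;
    return $cmd;
}

# String-function version: skip the 5-character prefix CMD=" and
# drop the closing quote, as the script does.
sub by_substr {
    return substr $_[0], 5, -1;
}

# Sanity check: both approaches must agree before timing them.
die "Results differ\n" unless by_regex($line) eq by_substr($line);

# Run each for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    regex  => sub { by_regex($line) },
    substr => sub { by_substr($line) },
});
```

Try it with lines representative of your real file; a single short line like this only gives a rough indication.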
Notes:
— Ken