Hi haukex; I've created an account so that I can edit things going forward, but since I cannot edit the original post, please forgive posting the full code in a reply:
use strict;
use warnings;
use diagnostics;
use Text::CSV_XS;
use File::Find;
use File::Slurp qw(edit_file);
use List::Util qw(uniq);
my $spreadsheet = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
# Consume the first argument (the spreadsheet input) and push its
# contents to an array of arrays.
open my $fh, "<", $ARGV[0];
my @table;
while (my $row = $spreadsheet->getline($fh)) {
push @table, $row;
}
close $fh;
# Sort the array-of-arrays by the length of the
# first element of each subarray (our 'keys') such
# that the longest key comes first, and the
# shortest comes last. Then sort alphabetically.
@table = sort { length $b->[0] <=> length $a->[0] || $a->[0] cmp $b->[
+0] } @table;
# CSVs are received from another department with
# three columns, where the second denotes part of
# speech. This isn't needed for this operation, so
# discard it.
for my $row (@table) {
splice(@$row,1,1);
$row->[0] = quotemeta($row->[0]);
}
my @rows_to_cull = ();
# Pairwise comparison of rows to detect duplicate keys.
# Sort having been performed earlier, only pairwise comparisons
# of adjacent rows will result in matches.
for my $row_a (0..$#table) {
my $key = $table[$row_a]->[0];
my @values = $table[$row_a]->[1];
for my $row_b ($row_a+1..$#table) {
if ( $table[$row_a]->[0] eq $table[$row_b]->[0] ) {
push @values, $table[$row_b]->[1];
$table[$row_a]->[1] = \@values;
push @rows_to_cull, $row_b;
}
else {
last;
}
}
}
# Convert the array to a hash to ensure uniqueness,
# then back to an array for traversal, sorting the
# array in descending numeric order so removal of
# earlier rows doesn't interfere with indexing for
# later rows.
my %cull_hash = map { $_ => 1 } @rows_to_cull;
my @cull_array = keys %cull_hash;
@cull_array = sort { $b <=> $a } @cull_array;
for (@cull_array) {
splice(@table,$_,1);
}
# Loop which ensures the uniqueness of elements in the
# replacement candidate arrays.
for my $row (0..$#table) {
my $replacement = $table[$row]->[1];
if (ref($replacement) eq 'ARRAY' && length $replacement > 1) {
my @unique = uniq(@$replacement);
$table[$row]->[1] = \@unique;
}
}
# The following takes the second argument from the
# command line as the directory to operate on. It
# also identifies a filter subroutine to exclude
# non-documentation files from the checks, and a
# subroutine that will actually do the search-and-replace.
find({
preprocess => \&filter,
wanted => \&search_and_replace
},
$ARGV[1]
);
sub filter {
return grep { -d or (-f and ( /\.txt$/ or /\.rst$/ or /\.yaml$/))}
+ @_;
}
# Main loop
sub search_and_replace {
open my $target_file, "<", $_;
$| = 1;
my $filename = $_;
while (my $target_string = <$target_file>) {
for my $row (@table) {
my $search = $row->[0];
my $replacement = $row->[1];
if ((lc $target_string) =~ (lc $search)) {
print "Found $search in $filename in the following context
+:\n";
print "$target_string\n";
if (ref($replacement) eq 'ARRAY' && length $replacement >
+1) {
print "Choose a replacement candidate by typing the appr
+opriate number, or else type '0' to skip.\n";
my $count = 1;
for my $value (@$replacement) {
print "$count\) $value\n";
$count++;
}
my $choice = <STDIN>;
if ($choice >= 1 && $choice <= (length $replacement)) {
my $replace_choice = $replacement->[$choice];
edit_file { s/$search/$replace_choice/gi} $filename;
}
else {
print "Skipping this occurrence without replacing.\n";
}
}
else {
print "Confirm replacement with $replacement by typing y
+, or skip by typing any other key.\n";
my $choice = <STDIN>;
if ($choice eq 'y' || 'Y') {
edit_file { s/$search/$replacement/gi } $filename;
}
else {
print "Skipping this occurrence without replacing.\n";
}
}
}
}
}
close $target_file;
}
As for the input, here are some example rows from the CSV file itself, pre-sorted:
whitelist entries,Noun,access list entries
IP whitelist,Noun,IP access list
IP whitelist entries,Noun,IP access list entries
whitelist,Adjective,allow
whitelist,Noun,access list
whitelist,Noun,access-list
whitelist,Verb,allow
your whitelist,Noun,your access list
As far as output goes, the sort should (and currently does) result in an entry like 'IP whitelist' being checked for before 'whitelist' itself, and 'IP whitelist entries' before 'IP whitelist; so I would expect that after I confirm wanting to replace 'IP whitelist entries' with 'IP access list entries', I would NOT be subsequently prompted to replace 'IP whitelist' within that instance of 'IP whitelist entries', because 'whitelist' would have already been replaced there. Similarly, I wouldn't want to be prompted to replace 'whitelist' alone in that spot for the same reason.
Please let me know what more might be needed. I have started looking at the tutorial you linked and it seems promising.
EDIT: Corrected formatting errors due to copy-pasting code block with long lines.
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.