Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

Thanks for providing additional context.

if my script finds that a file contains the search string 'IP whitelist' on a given line, even if I confirm that I want to replace this with 'IP access list', it will then prompt me to replace just 'whitelist' with one of its various candidates. What I would expect to happen in that case is that the replacement of 'IP whitelist' with 'IP access list' happens before the search for the 'whitelist' key is initiated, preventing it from finding a match there.

What is actually going on (and what I didn't realize on my first reading) is that you're reading a line from the input file into the $target_string variable, but once that line is in that variable, it never changes, regardless of whether you edit the file it came from, since the line was copied from the file into memory. That's why the regex engine continues to find matches in $target_string even after you've made the edit in the file.

The common pattern to use in Perl instead is to write an output file with the modified lines while reading the input file. Since in your code, you seem to want to do in-place edits of the file, see my node on that here. The slurp-then-spew approach is fairly easy if your files will always fit comfortably into memory; for line-by-line editing I might recommend my own module File::Replace (for examples of line-by-line editing, see the module's documentation instead of the aforementioned node).

In other words, instead of your edit_file calls, you'd do something like $target_string =~ s/$search/$replace_choice/gi, and then build your output file by writing out each $target_string line. However, there is still more trickyness here: If you have a line such as "Edit the whitelist by pressing the edit whitelist button." or something like that, then because of the /g modifier on that regex, you'd only be presented with the choice for what term to replace "whilelist" with once, and that replacement would be carried out for the entire line. I kind of doubt that's a desired behavior for you.

BTW, are you sure your search terms will always be on one line, or could strings like "whitelist entries" be split on two lines? That would also require an advanced approach.

These are examples for reasons why I asked for representative sample input along with the expected output for that input, and in this case representative also means that it should include all of such "interesting" cases. While I'd like to provide sample code or at least more specific advice on how to fix your code, without knowing the answers to questions like these, I run the risk of writing something that ends up not working on your actual data. Extensive test cases are very useful!

What I am currently thinking is that it should be possible to do something like s{ ($search) }{ get_replacement($1) }xeg where sub get_replacement does the prompting of the user and returns the replacement term (or the original string if the user doesn't want to replace). $search can even be a regex of all of the search terms, built dynamically.

A few more things I noticed in your code:

  • Check your opens for errors - "open" Best Practices
  • Even though you're using File::Find, for better modularity I would use regular argument passing to sub search_and_replace, and write e.g. wanted => sub { search_and_replace(\@table, $_) }
  • It seems to me like your @table could be replaced by a hash instead of all the duplicate key detection code you currently have. Keys in Perl hashes are unique, and such tables are easy to build with Perl's autovivification - if I treat a nonexistent hash entry as if it were an array reference, it will automagically become an array reference. Note that since you want to do case-insenstive matching I'm using fc, which is better than lc in case of Unicode. Building a hash of arrays using your sample data:
    use warnings; use strict; use feature qw/fc/; use Text::CSV_XS; use List::Util qw/uniq/; use Data::Dump; my $csvfile = $ARGV[0]; my %table; open my $fh, '<', $csvfile or die "$csvfile: $!"; my $csv = Text::CSV_XS->new({binary=>1, auto_diag=>2}); while ( my $row = $csv->getline($fh) ) { push @{$table{ fc($row->[0]) }}, $row->[2]; } $csv->eof or $csv->error_diag; close $fh; $_ = [uniq @$_] for values %table; dd \%table; __END__ { "ip whitelist" => ["IP access list"], "ip whitelist entries" => ["IP access list entries"], "whitelist" => ["allow", "access list", "access-list"], "whitelist entries" => ["access list entries"], "your whitelist" => ["your access list"], }
  • As before, instead of looping over @tables (or %tables), building a single regex should be more efficient.
  • You may wish to consider modules for prompting instead of rolling your own.
  • (It's Friday evening here so I didn't scan the rest of your code in detail, so other Monks may have further tips.)

In reply to Re^3: Immediately writing the results of search-and-replace by haukex
in thread Immediately writing the results of search-and-replace by Anonymous Monk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others cooling their heels in the Monastery: (3)
As of 2024-04-16 15:27 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found