onemojofilter has asked for the wisdom of the Perl Monks concerning the following question:

Good morning:

I am trying to take a spool file and replace a string at a specific line and position within that file, based on values found elsewhere in the file. As an example:

File sample contents:

H BIN LOC W123 CUSTOMER 12345 ABC BOTTLE COMPANY ITEM DESCRIPTION 24/12 OZ FC ABC 12-12 x 9-12 x 6-15 123 MY COMPANY PRODUCT # 12341234 20221103 QUANTITY PER PALLET ORDER QUANTITY 500 10,000 CUST. PO. NUMBER MY COMPANY ORDER NUMBER INVENTORY 961123-01 DATE PRINTED SHIP DATE 10/31/2022 11/03/2022 FREIGHT AREA SAL Load # 1 OF 21 ^LH BIN LOC W123 CUSTOMER 12345 ABC BOTTLE COMPANY ITEM DESCRIPTION 24/12 OZ FC ABC 12-12 x 9-12 x 6-15 123 MY COMPANY PRODUCT # 12341234 20221103 QUANTITY PER PALLET ORDER QUANTITY 500 10,000 CUST. PO. NUMBER MY COMPANY ORDER NUMBER INVENTORY 961123-01 DATE PRINTED SHIP DATE 10/31/2022 11/03/2022 FREIGHT AREA SAL Load # 1 OF 21

Firstly, the first line will display something like an "H", and each subsequent "H" in the file will be preceded by "^L" (which signifies a new "page"). Secondly, I will take the value found on line 10 (relative to where "H" appears), at positions 25 through 35 (in this case 20221103), and replace it with another value based upon the value found on line 14, positions 25 through 35. (Basically a database lookup, which I can do.)


Google searches seem to only mention either using 'sed' (which I can readily do in a shell script, but am trying to do this in a more efficient and readable way using Perl) or looping through the file (which I'm already doing in a shell script). Is the only way to change a specific line/position in a file really going to be looping through the entire file?

I was hoping there was a method by which I could specify a line and a position, then extract the value (and replace it as needed, writing that portion back to the file, sort of how 'sed -i' would do, but something more perlish?) instead of slurping the entire file.

Any thoughts/suggestions appreciated!

Re: Changing string in specific line/position in a file
by kcott (Archbishop) on Nov 01, 2022 at 01:13 UTC

    G'day onemojofilter,

    Your data contains four types of whitespace (assuming your alignment uses tabs). These are difficult to differentiate on a webpage. I've used `cat -vet` which shows: a tab as ^I; a form feed as ^L; a newline as $; and, a space as itself.

    I created the following test data (working.txt) which has examples of the different types of whitespace. I added an additional page so you now have a "^LH" following both an "H" and a "^LH". I removed much of the original data and replaced the rest with text intended to explain each line. I believe this is still representative of what you originally showed.

    $ cat -vet working.txt
    H$
     after 1 space$
      after 2 spaces$
    no leading spaces$
    text^I^Ifollowed by 2 tabs$
    target for^Iswap1$
    ...$
    source for^Ireplace1$
    ...$
    ^LH$
     after 1 space$
      after 2 spaces$
    no leading spaces$
    text^I^Ifollowed by 2 tabs$
    target for^Iswap2$
    ...$
    source for^Ireplace2$
    ...$
    ^LH$
     after 1 space$
      after 2 spaces$
    no leading spaces$
    text^I^Ifollowed by 2 tabs$
    target for^Iswap3$
    ...$
    source for^Ireplace3$
    ...$

    I then ran this code:

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use autodie;

    use constant {
        CHANGE_LINE => 4,
        SOURCE_LINE => 6,
        PAGE_SEP => "\n\fH\n",
    };

    use File::Copy 'copy';

    my $work_file = 'working.txt';
    my $bu_file = 'working.txt.bu';

    copy($work_file, $bu_file)
        or die "Can't 'copy($work_file, $bu_file)': $!";

    {
        open my $in_fh, '<', $bu_file;
        open my $out_fh, '>', $work_file;

        local $/ = PAGE_SEP;

        while (<$in_fh>) {
            chomp;
            my @lines = split /\n/;
            my ($change_line, $source_line) = (CHANGE_LINE, SOURCE_LINE);
            ++$change_line, ++$source_line if $. == 1;

            my ($replace) = $lines[$source_line] =~ /(\S+)$/;
            $lines[$change_line] =~ s/\S+$/$replace/;

            print $out_fh join("\n", @lines), (eof($in_fh) ? "\n" : PAGE_SEP);
        }
    }

    Here's the result:

    $ cat -vet working.txt
    H$
     after 1 space$
      after 2 spaces$
    no leading spaces$
    text^I^Ifollowed by 2 tabs$
    target for^Ireplace1$
    ...$
    source for^Ireplace1$
    ...$
    ^LH$
     after 1 space$
      after 2 spaces$
    no leading spaces$
    text^I^Ifollowed by 2 tabs$
    target for^Ireplace2$
    ...$
    source for^Ireplace2$
    ...$
    ^LH$
     after 1 space$
      after 2 spaces$
    no leading spaces$
    text^I^Ifollowed by 2 tabs$
    target for^Ireplace3$
    ...$
    source for^Ireplace3$
    ...$

    Note that all of the original whitespace is retained unaltered. I do recommend you make a backup; doing this within the code is easiest and won't be forgotten.

    Your initial text positions were incorrect. "25 through 35" covers 11 characters, but "20221103" is only 8 characters. Also, I made the starting position 23 not 25, and that assumes that all of the preceding whitespace was actually spaces; if some, or all, were tabs, that would be a different number. Tabs are just a single character:

    $ perl -E 'my $x = "|\t|"; say $x; say length $x;'
    |       |
    3

    If the situation is more complex than suggested in your OP, let Perl do the counting for you. Bear in mind that your character positions may start at 1 (1st char. is at pos. 1) but Perl will count from zero. I've no idea what you might need, but this should give you some hints:

    $ perl -E '
        my $x = "PROMPT\t\tTO CHANGE";
        my $len = length $x;
        my $from_index = rindex($x, "\t") + 1;
        my $from_pos = $from_index + 1;
        say $x;
        say "$from_pos through $len";
        say substr $x, $from_index, $len - $from_index;
        say substr $x, $from_index;
    '
    PROMPT          TO CHANGE
    9 through 17
    TO CHANGE
    TO CHANGE
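
    Converting between 1-based columns and Perl's 0-based offsets is easy to get wrong, so it may help to wrap the arithmetic once. The following is a minimal sketch, assuming fixed-width fields whose width must be preserved; get_field and set_field are hypothetical names chosen for illustration, not functions from any module.

```perl
use strict;
use warnings;

# Extract the field occupying 1-based columns $from..$to of $line.
sub get_field {
    my ($line, $from, $to) = @_;
    return substr($line, $from - 1, $to - $from + 1);
}

# Return a copy of $line with columns $from..$to replaced by $value,
# space-padded (or truncated) to the field width so that later columns
# do not shift.
sub set_field {
    my ($line, $from, $to, $value) = @_;
    my $width = $to - $from + 1;
    die "line is shorter than column $to\n" if length($line) < $to;
    substr($line, $from - 1, $width) = sprintf('%-*.*s', $width, $width, $value);
    return $line;
}

# Illustrative data only: the date sits in columns 11 through 18.
my $line = 'PRODUCT   20221103  END';
print get_field($line, 11, 18), "\n";
print set_field($line, 11, 18, '20221110'), "\n";
```

    Padding to the field width is a design choice that suits column-aligned spool output; if the real data is tab-separated, a regex anchored on the tabs is probably the better fit.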

    — Ken

Re: Changing string in specific line/position in a file
by Corion (Patriarch) on Oct 31, 2022 at 17:27 UTC

    Maybe Tie::File is what you want? That allows you to treat a file as if it were an array of lines.

    In the background, it still needs to rewrite the whole file, because if you change the length of a line while replacing stuff, the rest of the file needs to be adjusted.
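
    A minimal sketch of the Tie::File approach, assuming each page starts either at the first line of the file or at a line beginning with a form feed, and that the value of interest is the last whitespace-delimited word on its line. The helper name update_pages and the line numbers passed to it are illustrative assumptions, not part of Tie::File.

```perl
use strict;
use warnings;
use Tie::File;

# Hypothetical helper: for every page in $file, copy the last word of
# line $source_line over the last word of line $target_line. Both line
# numbers are 1-based and counted from the first line of the page.
sub update_pages {
    my ($file, $target_line, $source_line) = @_;

    tie my @lines, 'Tie::File', $file
        or die "Cannot tie '$file': $!";

    for my $i (0 .. $#lines) {
        # A page starts at the first line, or at a line beginning with a form feed.
        next unless $i == 0 || $lines[$i] =~ /^\f/;

        my ($t, $s) = ($i + $target_line - 1, $i + $source_line - 1);
        next if $t > $#lines || $s > $#lines;

        my ($value) = $lines[$s] =~ /(\S+)\s*$/;
        next unless defined $value;

        # Work on a copy, then assign back: the assignment is what makes
        # Tie::File update the underlying file.
        (my $new = $lines[$t]) =~ s/\S+(\s*)$/$value$1/;
        $lines[$t] = $new;
    }

    untie @lines;
    return;
}
```

    A call might look like update_pages('spool.txt', 10, 14), where the file name is a placeholder. The database lookup from the original question would slot in between extracting $value and building $new.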

      Thanks! I'll check it out.
Re: Changing string in specific line/position in a file
by GrandFather (Saint) on Oct 31, 2022 at 20:19 UTC

    The bottom line, as Corion suggests, is that you usually can't simply replace a line in a file. Most file systems present files as a sequence of bytes, and there is no facility to insert or remove bytes from the middle of that sequence. Editing a line usually implies a possible change in length, so the only option is to rewrite the entire file.

    There are lots of ways to make such edit operations more or less efficient on disk/memory space and processor/io system time. But there are always compromises that need to be made depending on your situation. For example, in your case, if the file is small (say a few hundred MB or so), just read the whole thing into memory, do all the editing in memory, then write it back out.

    Although, actually, your data looks like it ought to be in a database. If that were the case then your task is trivial: update a single cell in a table (not so trivial for the database engine, but that's not your problem). Almost always it's best to not overthink the plumbing: don't worry about performance unless performance becomes an issue. Rewriting a file once to make an edit should not be a problem in terms of performance, but getting the code right can be more subtle than you might think. Tools like sed don't do any particular magic; they read and rewrite the whole file to get the job done.
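
    The read, edit, rewrite cycle described above can be sketched roughly as follows. This is a minimal illustration, assuming the file fits comfortably in memory; rewrite_file is a hypothetical helper name, and the edit itself is passed in as a code reference. The sketch does not preserve the original file's permissions or ownership.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Hypothetical helper: read $file fully, pass its contents to $edit
# (a code reference that returns the new contents), then replace the
# original by renaming a temporary file over it.
sub rewrite_file {
    my ($file, $edit) = @_;

    open my $in, '<', $file or die "Cannot read '$file': $!";
    my $content = do { local $/; <$in> };
    close $in;
    $content = '' unless defined $content;

    my $new = $edit->($content);

    # Create the temporary file in the same directory so that rename()
    # does not have to cross file systems.
    my ($out, $tmp) = tempfile(DIR => dirname($file));
    print {$out} $new  or die "Cannot write '$tmp': $!";
    close $out         or die "Cannot close '$tmp': $!";
    rename $tmp, $file or die "Cannot rename '$tmp' to '$file': $!";
    return;
}
```

    Writing to a temporary file first means a crash part-way through leaves the original intact, which addresses the same concern as making a backup before editing.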

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re: Changing string in specific line/position in a file
by eyepopslikeamosquito (Archbishop) on Oct 31, 2022 at 22:27 UTC

    Google searches seem to only mention either using 'sed' (which I can readily do in a shell script, but am trying to do this in a more efficient and readable way using Perl)

    Agreed. Write the whole thing in Perl. That will be much more maintainable and enjoyable than hacking out a motley mix of shell/sed/awk/grep. The key reason is that Perl can comfortably scale to much larger scripts. In my experience, "small" shell scripts have a way of growing ... and growing ... and growing ... until they become maintenance nightmares! But, by then, how do you justify a rewrite? The cost of rewriting, the opportunity cost of not working on something else, and the risk of breaking previously working code in the rewrite. So write the whole thing in Perl to start with.

    For more detail on this topic, see Unix shell versus Perl.

Re: Changing string in specific line/position in a file
by tybalt89 (Monsignor) on Oct 31, 2022 at 21:16 UTC

    The test data you have provided does not match your written description. For example, the string 20221103 does not go from byte 25 to byte 35; in fact, that line does not even have 35 characters in it.

Re: Changing string in specific line/position in a file
by hv (Prior) on Oct 31, 2022 at 18:06 UTC

    You should be able to achieve the writing part of this by opening the file for read and write, then using a combination of tell and seek to a) know where you have read to, b) set where to write to, and c) (if needed) restore the position for further reading.

    It might look something like the following (untested); I've left placeholder functions for the bits you need to supply.

    # you'll need to provide $filename
    open(my $fh, '+<', $filename) or die "Error opening $filename: $!";

    # create a scope to localize the input record separator
    {
        # use formfeed as input record separator
        local $/ = "\x{0c}";

        # track position of start/end of current record
        my($start_pos, $end_pos) = (undef, 0);
        while (defined(my $record = <$fh>)) {
            ($start_pos, $end_pos) = ($end_pos, tell($fh));

            # placeholder: decide if this record needs to be changed
            next unless needs_change($record);

            # placeholder: work out what change should be made
            my($offset_to_write_at, $text_to_write) = required_change($record);

            seek($fh, $start_pos + $offset_to_write_at, 0)
                or die "Error seeking to $start_pos + $offset_to_write_at";
            # no comma after the filehandle, or print would write to STDOUT
            print {$fh} $text_to_write
                or die "Error writing update";

            # ready to read next record
            seek($fh, $end_pos, 0)
                or die "Error seeking to $end_pos";
        }
    }
    close $fh or die "Error closing filehandle, writes may not have completed";

    Note that seek and tell deal with byte offsets, so if your data is not ASCII you need to take extra care when determining the correct offset to write at.
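
    To make the byte-offset caveat concrete, here is a minimal sketch of a same-length, in-place overwrite. The helper name overwrite_at is hypothetical, and UTF-8 is only an assumed encoding; the point is that lengths are compared in bytes after encoding, and that the expected text is verified at the offset before anything is written.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: overwrite bytes in place, starting at byte offset
# $offset of $file. $old and $new are character strings; they are encoded
# (UTF-8 is an assumption here) and must occupy the same number of bytes,
# because an in-place write cannot shift the rest of the file.
sub overwrite_at {
    my ($file, $offset, $old, $new) = @_;

    my $old_bytes = encode('UTF-8', $old);
    my $new_bytes = encode('UTF-8', $new);
    die "Replacement differs in byte length from the original\n"
        unless length($old_bytes) == length($new_bytes);

    open my $fh, '+<:raw', $file or die "Cannot open '$file': $!";
    seek($fh, $offset, 0)        or die "Cannot seek to $offset: $!";

    # Confirm the expected text really is at that offset before writing.
    my $got = read($fh, my $buf, length $old_bytes);
    die "Unexpected content at offset $offset\n"
        unless defined $got && $buf eq $old_bytes;

    seek($fh, $offset, 0)  or die "Cannot seek to $offset: $!";
    print {$fh} $new_bytes or die "Cannot write: $!";
    close $fh              or die "Cannot close '$file': $!";
    return;
}
```

    Refusing to write when the lengths differ is deliberate: a shorter replacement would leave part of the old value behind, and a longer one would overwrite whatever follows it.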

      Does this not make somewhat of a mess if the replaced line length is different than the original line?

      Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond

        It would do, yes; but my reading of the original post was that the author understands that, and intends to replace with something of the same length.

Re: Changing string in specific line/position in a file
by Anonymous Monk on Oct 31, 2022 at 22:07 UTC

    It is worth noting that a file is essentially an array of bytes. There is no way to 'jump' to a given line in a file without streaming through the file and finding every newline character.

    My suggestion would be to set Perl's record separator to the end-of-page sequence, then 'slurp' each page. For each page: extract the value you want with a match regex, and update the value with a substitution regex. Then write the page to a new file.

    Example:
    open my $in , "<", 'input.txt'  or die "Cannot open input.txt: $!";
    open my $out, ">", 'output.txt' or die "Cannot open output.txt: $!";

    $/ = "\f"; # Set the end of page sequence (form feed, shown as ^L)

    while (my $page = <$in>) # Get page
    {
        if ($page =~ m/\nINVENTORY +(\d+-\d+)/) # Get the inventory number
        {
            my $inventoryNumber = $1;

            # do your database lookup to get new date from inventory number
            my $date = doDatabaseLookup($inventoryNumber);

            # Replace the existing date with the new date
            $page =~ s/(\nMY COMPANY PRODUCT #\n\d+ +)(\d{8})/$1$date/;
        }
        print $out $page;
    }
Re: Changing string in specific line/position in a file
by tybalt89 (Monsignor) on Jun 10, 2024 at 15:20 UTC

    Here's one way to go to the specified location in each chunk. (NOTE: fetching and replacement are done to end-of-line, since one field is short.)

    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11147864
    use warnings;

    @ARGV = 'd.11147864'; # FIXME for testing only, comment out for real life
    local $/ = "\f";
    while( <> )
      {
      if( /^ (?:.*\n){13} .{24} (.+)/x ) # line 14 col 25 grab rest of text
        {
        my $replacement = $1; # maybe more calculations here
        s/^ (?:.*\n){9} .{24} \K .+/$replacement/x; # line 10 col 25 replace rest
        }
      print;
      }