
The file will be a fixed length file, so let's say my file looks like this...

name address city state zip
jane doe 123 main st pittsburgh pa 99999
john doe 456 second st pittsburgh pa 99999

I would like to update the zip to 15206 if the city=pittsburgh. I'd like to do it for all lines in the file. Does that make sense?

I'm very new to perl, so please forgive my ignorance.

Edited by planetscape - added code tags and rudimentary formatting

Re^3: Updating fields in a text file
by Limbic~Region (Chancellor) on Jul 11, 2006 at 15:45 UTC
    wendy24,
    You are still omitting details that would allow us to provide a full working solution, such as the width of each field. Additionally, are the records separated by newlines, or is the file one long run-on line?
    #!/usr/bin/perl
    use strict;
    use warnings;

    my ($in, $out) = @ARGV;
    die "Usage: $0 <input file> <output file>" if ! defined $in || ! defined $out;

    open(my $in_fh, '<', $in) or die "Unable to open '$in' for reading: $!";
    open(my $out_fh, '>', $out) or die "Unable to open '$out' for writing: $!";

    # Field widths: name 10, address 20, city 15, state 12, zip 5
    my $template = 'A10A20A15A12A5';

    while ( <$in_fh> ) {
        chomp;
        my ($name, $add, $city, $state, $zip) = unpack($template, $_);
        $zip = 15206 if uc($city) eq 'PITTSBURGH';
        # unpack's 'A' strips trailing spaces, so re-pack to restore the fixed widths
        print $out_fh pack($template, $name, $add, $city, $state, $zip), "\n";
    }
    Of course, this assumes that name is only 10 characters long and zip is 5 but they can be adjusted accordingly. It also assumes the records are newline separated and will not work otherwise. Don't worry about being new but think about what information is needed to solve the problem even if you don't know how to solve it yourself.
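    For anyone adjusting the widths, here is a minimal sketch of how unpack and pack treat the 'A' format: unpack trims each field's trailing spaces, and pack pads them back out. The widths in the template and the update_zip helper are assumptions for illustration, not the layout of the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed layout (hypothetical widths; adjust to the real file):
# name 10, address 20, city 15, state 12, zip 5.
my $TEMPLATE = 'A10A20A15A12A5';

# Return the record with the zip replaced when the city matches.
sub update_zip {
    my ($record, $city_wanted, $new_zip) = @_;

    # unpack with 'A' strips each field's trailing spaces.
    my ($name, $add, $city, $state, $zip) = unpack($TEMPLATE, $record);

    $zip = $new_zip if lc($city) eq lc($city_wanted);

    # pack with 'A' pads each field back to its full width.
    return pack($TEMPLATE, $name, $add, $city, $state, $zip);
}

# Build a sample fixed-width record from the question's data.
my $record  = pack($TEMPLATE, 'jane doe', '123 main st', 'pittsburgh', 'pa', '99999');
my $updated = update_zip($record, 'pittsburgh', '15206');

print length($updated), "\n";      # record width is unchanged by the update
print substr($updated, -5), "\n";  # the zip field
```

    Keeping the template in one variable means the widths only have to be changed in one place once the real layout is known.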

    Cheers - L~R

      The records will not be run-on lines. The fields can and probably will be of different lengths, but I can adjust for that. Thanks for your help. I will try your suggestions.
Re^3: Updating fields in a text file
by davido (Cardinal) on Jul 12, 2006 at 06:41 UTC

    Here is an untested one-liner:

    perl -pi.bak -e "/^.{23}pittsburgh/ && s/\d{5}$/15206/;" filename.txt

    This works as follows:
    Check to see if the line contains the word 'pittsburgh', starting on the 24th character position in the line. You did mention that the lines are fixed length. You may have to tailor the {23} to meet your actual data field widths. The insistence on commencing the search for 'pittsburgh' at the 24th position in the line is to eliminate false positives such as the odd possibility of someone living on pittsburgh street, or being named John Pittsburgh.

    If it does find 'pittsburgh' in the correct position, perform a substitution on the final five numeric digits found on the line, substituting in the new zip code. Trailing newline is ignored and preserved.

    The -p switch wraps the code in a while loop that reads each line into $_, runs the code, and then prints $_. The -i.bak switch turns on 'in place editing' and keeps the original file as a backup with a .bak extension. See perlrun for a more thorough explanation of the command line switches, and perlre and perlretut for the rest. ;)
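    To make the loop visible, here is a rough sketch of the same logic written out as a script that works on in-memory lines rather than editing a file. The fix_line name, the sample records, and the sprintf widths (10 + 13 = 23 characters before the city) are assumptions chosen only so that the city starts at the 24th character; the pattern is anchored with ^ so the count starts at the beginning of the line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The body of the one-liner as a sub. The 23 is the assumed number of
# characters before the city field; tailor it to the real layout.
sub fix_line {
    my ($line) = @_;
    $line =~ /^.{23}pittsburgh/ && $line =~ s/\d{5}$/15206/;
    return $line;
}

# Hypothetical fixed-width layout: name 10, address 13, city 12, state 3, zip 5.
my $format = "%-10s%-13s%-12s%-3s%5s\n";
my @lines  = (
    sprintf($format, 'jane doe', '123 main st',  'pittsburgh', 'pa', '99999'),
    sprintf($format, 'john doe', '9 pittsburgh', 'erie',       'pa', '99999'),
);

# Roughly what -p does: take each line, run the code, print the line.
print fix_line($_) for @lines;
```

    The second record has 'pittsburgh' in the address field only, so it should pass through untouched.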

    Update:
    Wait, I'm confused. In one followup node you said that the file is fixed length. I took this to mean that the fields are fixed length. In another followup, where you posted as Anonymous Monk, you (at least I think it's you) said that the fields may be variable length, but that you can adjust for that. If the latter is true, my solution breaks, and you've got one heck of a problem. Here's why:

    If you have variable width fields, delimited with whitespace, and the fields may also each contain whitespace (such as between house numbers and street names), your delimiters are not unique, and thus, not special. How can you check the city as the third field if you don't have any sure method of delimiting fields? Your sample data implied fixed-length fields. It also implied that you're not 'escaping' whitespace that might be embedded within a field. It also implied that you're not wrapping your fields in anything like quotes. So there is no way to predict whether a piece of whitespace represents a field delimiter, or simply a space character within the text. For that reason, either your data is fundamentally flawed, or you're not showing us the whole big picture. Which is it? Your data needs one of the following characteristics:

    • Fixed width fields.
    • Variable width fields with unique delimiters.
    • Variable width fields with non-unique delimiters, but with some means of escaping embedded characters that might otherwise seem to be delimiters.
    • Variable width fields with quoted data to help distinguish between delimiters and plain text. ...note, this opens another can of worms: escaping quotes. ;)
    • Some other clear-cut easily definable means of identifying where each field begins.
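    As an illustration of the second option, here is a minimal sketch that first shows why splitting the sample record on whitespace is ambiguous, and then updates a record that uses a unique delimiter. The pipe character, the field order, and the update_zip_delimited name are assumptions for the example, not something from the original data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Splitting the sample record on whitespace gives more tokens than fields,
# so there is no way to tell where the address ends and the city begins.
my $ws_record = 'jane doe 123 main st pittsburgh pa 99999';
my @tokens    = split ' ', $ws_record;
print scalar(@tokens), " tokens for 5 fields\n";

# With a unique delimiter (a pipe is assumed here) the fields are unambiguous.
# Assumed field order: name|address|city|state|zip
sub update_zip_delimited {
    my ($line, $city_wanted, $new_zip) = @_;
    chomp $line;
    my @fields = split /\|/, $line, -1;   # -1 keeps trailing empty fields
    $fields[4] = $new_zip if lc($fields[2]) eq lc($city_wanted);
    return join('|', @fields);
}

my $piped = "jane doe|123 main st|pittsburgh|pa|99999\n";
print update_zip_delimited($piped, 'pittsburgh', '15206'), "\n";
```

    This only works if the delimiter can never appear inside a field; otherwise the escaping or quoting options above apply.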

    Dave