in reply to Updating fields in a text file

wendy24,
Assuming by field, you mean that the data is delimited in some way - see Text::CSV_XS or Text::xSV. Additionally, DBD::CSV may make the task even easier depending on your situation. If your fields are fixed width then perhaps you want unpack.

In a nutshell, you have not provided enough information for us to help you. In general, you want to read the records in, examine the field that dictates the new value of the other field, perform your calculation, and then write each record out to a new file in sequence. The above paragraph provides suggestions on how you might accomplish that.
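
As a rough illustration of that read, examine, calculate, write loop, here is a minimal sketch for delimited data. The pipe delimiter, the field order, and the update_record helper are assumptions made for the example, not details from the original question; real CSV with quoting is better left to Text::CSV_XS.

```perl
use strict;
use warnings;

# Hypothetical helper: takes one pipe-delimited record, returns the updated record.
# Assumed field order: name|address|city|state|zip
sub update_record {
    my ($line) = @_;
    # The -1 limit keeps trailing empty fields instead of dropping them.
    my @field = split /\|/, $line, -1;
    # Examine the field that drives the change, then set the dependent field.
    $field[4] = '15206' if @field >= 5 && lc($field[2]) eq 'pittsburgh';
    return join '|', @field;
}

# Process records in order and write each one out, changed or not.
my @records = (
    'jane doe|123 main st|pittsburgh|pa|99999',
    'john doe|456 second st|cleveland|oh|44101',
);
print update_record($_), "\n" for @records;
```

In a real script the records would come from a read filehandle and go to a separate output filehandle rather than a hard-coded list.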

Cheers - L~R

Re^2: Updating fields in a text file
by EvanK (Chaplain) on Jul 11, 2006 at 15:32 UTC
    As Limbic said, we need more information to give better answers. But if you are working with a delimited file, I would definitely second the recommendation for DBD::CSV. It's saved my bacon more than a few times.

    Then, of course, there's the by-hand option of reading in the contents of the file, using regexes and conditionals, and writing it all back out. But that's really just for when you have an unusual situation (or if you like rolling your own a bit too much).

    __________
    Build a man a fire, and he'll be warm for a day. Set a man on fire, and he'll be warm for the rest of his life.
    - Terry Pratchett

Re^2: Updating fields in a text file
by wendy24 (Initiate) on Jul 11, 2006 at 15:36 UTC

    The file will be a fixed length file, so let's say my file looks like this...

    name address city state zip
    jane doe 123 main st pittsburgh pa 99999
    john doe 456 second st pittsburgh pa 99999

    I would like to update the zip to 15206 if the city=pittsburgh. I'd like to do it for all lines in the file. Does that make sense?

    I'm very new to perl, so please forgive my ignorance.

    Edited by planetscape - added code tags and rudimentary formatting

      wendy24,
      You are still omitting details that would allow us to provide a full working solution, such as the width of each field. Additionally, do the records have a newline separator, or is it just one long run-on line?

      #!/usr/bin/perl
      use strict;
      use warnings;

      my ($in, $out) = @ARGV;
      die "Usage: $0 <input file> <output file>\n" if ! defined $in || ! defined $out;

      open(my $in_fh, '<', $in) or die "Unable to open '$in' for reading: $!";
      open(my $out_fh, '>', $out) or die "Unable to open '$out' for writing: $!";

      # Field widths are guesses; adjust them to match the real record layout.
      my $template = 'A10A20A15A12A5';

      while ( <$in_fh> ) {
          chomp;
          my ($name, $add, $city, $state, $zip) = unpack($template, $_);
          $zip = 15206 if uc($city) eq 'PITTSBURGH';
          # unpack's 'A' strips trailing spaces, so re-pack to restore the fixed widths.
          print $out_fh pack($template, $name, $add, $city, $state, $zip), "\n";
      }

      Of course, this assumes that name is only 10 characters long and zip is 5 but they can be adjusted accordingly. It also assumes the records are newline separated and will not work otherwise. Don't worry about being new but think about what information is needed to solve the problem even if you don't know how to solve it yourself.
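
      To see what the template does to a single record, here is a small sketch. The widths are the same guesses as in the script (10, 20, 15, 12, 5), the sample record is built with pack purely for illustration, and update_zip is a hypothetical helper name.

      ```perl
      use strict;
      use warnings;

      # Guessed widths: name 10, address 20, city 15, state 12, zip 5.
      my $template = 'A10A20A15A12A5';

      # Hypothetical helper: update the zip of one fixed-width record.
      sub update_zip {
          my ($record) = @_;
          # 'A' in unpack strips trailing spaces from each field.
          my ($name, $add, $city, $state, $zip) = unpack($template, $record);
          $zip = '15206' if uc($city) eq 'PITTSBURGH';
          # 'A' in pack pads each field with spaces back to its full width.
          return pack($template, $name, $add, $city, $state, $zip);
      }

      # Build a sample record with the assumed layout.
      my $record  = pack($template, 'jane doe', '123 main st', 'pittsburgh', 'pa', '99999');
      my $updated = update_zip($record);

      printf "%d characters before, %d after\n", length($record), length($updated);
      print "[$updated]\n";
      ```

      Because unpack drops the padding, comparing against 'PITTSBURGH' works without trimming, and pack puts the padding back so the record keeps its length.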

      Cheers - L~R

        The records will not be run-on lines. The fields can and probably will be of different lengths, but I can adjust for that. Thanks for your help. I will try your suggestions.

      Here is an untested one-liner:

      perl -pi.bak -e "/^.{23}pittsburgh/ && s/\d{5}$/15206/;" filename.txt

      This works as follows:
      Check to see if the line contains the word 'pittsburgh', starting on the 24th character position in the line. You did mention that the lines are fixed length. You may have to tailor the {23} to meet your actual data field widths. The insistence on commencing the search for 'pittsburgh' at the 24th position in the line is to eliminate false positives such as the odd possibility of someone living on pittsburgh street, or being named John Pittsburgh.

      If it does find 'pittsburgh' in the correct position, perform a substitution on the final five numeric digits found on the line, substituting in the new zip code. Trailing newline is ignored and preserved.

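      The -p switch wraps the code in a while loop that reads the file line by line and prints each line (whatever is left in $_) after the code has run. The -i switch turns on 'in place editing'; with the .bak suffix, the original file is kept as filename.txt.bak. See perlrun for a more thorough explanation of the command line switches, and perlre and perlretut for the rest. ;)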
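
      Roughly, -p turns the one-liner into a loop like the one below. This is a simplified sketch that works on an in-memory list rather than a file, so the in-place editing and the .bak backup done by -i are left out. fix_line is a hypothetical name, and the sample layout (name 10, address 13, so the city starts after 23 characters) is an assumption for the example.

      ```perl
      use strict;
      use warnings;

      # Hypothetical stand-in for the body of the one-liner: works on one line.
      sub fix_line {
          my ($line) = @_;
          # Only touch the zip when 'pittsburgh' starts at character 24.
          $line =~ /^.{23}pittsburgh/ && $line =~ s/\d{5}$/15206/;
          return $line;
      }

      # Sample lines laid out so that the city field starts after 23 characters.
      my $layout = "%-10s%-13s%-12s%-3s%s\n";
      my @lines  = (
          sprintf($layout, 'jane doe', '123 main st',   'pittsburgh', 'pa', '99999'),
          sprintf($layout, 'john doe', '456 second st', 'cleveland',  'oh', '44101'),
      );

      # What -p does in spirit: run the code on each line, then print the line.
      print fix_line($_) for @lines;
      ```

      The $ in the substitution matches just before the trailing newline, which is why the newline survives untouched.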

      Update:
      Wait, I'm confused. In one followup node you said that the file is fixed length. I took this to mean that the fields are fixed length. In another followup, where you posted as Anonymous Monk, you (at least I think it's you) said that the fields may be variable length, but that you can adjust for that. If the latter is true, my solution breaks, and you've got one heck of a problem. Here's why:

      If you have variable width fields, delimited with whitespace, and the fields may also each contain whitespace (such as between house numbers and street names), your delimiters are not unique, and thus, not special. How can you check the city as the third field if you don't have any sure method of delimiting fields? Your sample data implied fixed-length fields. It also implied that you're not 'escaping' whitespace that might be embedded within a field. It also implied that you're not wrapping your fields in anything like quotes. So there is no way to predict whether a piece of whitespace represents a field delimiter, or simply a space character within the text. For that reason, either your data is fundamentally flawed, or you're not showing us the whole picture. Which is it? Your data needs one of the following characteristics:

      • Fixed width fields.
      • Variable width fields with unique delimiters.
      • Variable width fields with non-unique delimiters, but with some means of escaping embedded characters that might otherwise seem to be delimiters.
      • Variable width fields with quoted data to help distinguish between delimiters and plain text. ...note, this opens another can of worms: escaping quotes. ;)
      • Some other clear-cut easily definable means of identifying where each field begins.
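
      A quick sketch of why the delimiter has to be unique. The two sample records and the choice of a tab as the alternative delimiter are assumptions for illustration only.

      ```perl
      use strict;
      use warnings;

      # Splitting on whitespace: spaces inside a field look just like delimiters.
      my @by_space_1 = split ' ', 'jane doe 123 main st pittsburgh pa 99999';
      my @by_space_2 = split ' ', 'john q public 456 second st new york ny 10001';
      # The number of pieces differs, so "the third field" has no fixed meaning.
      printf "%d pieces vs %d pieces\n", scalar(@by_space_1), scalar(@by_space_2);

      # With a unique delimiter (a tab here), embedded spaces are harmless.
      my @by_tab = split /\t/, "jane doe\t123 main st\tpittsburgh\tpa\t99999";
      print "city is '$by_tab[2]'\n";
      ```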

      Dave