Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to do a basic text conversion. I would greatly appreciate a few tidbits of wisdom. I am trying to get a file that looks like this: name(1) address(1) city(1), state(1) zip(1) phone(1) spouse(1) name(2) address(2) city(2), state(2) zip(2) phone(2) spouse(2) ... to look something like this: name(1)|address(1)|city(1)|state(1)|zip(1) name(2)|address(2)|city(2)|state(2)|zip(2) ... I know this is a relatively simple program, but I am new. Can anybody help?

Replies are listed 'Best First'.
Re: Simple Text Conversion
by btrott (Parson) on Apr 06, 2000 at 22:36 UTC
    Here's another:
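    #!/usr/local/bin/perl -wi.bak
    use strict;

    my @recs;
    while (<>) {
        chomp;
        # lines 1-5 go into record 0, lines 6-10 into record 1, ...
        push @{$recs[int(($.-1) / 5)]}, split /,?\s/;
        # every fifth line completes a record, so print it
        print map join('|', @$_) . "\n", $recs[int(($.-1) / 5)]
            if !($. % 5);
    }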
    Run it like
    % reformat.pl <file>
    Saves the original in <file>.bak.
Re: Simple Text Conversion
by chromatic (Archbishop) on Apr 06, 2000 at 19:51 UTC
    If what you have is a space-separated file (hmm, there's a comma in there too) and you want to change it to a pipe-separated file (and drop the commas), you want the tr/// operator:
    $data =~ tr/ /|/;
    $data =~ tr/,//d; # get rid of the comma, if you want
    This will transl(iter)?ate the space character into the pipe character and delete the comma character.
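A minimal sketch of those two tr/// steps applied to one sample line. The helper name spaces_to_pipes and the sample address are made up for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: convert one space-separated line to pipe-separated.
sub spaces_to_pipes {
    my ($data) = @_;
    $data =~ tr/ /|/;    # every space becomes a pipe
    $data =~ tr/,//d;    # /d deletes characters that have no replacement
    return $data;
}

print spaces_to_pipes("Springfield, IL 62704"), "\n";
# prints: Springfield|IL|62704
```

Note that tr/// works character by character, so a field that itself contains a space (a two-word name, say) would be broken up as well.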

    If there's a possibility that any of the fields will contain spaces or pipes or commas, look at the Text::CSV module from CPAN. Writing a regex to take care of this will not only test your skills at planning for all contingencies, it will cause you to go bald.

Re: Simple Text Conversion
by btrott (Parson) on Apr 06, 2000 at 20:27 UTC
    I think the OP's actually got a carriage-return separated file, if you view source. So I'm not absolutely sure on the format, but it looks like there are 5 lines per record, basically. The other thing is, do you not want the spouse and phone number in the pipe-separated records?

    Anyway:

    use strict;

    open FH, "foo" or die "Can't open: $!";
    my @recs;
    while (<FH>) {
        chomp;
        push @{$recs[int(($.-1) / 5)]}, $_;
    }
    close FH or die "Can't close: $!";

    open FH, ">foo.new" or die "Can't open: $!";
    for my $ref (@recs) {
        # break up the city, state, zip into 3 parts
        my($city, $state, $zip);
        if ($ref->[2] =~ /(.*?),\s(.*?)\s(.*)/) {
            ($city, $state, $zip) = ($1, $2, $3);
        }

        # join it all together into a pipe-separated
        # record, then write it out
        my $new = join "|", @{$ref}[0,1], $city, $state, $zip, @{$ref}[3,4];
        print FH $new, "\n";
    }
    close FH;
    I haven't tested this very thoroughly, but it looks like it'll work. It's rather ugly, too. :)
      Deuglification, then:
      use strict;

      open FH, "foo" or die "Can't open: $!";
      my @recs;
      foreach my $line (<FH>) {
          chomp $line;
          push @recs, [ split /,?\s/, $line ];
      }
      close FH or die "Can't close: $!";

      open FH, ">foo.new" or die "Can't open: $!";
      foreach my $line_ref (@recs) {
          my $line = join '|', @$line_ref[0 .. 4];
          print FH $line, "\n";
      }
      close FH;
      Note that this is also untested. I maintain that a proper use of split is better than an apple a day.

      Interesting bits for the Original Poster:

      • We push an array reference onto @recs
      • split can take an arbitrarily complex regex, instead of just a single character. Use it liberally!
      • We use an array slice to get at only the first few fields we want.
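The three points above can be sketched on a single line of input. This assumes, as the code in this reply does, that a whole record sits on one line; the helper name first_five_fields and the sample record are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line into fields and keep the first five.
sub first_five_fields {
    my ($line) = @_;
    # split on whitespace, optionally preceded by a comma;
    # the square brackets build an anonymous array reference
    my $fields = [ split /,?\s/, $line ];
    # @$fields[0 .. 4] is an array slice taken through the reference
    return join '|', @$fields[0 .. 4];
}

print first_five_fields("Ann Main Springfield, IL 62704 555-0100 Bob"), "\n";
# prints: Ann|Main|Springfield|IL|62704
```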
        Yes, but then, mine worked. :) Just kidding.

        You're just pushing each line arbitrarily onto @recs--you need to group them in sets of 5, because that's what the original file looked like. Plus why are you using foreach in the original reading-in-the-file loop? That

        foreach my $line (<FH>)
        reads the entire file into a temp array--not a horrible thing in many cases, but still, if we can process on a line by line basis, we may as well do so. :)
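A rough sketch of the line-by-line alternative, including the grouping into sets of 5. It reads from an in-memory filehandle so that it is self-contained; the helper name group_lines and the ten single-letter sample lines are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: read a handle one line at a time, collecting
# every $per_record consecutive lines into one array reference.
sub group_lines {
    my ($fh, $per_record) = @_;
    my @recs;
    while (my $line = <$fh>) {
        chomp $line;
        # $. is the 1-based number of the line just read, so with
        # $per_record == 5 lines 1-5 land in index 0, 6-10 in index 1.
        push @{ $recs[ int(($. - 1) / $per_record) ] }, $line;
    }
    return @recs;
}

my $text = join '', map { "$_\n" } 'a' .. 'j';    # ten sample lines
open my $fh, '<', \$text or die "Can't open in-memory file: $!";
my @recs = group_lines($fh, 5);
close $fh;

print join('|', @$_), "\n" for @recs;
# prints: a|b|c|d|e
#         f|g|h|i|j
```

Only one line is held in $line at a time, whereas foreach over <FH> builds the full list of lines before the loop body runs.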

        I like your use of split quite a lot, though.

        So, combining your ideas and mine:

        use strict;

        open FH, "foo" or die "Can't open: $!";
        my @recs;
        while (<FH>) {
            chomp;
            push @{$recs[int(($.-1) / 5)]}, split /,?\s/;
        }
        close FH or die "Can't close: $!";

        open FH, ">foo.new" or die "Can't open: $!";
        print FH map join('|', @$_) . "\n", @recs;
        close FH;
        Notice I took the array slice out--I think the op wanted everything in the array. If not, though, he/she should stick it back in, just
        @$ref[0..4]
        instead of
        @$ref
        And I'm now using map, just cause map is great.