PerlMonks  

Parse Contact File

by mmittiga17 (Scribe)
on Sep 09, 2008 at 18:07 UTC [id://710162]

mmittiga17 has asked for the wisdom of the Perl Monks concerning the following question:

Hi All, I have a text file that I need to parse, formatting the output as a single line per record. Each record is identified by its first field.

Input txt:

    16XX27300 $ name John Doe
    16XX27300 $ name2 Bla Bla
    16XX27300 $ name3 275 Main ST
    16XX27300 $ mlact 16H27300
    16XX27300 $ addr 8TH Floor
    16XX27300 $ city SAN Fran
    16XX27300 $ state CA
    16XX27300 $ zip 94111
    16XX27301 $ name Jane Doe
    16XX27301 $ name2 Bla Bla
    16XX27301 $ name3 276 Main ST
    16XX27301 $ name4 Tower 2
    16XX27301 $ mlact 16XX27301
    16XX27301 $ addr 8TH Floor
    16XX27301 $ city SAN Fran
    16XX27301 $ state CA
    16XX27301 $ zip 94111

Desired Output:

    16XX27300,$,John Doe,Bla Bla,275 Main ST,16H27300,8TH Floor,SAN Fran,CA,94111
    16XX27301,$,Jane Doe,Bla Bla,276 Main ST,Tower 2,16XX27301,8TH Floor,SAN Fran,CA,94111

I am assuming the best way to tackle this is to use a hash; however, I stink at them. Any code examples to get me started would be appreciated.

Replies are listed 'Best First'.
Re: Parse Contact File
by kyle (Abbot) on Sep 09, 2008 at 18:32 UTC

    I'd start with a split and use the fields it finds to put things in the hash. It might be better to use a regular expression, if that helps you validate input.

        use strict;
        use warnings;
        use Data::Dumper;

        my %results;

        while (my $line = <DATA>) {
            chomp $line;
            # A limit of 4 keeps multi-word values such as "John Doe" in one piece.
            my @fields = split /\s+/, $line, 4;
            # Key by record ID (field 0), then by field name (field 2).
            $results{ $fields[0] }{ $fields[2] } = $fields[3];
        }

        print Dumper \%results;

        __DATA__
        16XX27300 $ name John Doe
        16XX27300 $ name2 Bla Bla
        16XX27300 $ name3 275 Main ST
        16XX27300 $ mlact 16H27300
        16XX27300 $ addr 8TH Floor
        16XX27300 $ city SAN Fran
        16XX27300 $ state CA
        16XX27300 $ zip 94111
        16XX27301 $ name Jane Doe
        16XX27301 $ name2 Bla Bla
        16XX27301 $ name3 276 Main ST
        16XX27301 $ name4 Tower 2
        16XX27301 $ mlact 16XX27301
        16XX27301 $ addr 8TH Floor
        16XX27301 $ city SAN Fran
        16XX27301 $ state CA
        16XX27301 $ zip 94111

    That produces a nested data structure: a hash keyed by record ID, where each value is itself a hash mapping field name to field value.
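    For the sample data, the result should be equivalent to the literal below. This is reconstructed by hand from the code and the input rather than copied from a Data::Dumper run, the second record is abbreviated, and real Dumper output lists keys in no particular order. Note that the constant `$` column (field 1) is never stored.

```perl
use strict;
use warnings;

# Hand-written equivalent of what the loop builds; not actual Dumper output.
my %results = (
    '16XX27300' => {
        name  => 'John Doe',
        name2 => 'Bla Bla',
        name3 => '275 Main ST',
        mlact => '16H27300',
        addr  => '8TH Floor',
        city  => 'SAN Fran',
        state => 'CA',
        zip   => '94111',
    },
    '16XX27301' => {
        name  => 'Jane Doe',
        name4 => 'Tower 2',
        # remaining fields of the second record omitted for brevity
    },
);

# Two-level lookup: record ID first, then field name.
print $results{'16XX27300'}{city}, "\n";
```

    Only the second record has a name4 key; the first record simply lacks it.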

    You could then get your CSV-ish output from Text::CSV_XS.
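    A minimal sketch of that output step, assuming Text::CSV_XS is installed from CPAN (it is not a core module). The `@order` list and the `csv_line` helper are names made up for this illustration, and the small `%results` hash stands in for the one built by the loop above.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Small stand-in for the %results hash built by the parsing loop.
my %results = (
    '16XX27300' => { name => 'John Doe', city => 'SAN Fran', state => 'CA', zip => '94111' },
);

# Assumed column order; a hash alone does not remember it.
my @order = qw(name name2 name3 name4 mlact addr city state zip);

# quote_space => 0 leaves values such as "John Doe" unquoted.
my $csv = Text::CSV_XS->new({ binary => 1, quote_space => 0, eol => "\n" })
    or die "Cannot create CSV writer: " . Text::CSV_XS->error_diag;

# Build one CSV line for a record, skipping fields the record does not have.
sub csv_line {
    my ($id, $rec) = @_;
    # '$' is the constant second column of the input, re-added here.
    my @row = ($id, '$', map { $rec->{$_} } grep { exists $rec->{$_} } @order);
    $csv->combine(@row) or die "Cannot combine fields: " . $csv->error_diag;
    return $csv->string;
}

print csv_line($_, $results{$_}) for sort keys %results;
```

    Sorting the keys gives a repeatable record order, though not necessarily the order in which records appeared in the file.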

    One thing to beware of is that this reads the whole file into one large structure. If that's going to be a problem, you'd want to detect when you'd moved from one record to another and do output on a record-by-record basis instead of accumulating them all. Also, the hash does not maintain the order of the incoming records. If that's going to be an issue, you'd have to keep that order in a separate array or have a way to sort or something.
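    The record-by-record approach could look roughly like this. It is a sketch that assumes all lines of a record are adjacent in the file; `format_record` and `stream_records` are hypothetical helper names, and the plain `join` does no CSV quoting. Because values are pushed onto an array as they arrive, both record order and field order follow the input.

```perl
use strict;
use warnings;

# Turn one finished record into an output line (plain join, no CSV quoting).
sub format_record {
    my ($id, $values) = @_;
    return join(',', $id, '$', @$values);
}

# Read lines from a filehandle and call $emit once per completed record.
sub stream_records {
    my ($fh, $emit) = @_;
    my ($current_id, @values);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;          # trim stray whitespace
        next if $line eq '';
        my ($id, undef, undef, $value) = split /\s+/, $line, 4;
        # A change of ID means the previous record is complete.
        if (defined $current_id && $id ne $current_id) {
            $emit->(format_record($current_id, \@values));
            @values = ();
        }
        $current_id = $id;
        push @values, defined $value ? $value : '';
    }
    # Flush the final record, if there was any input at all.
    $emit->(format_record($current_id, \@values)) if defined $current_id;
}

# A few lines of the sample data, read through an in-memory filehandle.
my $input = <<'END';
16XX27300 $ name John Doe
16XX27300 $ city SAN Fran
16XX27301 $ name Jane Doe
16XX27301 $ name4 Tower 2
END

open my $fh, '<', \$input or die "Cannot open in-memory input: $!";
stream_records($fh, sub { print "$_[0]\n" });
close $fh;
```

    Only one record is held in memory at a time, at the cost of relying on the input being grouped by ID.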

    For more about hashes, see perldata. The above also uses hash references, which you could learn about in perlreftut or perlref.
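    As for the regular-expression alternative mentioned at the start of this reply, here is one way it might look. The pattern encodes an assumed line layout (an ID of word characters, a literal `$`, a field name, then an optional value); `parse_line` is a hypothetical helper, and the real file may need a looser or stricter pattern.

```perl
use strict;
use warnings;

# Assumed layout: ID, literal '$', field name, optional free-text value.
my $LINE_RE = qr/^\s*(\w+)\s+\$\s+(\w+)(?:\s+(.*?))?\s*$/;

# Return (id, field, value) for a well-formed line, or an empty list otherwise.
sub parse_line {
    my ($line) = @_;
    my ($id, $field, $value) = $line =~ $LINE_RE
        or return;
    return ($id, $field, defined $value ? $value : '');
}

my %results;
for my $line ('16XX27300 $ name John Doe', 'this line is malformed') {
    my ($id, $field, $value) = parse_line($line);
    if (!defined $id) {
        warn "Skipping unrecognised line: $line\n";
        next;
    }
    $results{$id}{$field} = $value;
}
```

    Unlike the bare split, this rejects lines that do not fit the expected shape instead of silently storing whatever fields happen to be there.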

Re: Parse Contact File
by GrandFather (Saint) on Sep 10, 2008 at 00:40 UTC

    That simply will not work (probably)! Notice that the second record has a name4 field but the first does not. Sure, you can generate the output you desire, but there may be no way to interpret the result - how do you know where missing fields are? Can "extra" fields be ignored or should (for example) multiple name fields be concatenated together in some fashion?
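    One possible way around the missing-field problem, offered as a sketch rather than as what the original poster needs: write every record against the same fixed column list and leave an empty slot where a field is absent. The column list and the `aligned_rows` helper are assumptions for illustration, and the plain `join` does no CSV quoting.

```perl
use strict;
use warnings;

# Build one row per record against a fixed column list; absent fields become ''.
sub aligned_rows {
    my ($records, $columns) = @_;
    my @rows;
    for my $id (sort keys %$records) {
        my $rec = $records->{$id};
        push @rows, join(',', $id,
            map { defined $rec->{$_} ? $rec->{$_} : '' } @$columns);
    }
    return @rows;
}

# Assumed column order, cut down to four fields for the example.
my @columns = qw(name name3 name4 zip);

my %records = (
    '16XX27300' => { name => 'John Doe', name3 => '275 Main ST', zip => '94111' },
    '16XX27301' => { name => 'Jane Doe', name3 => '276 Main ST',
                     name4 => 'Tower 2', zip => '94111' },
);

print join(',', 'id', @columns), "\n";    # header row names each position
print "$_\n" for aligned_rows(\%records, \@columns);
```

    With a header row and fixed positions, a reader of the output can tell that the first record has no name4 rather than guessing which value shifted left.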

    You should let us in on the bigger picture. Why do you want to do this? How is the output data going to be used? Do you really want CSV output, or is something human readable more appropriate?


    Perl reduces RSI - it saves typing
