Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm not much of a coder, but a co-worker in my lab suggested that Perl would be a useful way to automate some data reformatting that needs to be done, and that it is fairly straightforward even for a beginner. Basically, my data is currently in a very human-readable format in text files on my computer. I would like to turn this information into a table format so I can feed it into another program I am using. The data is formatted roughly as follows:
#Case Number: 12345
People at table = 5
Seat 1: Joe
Seat 2: Steve
Seat 3: Mary
Seat 4: Jill
Seat 5: Bob
Jill speaks first
Round 1:
Jill says good
Bob doesn't talk
Joe says bad
Steve says good
Mary doesn't talk
Jill says that's enough
Steve says that's enough

Round 2: Next question
Jill says bad
Bob doesn't talk
Joe says bad
Steve says bad
Mary doesn't talk
Bob says that's enough
Then a new case begins in the file.
What I would like to do is go through one case and create a row in a table for each person involved in the case. I need to keep track of the actions taken after each question (a person can give a maximum of 5 responses per round), and what the second question was. I would also like the information about which seat they were sitting in. All of the information is there; I just need help figuring out how to pick through one case at a time (there can be around 100 cases per text file) and match up the people with their actions in a table. Is this a reasonable thing to accomplish with Perl? Is there a better tool? Any pointers for getting started? Thanks a bunch for any insight.

Re: New to Perl
by Random_Walk (Prior) on Jan 20, 2005 at 12:21 UTC

    First off, welcome to Perl; I hope you enjoy your stay.

    As others have said, your output requirements are rather vague, so what I have done is parse your input into a Perl data structure. I have just dumped this out so you can see that we have all the data. You can pick and choose what you want and how you want to format the output.

    I have assumed that the format you gave is fairly complete, e.g. that what a person does will always contain "says" or "doesn't talk". I have allowed names to contain whitespace and funny characters; if they are certain to be simple, they can be better matched with (\w+). You mention each round having a question, but I do not really see this in your data. I have captured the bit after the round number as a guess at where it would be (round 2 has a value here, round 1 does not).
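
    The code from this reply is not reproduced in the thread, so the following is only a minimal sketch of the line matching described above, not the original. The helper name classify_line and all of the patterns are assumptions based on the sample data in the question; names are matched loosely so they may contain whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original reply): classify one input line
# and return (type, captured fields...). Patterns are guesses from the sample.
sub classify_line {
    my ($line) = @_;
    chomp $line;
    return ('blank')         if $line =~ /^\s*$/;
    return ('case',  $1)     if $line =~ /^#Case Number:\s*(\d+)/;
    return ('count', $1)     if $line =~ /^People at table\s*=\s*(\d+)/;
    return ('seat',  $1, $2) if $line =~ /^Seat\s+(\d+):\s*(.+?)\s*$/;
    return ('first', $1)     if $line =~ /^(.+?)\s+speaks first\s*$/;
    # The text after "Round N:" is captured as the (possibly empty) question.
    return ('round', $1, $2) if $line =~ /^Round\s+(\d+):\s*(.*?)\s*$/;
    return ('says',  $1, $2) if $line =~ /^(.+?)\s+says\s+(.+?)\s*$/;
    return ('quiet', $1)     if $line =~ /^(.+?)\s+doesn't talk\s*$/;
    return ('unknown', $line);
}

# Small demonstration on three lines taken from the sample data.
for my $line ("Seat 3: Mary", "Round 2: Next question", "Jill says that's enough") {
    print join(' | ', classify_line($line)), "\n";
}
```

    If names are known to be single words, the (.+?) captures for names could be tightened to (\w+), as suggested above.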


    Cheers,
    R.

    Pereant, qui ante nos nostra dixerunt!
      %current_case && push @cases, \%current_case; %current_case=();

      Try it with a data set with more than one case.

        Hi Chromatic

        Thanks for reading and thinking about my code. It's a fair cop: I just reset the hash rather than grabbing a new one, DOH! Sadly it's 00:58 here; this evening I have enjoyed good company and a delicious Rioja, and I've got a 9am meeting, so no more coding till the morning.

        G'night,
        R.


        OK, I found a couple of fixes: either make %current_case a package global with our and then assign a new reference to an empty hash to its typeglob, *current_case={}; or make the current case a $scalar, store a reference to an anonymous hash in it, $current_case={}; and replace all instances of $current_case{whatever} with $current_case->{whatever}. I guess the latter is the preferred solution.
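
        To make the problem and the second fix concrete, here is a minimal sketch. The sub names collect_buggy and collect_fixed and the number key are made up for illustration; the original parser is not shown in this thread.

```perl
use strict;
use warnings;

# Buggy pattern: every pushed reference points at the same %current_case,
# so emptying the hash also empties everything that was "stored".
sub collect_buggy {
    my @numbers = @_;
    my (@cases, %current_case);
    for my $n (@numbers) {
        %current_case = ( number => $n );
        push @cases, \%current_case;
        %current_case = ();
    }
    return @cases;
}

# Fix: keep a scalar holding a reference to an anonymous hash and point it
# at a fresh hash after each push.
sub collect_fixed {
    my @numbers = @_;
    my @cases;
    my $current_case = {};
    for my $n (@numbers) {
        $current_case->{number} = $n;
        push @cases, $current_case;
        $current_case = {};    # new anonymous hash; the stored one is untouched
    }
    return @cases;
}

my @buggy = collect_buggy(12345, 12346);
my @fixed = collect_fixed(12345, 12346);
print "buggy: ", join(',', map { defined $_->{number} ? $_->{number} : 'undef' } @buggy), "\n";
print "fixed: ", join(',', map { $_->{number} } @fixed), "\n";
# prints:
# buggy: undef,undef
# fixed: 12345,12346
```

        With a single case the buggy version also loses its data, but it only becomes obvious that all entries are the same hash once there is more than one case.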

        Cheers,
        R.


      While I appreciate the time you've taken to help, I must say, I don't appreciate the way you've done it. You have given a nearly complete solution to someone asking for pointers on how to start. I'm sure you've heard the old proverb about teaching someone to fish rather than giving them a fish...

      Now, you may go on about how your code was intended to be a useful example to learn by, but I can't entirely agree. It's not commented, quite dense, and, frankly, doesn't exactly use what I would call good style. It works, and there's something to be said for that, but I'm dubious about using such examples for pedagogical purposes.

        You are probably right. I started playing with this and it took longer than I thought, so I did not get time to comment the code; I thought that as I had written it I may as well post it. It answered at least one of the OP's questions: can Perl do this ;)

        The data structure is not exactly beginner's stuff, but as we did not have any spec for the output I thought that was the easiest way to show that it can be done. Now that we have an output spec I would (shall, if I get time this evening) rewrite it so it prints the data as it finds it, removing the need for a Hash of Arrays of God knows what.
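
        A rough sketch of that "handle it as you find it" idea, under the assumption that every case starts with a "#Case Number:" line. The helper name for_each_case and the callback interface are hypothetical, not part of any posted code.

```perl
use strict;
use warnings;

# Hypothetical helper: call $handler->(\@lines) once per case, as soon as the
# case is complete, so only one case is held in memory at a time.
# Assumes each case begins with a "#Case Number:" line.
sub for_each_case {
    my ($fh, $handler) = @_;
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^#Case Number:/ && @lines) {
            $handler->([@lines]);   # hand over a copy of the finished case
            @lines = ();
        }
        push @lines, $line if $line =~ /\S/;   # skip blank lines
    }
    $handler->([@lines]) if @lines;            # do not forget the last case
    return;
}

# Demonstration on an in-memory file with two tiny, made-up cases.
my $sample = "#Case Number: 12345\nSeat 1: Joe\n\n#Case Number: 12346\nSeat 1: Ann\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
for_each_case($fh, sub {
    my ($case) = @_;
    print scalar(@$case), " lines, starting with: $case->[0]\n";
});
close $fh;
```

        The handler is where the per-player rows for one case would be built and printed before the next case is read.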

        Cheers,
        R.

        I respectfully disagree with you. It is not that he has given you a fish as much as he has built the pole for you. If it doesn't meet your standards, then that particular pole serves at least as a reference point for building your own.

        I don't mind the style that much, though I would never say my $scalar="bleh"; before I said my($scalar) = "bleh";. But that's just me. I'm probably one of the few people left who puts parens around a single my arg. Of course, that's an OCD thing more than anything else. Anyway, I've never thought that getting nit-picky about another person's perl style was polite. If TIMTOWTDI and the interpreter doesn't ultimately care what method is chosen, I don't think it's up to you or anyone else to chastise another programmer for his style. It just seems un-perl to me.

        I don't know if your intention was to be "politely rude" so to speak, but it comes off that way. What was wrong with simply saying "thanks for taking time out of your day to assist me" and disregarding the rest of what you felt you HAD to post? I could be interpreting that wrong, so I'm telling you how I heard it and giving you the opportunity to correct me.

        I think this is the first node I've read where working code was frowned upon.

        /renz.

        UPDATE: Ugh. tall_man: bad example on my part. You're right. Humorously enough I finally gave up my desire to put parens before everything, some time after posting this, and for the very reason you have given. I just neglected to modify this post (mainly because I forgot about it). As of late I have labored greatly to conform to perldoc perlstyle as much as possible, though I have never preferred a 4-column indent to a 2-column indent, and I find it difficult not to cuddle my elses.
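
        tall_man's reply is not reproduced in this thread, so the reason referred to is not shown here. For reference only, one difference between the two forms that is easy to trip over is context: parentheses around the my variable make the assignment a list assignment. The sub names below are made up for the illustration.

```perl
use strict;
use warnings;

# Without parentheses the array is evaluated in scalar context,
# which yields the number of elements.
sub assign_in_scalar_context { my $value   = @_; return $value }

# With parentheses it is a list assignment, which takes the first element.
sub assign_in_list_context   { my ($value) = @_; return $value }

my @players = ('Joe', 'Steve', 'Mary');
print "scalar: ", assign_in_scalar_context(@players), "\n";   # prints "scalar: 3"
print "list:   ", assign_in_list_context(@players), "\n";     # prints "list:   Joe"
```

        For a plain string such as "bleh" both forms store the same value, so the difference only shows up when the right-hand side is an array, a list, or a context-sensitive function call.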
Re: New to Perl
by PodMaster (Abbot) on Jan 20, 2005 at 11:40 UTC
    You should be able to get started and finish this by reading perlintro; it covers all the basics of syntax, how to open files ....

    Also useful is Where and how to start learning perl, but you can look at that later.

    If you get stuck, knowing How to RTFM (Tutorials) will save you lots of time, but if you can't figure it out, stop by around here.

    MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
    I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
    ** The third rule of perl club is a statement of fact: pod is sexy.

Re: New to Perl
by sasikumar (Monk) on Jan 20, 2005 at 11:17 UTC
    Hi

    Can you tell us what table format you are looking for? Please show us an example of the output table for your sample data. If you have a code snippet, post that too.

    Thanks
    SasiKumar
Re: New to Perl
by Anonymous Monk on Jan 20, 2005 at 17:08 UTC
    Wow, thanks a lot for such quick responses.... Let me be a little more clear about how the output should look.

    Given the data snippet I used above, I would like there to be 5 space-delimited rows of output (one for each player; I put commas here so it is clear which items are separate). These rows will then be sent to a spreadsheet that is happy to import space-delimited text files. For example, this would be the perfect output for me for one player:

    Player_Name, Case#, Seat#, Speaking_Position, First_response_to_Question_1, Second_response_to_Question_1, Question_For_Round_2, First_response_to_Question_2, Second_response_to_Question_2,

    If there is no data for the Second_response field I would like there to be a 0 (for example in my real data people can answer up to 5 times, but usually they only give 2 or 3 responses and I'd like to fill the placeholders for extra responses with 0).
    So for Joe the output would look like this ideally (he is third to speak if we start with Jill):

    'Joe' , '12345' , '1' , '3' , 'bad' , '0' , 'Next Question' , 'bad' , '0'

    And I would get an output like this for each person at the table.
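
    As an illustration of this output spec only, here is a minimal sketch of how one such row could be assembled once the data for a player is known. All helper names (pad_responses, speaking_position, player_row) and the hash keys are hypothetical. It also assumes that speaking proceeds in increasing seat order and wraps around, which matches the sample (Jill, Bob, Joe, Steve, Mary) but is not stated anywhere in the data.

```perl
use strict;
use warnings;

# Hypothetical helper: pad or trim a list of responses to exactly $max
# entries, filling the empty slots with 0.
sub pad_responses {
    my ($max, @responses) = @_;
    my @padded = @responses[0 .. $max - 1];   # missing slots come back undef
    return map { defined $_ ? $_ : 0 } @padded;
}

# Hypothetical helper: 1-based speaking position, ASSUMING speakers go round
# the table in increasing seat order starting from the first speaker's seat.
sub speaking_position {
    my ($seat, $first_seat, $head_count) = @_;
    my $offset = ($seat - $first_seat) % $head_count;
    return $offset + 1;
}

# Hypothetical helper: build one space-separated row with quoted fields,
# in the field order given in the spec above.
sub player_row {
    my (%p) = @_;
    my @fields = (
        $p{name}, $p{case}, $p{seat}, $p{position},
        pad_responses($p{max}, @{ $p{round1} }),
        $p{question2},
        pad_responses($p{max}, @{ $p{round2} }),
    );
    return join ' ', map { "'$_'" } @fields;
}

# Joe sits in seat 1; Jill (seat 4) speaks first at a table of 5.
print player_row(
    name      => 'Joe',
    case      => 12345,
    seat      => 1,
    position  => speaking_position(1, 4, 5),
    max       => 2,
    round1    => ['bad'],
    question2 => 'Next question',
    round2    => ['bad'],
), "\n";
# prints: 'Joe' '12345' '1' '3' 'bad' '0' 'Next question' 'bad' '0'
```

    The fields are quoted because a question such as "Next question" contains a space itself; whether the importing program honours the quotes is something to check. For the real data, max would be 5.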

    Thanks again for the help, you folks are certainly helping me feel that this is a reasonable thing to accomplish as a beginner.

      Here is a more readable version with comments. I now include speaker order and have made the output a little more readable. The problem pointed out by Chromatic is also fixed in this version.

      #!/usr/bin/perl

      # the following check your code for you,
      # enforce some good behaviour (like declaring variables)
      # and catch a lot of finger pilotage errors
      use warnings;
      use strict;

      # here I declare some variables I am going to use
      # $ is a scalar (string, number, or a reference to another variable)
      # @ is an array of scalars
      # % is an associative array or hash, values looked up by a unique key
      my ( $case_number, $head_count, @player, $spoke_first, %player_said, $round );

      Cheers,
      R.


      This is not pretty but it is functional. I missed out the speaking order part, and the separation of reading from outputting makes the code far more complex than required. I will try to make a better example later, but I've written it so here goes.

Re: New to Perl
by sh1tn (Priest) on Jan 21, 2005 at 03:43 UTC
    It is to some extent self-explanatory. Perhaps you can
    easily rearrange the output from the structure.


    #!/usr/bin/perl
    use warnings;
    use strict;
    use Data::Dumper;

    my @data = <DATA>;
    my $struct;
    my ($case, $table, $man, $seat, $round, $who, $what);

    for (@data){
        /^\s*$/ and next;
        /^\s*#case\s+\w+:\s+(\d+)/i and $case = $1 and next;
        /^\s*people\s.+=\s*(\d+)/i and $table = $1 and next;
        /^\s*round\s+(\d+)/i and $round = $1 and next;
        /^\s*seat\s+(\d+):\s+(\w+)/i and ($seat, $man) = ($1, $2);
        /^(\w+)\s+(.+)?\s*$/ and ($who, $what) = ($1, $2);
        $struct->{$case}{$table}{'seat'}{$man} = $seat if $seat;
        push @{$struct->{$case}{$table}{'round'}{$round}{$who}}, $what if $round;
    }

    for my $case (keys %$struct){
        for my $table (keys %{$struct->{$case}}){
            for my $round (keys %{$struct->{$case}{$table}{'round'}}){
                for my $man (sort keys %{$struct->{$case}{$table}{'round'}{$round}}){
                    print "player: $man ";
                    print "case: $case ";
                    print "seat: $struct->{$case}{$table}{'seat'}{$man} ";
                    print "round: $round ";
                    print "QA: ", join '. ', @{$struct->{$case}{$table}{'round'}{$round}{$man}}, "\n";
                }
            }
        }
    }

    #print Dumper($struct);

    __DATA__
    #Case Number: 12345
    People at table = 5
    Seat 1: Joe
    Seat 2: Steve
    Seat 3: Mary
    Seat 4: Jill
    Seat 5: Bob
    Jill speaks first
    Round 1:
    Jill says good
    Bob doesn't talk
    Joe says bad
    Steve says good
    Mary doesn't talk
    Jill says that's enough
    Mary talks for years
    Steve says that's not enough

    Round 2: Next question
    Jill says bad
    Bob doesn't talk
    Joe says bad
    Steve says bad
    Mary doesn't talk
    Bob says that's enough

    Which has the following output:
    player: Bob case: 12345 seat: 5 round: 1 QA: doesn't talk.
    player: Jill case: 12345 seat: 4 round: 1 QA: says good. says that's enough.
    player: Joe case: 12345 seat: 1 round: 1 QA: says bad.
    player: Mary case: 12345 seat: 3 round: 1 QA: doesn't talk. talks for years.
    player: Steve case: 12345 seat: 2 round: 1 QA: says good. says that's not enough.
    player: Bob case: 12345 seat: 5 round: 2 QA: doesn't talk. says that's enough.
    player: Jill case: 12345 seat: 4 round: 2 QA: says bad.
    player: Joe case: 12345 seat: 1 round: 2 QA: says bad.
    player: Mary case: 12345 seat: 3 round: 2 QA: doesn't talk.
    player: Steve case: 12345 seat: 2 round: 2 QA: says bad.