Manipulating Data by a Field Identifier

rycher has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I have the following MySQL outputted CSV file:

"create_stamp","username","first_name","last_name","name","name","exte
+nsion","fax_flag","exchange","position","username"
"2009-03-02 15:31:52","SimpsonH","Homer","Simpson","Nuclear Control Ce
+nter","Radioctive Hall","6798","1","218 555","Nuclear Control Operato
+r","BurnsM"
"2009-03-02 15:31:52","SimpsonH","Homer","Simpson","Nuclear Control Ce
+nter","Radioctive Hall","6793","0","218 555","Nuclear Control Operato
+r","BurnsM"
[download]

I already have the data writing to a file in the following format, with the exception of the fax_flag/faxNumber, utilizing the Text::CSV module(print columnx, etc). What I'm getting stuck on is the part where the file gets read in, then equates the fax_flag with '1' and only writes the 'faxNumber' field to the line. The end result should look like this--

dn: uid=simpsonh
givenName: Homer
sn: Simpson
department: Nuclear Control Center
buildingName: Radioactive Hall
telephoneNumber: 218 555-6793
faxNumber: 218 555-6798
title: Nuclear Control Operator
manager: uid=burnsm
[download]

Comment on Manipulating Data by a Field Identifier Select or Download Code

Replies are listed 'Best First'.
Re: Manipulating Data by a Field Identifier by McDarren (Abbot) on May 04, 2009 at 01:51 UTC
Suggestion: Post the code that is not working for you, so people can see it, comment, and point you in the right direction. Cheers, Darren :)	[reply]
Re^2: Manipulating Data by a Field Identifier by rycher (Acolyte) on May 04, 2009 at 01:57 UTC
Thanks Darren. The code works... it's just that I do not how to get the fax flag line identified and inserted as a faxNumber field. Anyway, here it is: #!/usr/bin/perl use strict; use warnings; use Text::CSV; my $file = 'Employee.csv'; my $csv = Text::CSV->new(); open (CSV, "<", $file) or die $!; while (<CSV>) { next if ($. == 1); if ($csv->parse($_)) { my @columns = $csv->fields(); my @choppedcols = $csv->fields(); my $username = lc($columns[1]); my $first_name = ($columns[2]); my $last_name = ($columns[3]); my $department = ($columns[4]); my $LocationBuilding = ($columns[5]); my $phoneid = ($columns[6]); my $faxflag = ($columns[7]); my $exchange = ($columns[8]); my $telephone = ($columns[8]).-($columns[6]); my $title = ($columns[9]); my $manager = lc($columns[10]); print "dn: uid=$username\ngivenName: $first_name\nsn: $last_na +me\ndepartment: $department\nbuildingName: $LocationBuilding\ntelepho +nenumber: $ telephone\ntitle: $title\nmanager: uid=$manager\n\n"; } else { my $err = $csv->error_input; print "Failed to parse line: $err"; } } close CSV; [download]	[reply] [d/l]
Re^3: Manipulating Data by a Field Identifier by McDarren (Abbot) on May 04, 2009 at 02:20 UTC
Okay, I see your problem. You have potentially two records for each user - one containing a telephone number, and the other a fax number, yes? This means that you really can't print each line as you iterate, because you need to gather both numbers. One approach to this problem may be to use a hash. Something like the following may work: `my %users; while (<CSV>) { next if ($. == 1); if ($csv->parse($_)) { my @columns = $csv->fields(); my @choppedcols = $csv->fields(); my $username = lc($columns[1]); $users{$username}{first_name} = ($columns[2]); $users{$username{last_name} = ($columns[3]); # etc... then my $telephone = ($columns[8]).-($columns[6]); if ($columns[8] == 1) { $users{$username}{fax_number} = $telephone; } else { $users{$username}{phone_number} = $telephone; } } }` [download] And then... `for my $user (keys %users) { print "dn: uid=$user\n", "givenName: $users{$user}{first_name}\n", "sn: $user{$user}{last_name}\n", #etc... "telephonNumber: $users{$user}{phone_number}\n", "faxNumber: $users{$user}{fax_number}\n"; }` [download] Cheers, Darren :)	[reply] [d/l] [select]
Re^4: Manipulating Data by a Field Identifier by rycher (Acolyte) on May 04, 2009 at 04:25 UTC
A reply falls below the community's threshold of quality. You may see it by logging in.
Re: Manipulating Data by a Field Identifier by graff (Chancellor) on May 04, 2009 at 05:36 UTC
What is the likelihood that any given field in any given row of your MySQL table contains a comma as data? And what is the likelihood that any given field contains a tab character? Why not use "Tab-Separated-Values" (tsv -- a recognized file type acceptable as input to Excel, etc), so you can skip all the comma / double-quote noise (with the possibility of escaped double-quotes in data)? Why does your source table not have separate fields for voice and fax phone numbers? (That is, why should you need to store two copies of all the other information for a given individual, just so you can manage both a voice and a fax phone number?) And while I'm asking about stuff other than your stated problem, don't you think it's a bit problematic to have two columns with the same label? (You have two columns labeled "username", but these seem to have different values; likewise for two columns labeled "name".) As for answering the particular question you posed, it's only complicated by the fact that the input data is somewhat brain-damaged: apparently, you need to merge two table rows of mostly redundant data so that you can get a single record of output with two different phone numbers (one for voice, one for fax). So, you keep all the fields when the row contains a "0" in the fax_flag field -- load them into a hash keyed by column name (but do something so that you have "username1" and "username2", and "name1" and "name2" as hash keys, or else you will lose information), and put together the value ~~for~~ to be stored with "voice_phone" as a hash key. Then when you get the corresponding row with "1" in the fax_flag field, just add a "fax_number" key/value to the hash, and output the desired result from the hash. (If the rows are not ordered "cooperatively" -- i.e. if the "fax_flag=1" row could be before or long after the corresponding "fax_flag=0" row -- you'll need a HoH, with the first-layer hash keyed by an appropriate unique-ID-per-individual. Output will need to be done when you've processed the 2nd row for a given individual, whichever row that happens to be.)	[reply]
Re: Manipulating Data by a Field Identifier by bichonfrise74 (Vicar) on May 05, 2009 at 01:02 UTC
Consider this... I didn't write the output of all the fields. I just focused on the fax number and telephone number problem. #!/usr/bin/perl use strict; use Data::Dumper; use Text::CSV; my $csv = Text::CSV->new(); my %hash_temp; while (my $line = <DATA>) { next if ( $line =~ /^\"create_stamp\"/ ); $csv->parse( $line ); my @cols = $csv->fields(); $hash_temp{"dn: uid="} = lc( $cols[1] ); $hash_temp{"givenName: "} = $cols[2]; $hash_temp{"sn: "} = $cols[3]; $hash_temp{"telephoneNumber: "} = $cols[8] . "-" . $cols[6] if ( $ +cols[7] == 0 ); $hash_temp{"faxNumber: "} = $cols[8] . "-" . $cols[6] if ( $cols[7 +] == 1 ); } print Dumper( \%hash_temp ); __DATA__ "create_stamp","username","first_name","last_name","name","name","exte +nsion","fax_flag","exchange","position","username" "2009-03-02 15:31:52","SimpsonH","Homer","Simpson","Nuclear Control Ce +nter","Radioctive Hall","6798","1","218 555","Nuclear Control Operato +r","BurnsM" "2009-03-02 15:31:52","SimpsonH","Homer","Simpson","Nuclear Control Ce +nter","Radioctive Hall","6793","0","218 555","Nuclear Control Operato +r","BurnsM" [download]	[reply] [d/l]