Open file, sort data on matching field, close file

rycher has asked for the wisdom of the Perl Monks concerning the following question:

Dear PerlMonks,

I have one file containing employee data that has been populated by different MySQL exported table files from one database.
How would I sort the data in this merged file to group together all relevant?
For example(the three dots signify x# of spaces between entries):

Infile:
----------------
employee_id: 227
position_id: 182
departmentNumber: 38
id: 67
manager: 222
...

uid: SimpsonH
employee_id: 227
sn: Simpson
givenName: Homer
...

extension: 5555
employee_id: 227
telephonenumber: 218 344
fax: 0
phone_id: 652
...

position_id: 182
title: Nuclear Plant Operator
...

uid: BurnsM
employee_id: 222
sn: Burns
givenName: Montgomery Burns
...

name: Nuclear Resources
departmentNumber: 38
buildingName: 2
...

buildingName: 2
name: Radioactive Hall
...

buildingName: 2
roomNumber: 7
location_id: 6
...

roomNumber: 7
room: ControlTower


Outfile:
---------------------
uid: simpsonh
sn: Simpson
givenName: Homer
manager: burnsm
telephoneNumber: 2183445555
departmentNumber: 38
title: Nuclear Plant Operator
roomNumber: 7
buildingName: Nuclear Resources
[download]

I am not even sure where to start at this point. There is a pattern with the *_id categories that I could key in on and yank the following line but I'm unsure on how to go about doing that... please help!

Sincerely,

rycher

Comment on Open file, sort data on matching field, close file Download Code

Replies are listed 'Best First'.
Re: Open file, sort data on matching field, close file by mr_mischief (Monsignor) on Apr 29, 2009 at 20:00 UTC
First you need to figure out how to parse those records out as records, and which type of record each one is. Then, a hash of hashes (HoH) of the records makes sense to organize them. Something like the below makes sense to me as your data structure. Keying the outside hash to the common key for your tables allows you to add data to the anonymous hashes inside as you find new records for an employee, rather than having to put an employee record together completely before storing it in the data structure. Then, just walk over the outside hash and output the key and the data from inside the anonymous hash for each employee in turn. `@employee = ( '227' => { 'uid' => 'simpsonh', 'sn' => 'Simpson', 'givenname' => 'Homer', 'manager' => 'burnsm', 'telephoneNumber' => '2183445555', 'departmentNumber' => 38, 'title' => 'Nuclear Plant Operator', 'roomNumber' => 7, 'buildingName' => 'Nuclear Resources' }, 222 => { # rest of entry info here } );` [download]	[reply] [d/l]
Re: Open file, sort data on matching field, close file by apl (Monsignor) on Apr 29, 2009 at 18:55 UTC
employee_id 227 appears to be reused, which is odd, so you can't use that feld. What do you want to sort on? That would indicate how to store the data when you load it. In any event, use a hash keyed on whatever field you decide, and use the Perl `sorted keys` idiom when you loop through the structure.	[reply] [d/l]
Re: Open file, sort data on matching field, close file by Plankton (Vicar) on Apr 29, 2009 at 20:05 UTC
Have you tried or able to have the data sorted via SQL before it is exported?	[reply]
Re^2: Open file, sort data on matching field, close file by rycher (Acolyte) on May 01, 2009 at 01:55 UTC
OK, apparently I am going to have to re-write most of the script now because of your suggestion. This is a good thing though because I was able to get the MySQL data into a nicely formatted, comma-delimited csv file. This is the select statement I used, if anyone is wondering: SELECT Employee.username, Employee.first_name, Employee.last_name, Dep +artment.name, LocationBuilding.name, PhoneNumbers.phone_id, PhoneNumb +ers.fax_flag, PhoneNumbers.exchange, P.position, OrgChart.supervisor_ +id FROM P INNER JOIN OrgChart ON(P.employee_id=OrgChart.employee_id) +INNER JOIN Employee ON(OrgChart.employee_id=Employee.employee_id) INN +ER JOIN PhoneNumbers ON(Employee.employee_id=PhoneNumbers.employee_id +) INNER JOIN Department ON(OrgChart.department_id=Department.departme +nt_id) INNER JOIN LocationBuilding ON(Department.building_id=Location +Building.building_id) INNER JOIN Location ON(LocationBuilding.buildin +g_id=Location.building_id) INNER JOIN LocationRooms ON(Location.room_ +id=LocationRooms.room_id);" \| sed 's/\t/","/g;s/^/"/;s/$/"/;s/\n//g' +> Employee.csv [download] Thanks!	[reply] [d/l]
Re: Open file, sort data on matching field, close file by punch_card_don (Curate) on Apr 30, 2009 at 12:27 UTC
This will likely have to be a multi-pass process. It appears that you have a well known set of of fields that appear to order nicely into sql-like tables: employee_info managers buildings locations positions rooms departments And you know, it appears, that data grouped on successive lines relate to each other and that groupings are separated by blank lines, and that data tags and their value are spearated by a colon. So, go through the file line by line, read data into a buffer by reading forward until you hit a blank line, then store that into temporary hashes as if in sql tables. Once it's all compiled, then output your total file. pseudo-code: open FILE, $infile; @lines = <FILE>; close FILE; for $i (0 .. $#lines) { if ($lines($i) ne "") { ($tag, $value) = split(/:/, $lines($i)); $temp{$tag} = $value; } else { #reached end of a data grouping, now process it if exists $temp{'employee_id'} { foreach $field (keys %temp) { $employee_info{$field} = $temp{$field}; } } elseif (exists $temp{'position_id') && exists $temp{'title'}) +{ $positions{$temp{'position_id')} = $temp{'title'}; } elseif { ....etc... } ...etc... } ...clear %temp... } foreach $employee (keys %employee_info) { output info, drawing labels from %positions, %departments, etc... } [download] Time flies like an arrow. Fruit flies like a banana.	[reply] [d/l]
Re: Open file, sort data on matching field, close file by CountZero (Bishop) on Apr 30, 2009 at 15:41 UTC
As soon as you have found where one record stops and the next record starts, it becomes trivially easy to build a Array of Hashes or a Hash of Hashes to hold your data. CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James	[reply]