rycher has asked for the wisdom of the Perl Monks concerning the following question:

Dear PerlMonks,

I have one file containing employee data that has been populated by different MySQL exported table files from one database.
How would I sort the data in this merged file to group together all relevant?
For example(the three dots signify x# of spaces between entries):

Infile: ---------------- employee_id: 227 position_id: 182 departmentNumber: 38 id: 67 manager: 222 ... uid: SimpsonH employee_id: 227 sn: Simpson givenName: Homer ... extension: 5555 employee_id: 227 telephonenumber: 218 344 fax: 0 phone_id: 652 ... position_id: 182 title: Nuclear Plant Operator ... uid: BurnsM employee_id: 222 sn: Burns givenName: Montgomery Burns ... name: Nuclear Resources departmentNumber: 38 buildingName: 2 ... buildingName: 2 name: Radioactive Hall ... buildingName: 2 roomNumber: 7 location_id: 6 ... roomNumber: 7 room: ControlTower Outfile: --------------------- uid: simpsonh sn: Simpson givenName: Homer manager: burnsm telephoneNumber: 2183445555 departmentNumber: 38 title: Nuclear Plant Operator roomNumber: 7 buildingName: Nuclear Resources

I am not even sure where to start at this point. There is a pattern with the *_id categories that I could key in on and yank the following line but I'm unsure on how to go about doing that... please help!

Sincerely,

rycher

Replies are listed 'Best First'.
Re: Open file, sort data on matching field, close file
by mr_mischief (Monsignor) on Apr 29, 2009 at 20:00 UTC
    First you need to figure out how to parse those records out as records, and which type of record each one is. Then, a hash of hashes (HoH) of the records makes sense to organize them.

    Something like the below makes sense to me as your data structure. Keying the outside hash to the common key for your tables allows you to add data to the anonymous hashes inside as you find new records for an employee, rather than having to put an employee record together completely before storing it in the data structure. Then, just walk over the outside hash and output the key and the data from inside the anonymous hash for each employee in turn.

    @employee = ( '227' => { 'uid' => 'simpsonh', 'sn' => 'Simpson', 'givenname' => 'Homer', 'manager' => 'burnsm', 'telephoneNumber' => '2183445555', 'departmentNumber' => 38, 'title' => 'Nuclear Plant Operator', 'roomNumber' => 7, 'buildingName' => 'Nuclear Resources' }, 222 => { # rest of entry info here } );
Re: Open file, sort data on matching field, close file
by apl (Monsignor) on Apr 29, 2009 at 18:55 UTC
    employee_id 227 appears to be reused, which is odd, so you can't use that feld.

    What do you want to sort on? That would indicate how to store the data when you load it.

    In any event, use a hash keyed on whatever field you decide, and use the Perl sorted keys idiom when you loop through the structure.

Re: Open file, sort data on matching field, close file
by Plankton (Vicar) on Apr 29, 2009 at 20:05 UTC
    Have you tried or able to have the data sorted via SQL before it is exported?
      OK, apparently I am going to have to re-write most of the script now because of your suggestion. This is a good thing though because I was able to get the MySQL data into a nicely formatted, comma-delimited csv file.

      This is the select statement I used, if anyone is wondering:

      SELECT Employee.username, Employee.first_name, Employee.last_name, Dep +artment.name, LocationBuilding.name, PhoneNumbers.phone_id, PhoneNumb +ers.fax_flag, PhoneNumbers.exchange, P.position, OrgChart.supervisor_ +id FROM P INNER JOIN OrgChart ON(P.employee_id=OrgChart.employee_id) +INNER JOIN Employee ON(OrgChart.employee_id=Employee.employee_id) INN +ER JOIN PhoneNumbers ON(Employee.employee_id=PhoneNumbers.employee_id +) INNER JOIN Department ON(OrgChart.department_id=Department.departme +nt_id) INNER JOIN LocationBuilding ON(Department.building_id=Location +Building.building_id) INNER JOIN Location ON(LocationBuilding.buildin +g_id=Location.building_id) INNER JOIN LocationRooms ON(Location.room_ +id=LocationRooms.room_id);" | sed 's/\t/","/g;s/^/"/;s/$/"/;s/\n//g' +> Employee.csv

      Thanks!

Re: Open file, sort data on matching field, close file
by punch_card_don (Curate) on Apr 30, 2009 at 12:27 UTC
    This will likely have to be a multi-pass process. It appears that you have a well known set of of fields that appear to order nicely into sql-like tables:

    employee_info
    managers
    buildings
    locations
    positions
    rooms
    departments

    And you know, it appears, that data grouped on successive lines relate to each other and that groupings are separated by blank lines, and that data tags and their value are spearated by a colon. So, go through the file line by line, read data into a buffer by reading forward until you hit a blank line, then store that into temporary hashes as if in sql tables. Once it's all compiled, then output your total file.

    pseudo-code:

    open FILE, $infile; @lines = <FILE>; close FILE; for $i (0 .. $#lines) { if ($lines($i) ne "") { ($tag, $value) = split(/:/, $lines($i)); $temp{$tag} = $value; } else { #reached end of a data grouping, now process it if exists $temp{'employee_id'} { foreach $field (keys %temp) { $employee_info{$field} = $temp{$field}; } } elseif (exists $temp{'position_id') && exists $temp{'title'}) +{ $positions{$temp{'position_id')} = $temp{'title'}; } elseif { ....etc... } ...etc... } ...clear %temp... } foreach $employee (keys %employee_info) { output info, drawing labels from %positions, %departments, etc... }




    Time flies like an arrow. Fruit flies like a banana.
Re: Open file, sort data on matching field, close file
by CountZero (Bishop) on Apr 30, 2009 at 15:41 UTC
    As soon as you have found where one record stops and the next record starts, it becomes trivially easy to build a Array of Hashes or a Hash of Hashes to hold your data.

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James