Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Here's the scenario. I've got two CSV files. My goal is to take the field values in one file and compare them with the values in the other file. Should I first read the values from each file into an array? Also, one of the files will have multiple entries for each of the other file's values; it's a one-to-many relationship. What's the best way to sort through this and print out the data I need?

Re: comparing array values or?
by AgentM (Curate) on Mar 14, 2001 at 02:19 UTC
    Before you start beating yourself up with regexes and splits, try DBD::CSV. It lets you run SQL queries against the CSV files, which makes the data munging much easier. Once you have retrieved the data sets, comparing them is trivial.
    AgentM Systems nor Nasca Enterprises nor Bone::Easy nor Macperl is responsible for the comments made by AgentM. Remember, you can build any logical system with NOR.
      For now we're only using flat files until we build a database. At this point I have no choice but to use flat files. That's why I'm doing it this way.

        Please consider AgentM's suggestion again -- DBD::CSV (and also DBD::RAM) lets you treat CSV files (and other formats) as a database, even though they are still flat files -- and later, when you do move to a real database, you'll be able to move the scripts over with very little effort as well.

Re: comparing array values or?
by Chady (Priest) on Mar 14, 2001 at 01:56 UTC

    Some people always post vague questions.

    Here's a scenario; you're halfway through it already.
    I assume that only one file has multiple entries for each value in the other.
    You can read the values from the first file into an array, then read the second file into a hash of arrays keyed on the shared field, and do your matching/comparing from there.
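
A minimal sketch of that array plus hash-of-arrays idea, using only core Perl. The sample rows follow the id,name,type and name,ref#,organization,type,box# layouts the poster gives further down the thread; `report` is a hypothetical helper name, and the plain `split /,/` assumes no field contains a quoted comma (a module such as Text::CSV would be the safer choice for real CSV).

```perl
use strict;
use warnings;

# In real code these lines would be read from the two files.
my @file1 = ('1,yeates,scarborough', '2,wayman,freedom', '3,xena,princess');
my @file2 = (
    'wayman,34,meade,B1,4',
    'wayman,56,gs,B2,7',
    'wayman,78,nine,B3,8',
    'yeates,52,sample,A1,9',
    'xena,63,tv,C1,7',
    'xena,22,media,C2,2',
);

# Names from the first file, kept in file order (name is the second field).
my @names = map { (split /,/)[1] } @file1;

# Hash of arrays: each name maps to every file2 row that starts with it.
my %rows_for;
for my $line (@file2) {
    my ($name) = split /,/, $line;
    push @{ $rows_for{$name} }, $line;
}

# Hypothetical helper: build one summary line per name, followed by its rows.
sub report {
    my ($names, $rows_for) = @_;
    my @out;
    for my $name (@$names) {
        my $rows = $rows_for->{$name} || [];
        push @out, "$name: " . scalar(@$rows) . " row(s)";
        push @out, "  $_" for @$rows;
    }
    return @out;
}

print "$_\n" for report(\@names, \%rows_for);
```

Grouping the "many" side by the shared field means each lookup is a single hash access instead of a scan through the whole second list.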

    I can't be more specific without some details, like:

    1. How is your file formatted?
    2. An example of the files/values.
    3. Some code to begin with.

    Chady | http://chady.net/
      Sorry about the vagueness, I tend to do that when I'm new to a task. To answer your questions:


      1. both files are comma delimited files.
      2. file1 fields: id,name,type
         file1 values:
           1,yeates,scarborough
           2,wayman,freedom
           3,xena,princess

         file2 fields: name,ref#,organization,type,box#
         file2 values:
           wayman,34,meade,B1,4
           wayman,56,gs,B2,7
           wayman,78,nine,B3,8
           yeates,52,sample,A1,9
           xena,63,tv,C1,7
           xena,22,media,C2,2

      3. Right now the only code I have reads each file into a string. That's all I've got, because I want to figure out the best way to do this first. File1 will be very small, about 100 KB. File2 will be larger than 40 MB.

      Does this clear up some of the vagueness?

      Thanks.
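
Given those details (a small file1 and a file2 of more than 40 megs), one possible approach is to load file1 into a hash keyed on name and then read file2 one line at a time, so the large file never has to sit in memory as a single string. The sketch below uses in-memory filehandles holding the sample rows from the post so that it runs on its own; `join_files` and the callback argument are hypothetical names, and the plain `split /,/` assumes no quoted commas in the data.

```perl
use strict;
use warnings;

# Stand-ins for the two files, using the sample rows from the post.
my $file1_data = "1,yeates,scarborough\n2,wayman,freedom\n3,xena,princess\n";
my $file2_data = join '', map { "$_\n" } (
    'wayman,34,meade,B1,4', 'wayman,56,gs,B2,7', 'wayman,78,nine,B3,8',
    'yeates,52,sample,A1,9', 'xena,63,tv,C1,7', 'xena,22,media,C2,2',
);

# Hypothetical helper: $emit is called once for every file2 row whose
# name also appears in file1.
sub join_files {
    my ($small_fh, $large_fh, $emit) = @_;

    # Load the small file into a hash keyed on name.
    my %by_name;
    while (my $line = <$small_fh>) {
        chomp $line;
        next unless length $line;
        my ($id, $name, $type) = split /,/, $line;
        $by_name{$name} = { id => $id, type => $type };
    }

    # Stream the large file; only the current line is held in memory.
    while (my $line = <$large_fh>) {
        chomp $line;
        next unless length $line;
        my ($name, $ref, $org, $type, $box) = split /,/, $line;
        my $rec = $by_name{$name} or next;    # skip names not in file1
        $emit->(join ',', $rec->{id}, $name, $ref, $org, $type, $box);
    }
    return;
}

# With real files, open them by path instead of by scalar reference.
open my $small_fh, '<', \$file1_data or die "cannot open file1 data: $!";
open my $large_fh, '<', \$file2_data or die "cannot open file2 data: $!";
join_files($small_fh, $large_fh, sub { print "$_[0]\n" });
```

Each output line is the file1 id followed by the matching file2 row, so the one-to-many relationship shows up as several lines sharing the same id.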