sch has asked for the wisdom of the Perl Monks concerning the following question:

I'm throwing together a script to do some file processing. The basic idea is that the script will handle a set of files, processing each file in turn.

As it looks at each file, it should look at the records in the file, and process particular sections of the record based on their character position.

I'm fairly happy with most of the above, but what I'm trying to do is build a data structure so that I can make the script as flexible as possible.

I have a feeling that what I'm looking for is a hash of arrays of arrays - something like:

    filename1 => [field_name1, start_position, length ],
                 [field_name2, start_position, length ],
                 ..
                 ..
                 ..
                 [field_namen, start_position, length ]
    filename2 => [field_name1, start_position, length ],
                 [field_name2, start_position, length ],
                 ..
                 ..
                 ..
                 [field_namen, start_position, length ]

My problem is that I just can't see how to code that structure up. I'd prefer to use pushes, but I can't come up with a syntax that works. Of course, I might just be going about this all wrong, but I'd welcome any suggestions.
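For reference, one way the push-based structure described above could be written is sketched here. The file names, field names, and positions are placeholders, and the sketch relies on Perl autovivifying the inner array the first time a file name is pushed to.

```perl
use strict;
use warnings;

# Hypothetical layout: file name => list of [field name, start, length]
my %layout;

# push autovivifies the array reference for each new file name
push @{ $layout{'filename1'} }, [ 'field_name1', 0,  10 ];
push @{ $layout{'filename1'} }, [ 'field_name2', 10, 5 ];
push @{ $layout{'filename2'} }, [ 'field_name1', 3,  8 ];

# Walk the structure: each file, then each field triple in insertion order
for my $file ( sort keys %layout ) {
    for my $field ( @{ $layout{$file} } ) {
        my ( $name, $start, $length ) = @$field;
        print "$file: $name starts at $start, length $length\n";
    }
}
```

An array of triples keeps the fields in the order they were pushed, which a hash keyed by field name would not.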

Thanks, Simon

Replies are listed 'Best First'.
Re: Modelling a data structure
by blue_cowdawg (Monsignor) on Feb 07, 2007 at 16:32 UTC
        My problem is that I just can't see how to code that structure up.

    If I understand what you are trying to do, one way I would approach this is:

        my $files = {
            filename1 => {
                field1 => {
                    start_position => 10,
                    length         => 20,    # values as appropriate
                },
                field2 => {
                    start_position => 20,
                    length         => 5,
                },
                # etc.
            },
            filename2 => {
                # and so on...
            },
        };

    That way you can access the individual members such as:

        foreach my $file ( keys %$files ) {
            foreach my $field ( keys %{ $files->{$file} } ) {
                # do something here
            }
        }

    or some such similar.
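    As a rough sketch of what the "do something here" step might look like, the code below pulls each configured field out of a fixed-width record with substr. The helper name extract_fields, the field names, and the sample record are all made up for illustration, and the start positions are assumed to be 0-based offsets.

```perl
use strict;
use warnings;

# Hypothetical layout for one file; start positions are 0-based substr offsets
my $files = {
    filename1 => {
        id   => { start_position => 0, length => 4 },
        name => { start_position => 4, length => 6 },
    },
};

# Pull every configured field out of one fixed-width record
# (extract_fields is an illustrative helper, not part of any module)
sub extract_fields {
    my ( $layout, $record ) = @_;
    my %values;
    for my $field ( keys %$layout ) {
        my $spec = $layout->{$field};
        $values{$field} =
            substr( $record, $spec->{start_position}, $spec->{length} );
    }
    return \%values;
}

my $row = extract_fields( $files->{filename1}, '0042Simon extra' );
print "$_ = '$row->{$_}'\n" for sort keys %$row;
```

    If the positions in the real file specification are 1-based, subtract one before calling substr.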

    My preferred approach would be object oriented in nature where I'd create a module similar to this:

        package MyDataFile;

        use strict;
        use warnings;

        # Constructor: an object with no fields and no filename yet
        sub new {
            my $class = shift;
            my $self  = { fields => {}, filename => undef };
            return bless $self, $class;
        }

        # Combined getter/setter for the filename
        sub filename {
            my $self = shift;
            $self->{filename} = shift if @_;
            return $self->{filename};
        }

        # Store each field under its own name so getField can find it
        sub addField {
            my ( $self, $field_name, $start, $length ) = @_;
            $self->{fields}{$field_name} = {
                fieldname => $field_name,
                start     => $start,
                length    => $length,
            };
        }

        sub getField {
            my ( $self, $fieldname ) = @_;
            return $self->{fields}{$fieldname};
        }

        sub getAllFieldnames {
            my $self = shift;
            return keys %{ $self->{fields} };
        }

        1;

    There are other methods I might add, and I might also be tempted to create a module defining a structure describing the field attributes, but that's way too complicated for the sake of this discussion for now.

    Given the module I just gave you, you could do something like this in your code now:

        use strict;
        use MyDataFile;

        my @files = ();
        # mumble...
        my $df = MyDataFile->new;
        $df->filename('foo.blah');
        $df->addField('snarf', 5, 20);
        # etc.

    Hope this is food for thought.


    Peter L. Berghold -- Unix Professional
    Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg

      That's exactly what I was looking for :) Thanks!!!

      I did consider going OO, but given that I've only got limited time to get this together, I'm going to stick with the first part of your solution.

      Thanks, Simon

Re: Modelling a data structure
by Herkum (Parson) on Feb 07, 2007 at 16:20 UTC

    I think, for the moment, you need to step back and design how you are going to use this data.

    Figure out how you are going to work with it first. If you force the data into some sort of structure without understanding how you are going to use it, you will just make problems for yourself.

    So to start, give us a sample of code that shows HOW you intend to use the data. Then it will be easier for people to show you how to manipulate your data to match your interface.

Re: Modelling a data structure
by CountZero (Bishop) on Feb 07, 2007 at 16:38 UTC
    What you need is a configuration file to ehh ... configure your program.

    Something like YAML seems well-suited to your needs. The configuration in a YAML file will translate directly to your data structure.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      Plan B was going to be to go with an XML config file, but given that once this is setup it won't get changed very often.

      But thanks for the suggestion.

      Simon

        Plan B was going to be to go with an XML config file...

        You mean something like this?

            $ cat test.xml
            <?xml version="1.0"?>
            <filelist>
              <file name="file1">
                <field name="field1" start="50" length="20"/>
                <field name="field2" start="80" length="20"/>
                <field name="field3" start="100" length="20"/>
              </file>
              <file name="file2">
                <field name="field1" start="52" length="22"/>
                <field name="field2" start="82" length="22"/>
                <field name="field3" start="120" length="22"/>
              </file>
              <file name="file3">
                <field name="field1" start="53" length="23"/>
                <field name="field2" start="83" length="23"/>
                <field name="field3" start="130" length="23"/>
              </file>
            </filelist>

            $ perl -MXML::Simple -MData::Dumper -e '$x=XMLin("test.xml"); print Dumper($x)'
            $VAR1 = {
              'file' => {
                'file2' => {
                  'field' => {
                    'field1' => {
                      'length' => '22',
                      'start' => '52'
                    },
                    'field2' => {
                      'length' => '22',
                      'start' => '82'
                    },
                    'field3' => {
                      'length' => '22',
                      'start' => '120'
                    }
                  }
                },
                'file1' => {
                  'field' => {
                    'field1' => {
                      'length' => '20',
                      'start' => '50'
                    },
                    'field2' => {
                      'length' => '20',
                      'start' => '80'
                    },
                    'field3' => {
                      'length' => '20',
                      'start' => '100'
                    }
                  }
                },
                'file3' => {
                  'field' => {
                    'field1' => {
                      'length' => '23',
                      'start' => '53'
                    },
                    'field2' => {
                      'length' => '23',
                      'start' => '83'
                    },
                    'field3' => {
                      'length' => '23',
                      'start' => '130'
                    }
                  }
                }
              }
            };

        But... what? Looks pretty simple to me. I'd rather not have all those data values hard-coded in the perl script itself, and reading them from a basic xml file seems like a no-brainer.

        I guess it's a little goofy that there are two layers in XML::Simple's "default" hash structure that are sort of useless ("file" and "field"), but if you tried a different way of structuring the xml (and/or spent some time with the XML::Simple man page), you could probably improve on that. (Or you could just use the default as-is and get done quicker.)
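
        If the two extra layers get in the way, one option is to strip them in plain Perl after loading, leaving the XML::Simple options alone. The sketch below uses a hand-written stand-in for part of the dumped structure (values copied from the dump above) so that it runs without the XML file; the %layout name is just for illustration.

```perl
use strict;
use warnings;

# Stand-in for part of the structure XMLin() returned in the dump above
my $x = {
    file => {
        file1 => {
            field => {
                field1 => { start => '50', length => '20' },
                field2 => { start => '80', length => '20' },
            },
        },
        file2 => {
            field => {
                field1 => { start => '52', length => '22' },
            },
        },
    },
};

# Drop the "file" and "field" layers: file name => field name => spec
my %layout = map { $_ => $x->{file}{$_}{field} } keys %{ $x->{file} };

print "file1/field2 starts at $layout{file1}{field2}{start}\n";
```

        The inner hashes are shared by reference, not copied, so changes made through %layout also show up in the original structure.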