Hello,

I recently had the opportunity to work with text files containing fixed-width data. I wrote a function to parse them, and I'm wondering whether people would find a module helpful.

Some sample usage:

use Text::FixedWidth;

my @fields = (
    { field => 'name',  from => 1, to => 5  },
    { field => 'email', from => 8, to => 25 },
);

my $fw = Text::FixedWidth->new(
    fields => \@fields,
);

my $data_ref = $fw->parsefile('data.txt');

$data_ref would look like:

$data_ref = [
    { name => 'Bob', email => 'bob@email.com' },
    { name => 'Sam', email => 'sam@mail.com' },
];

It basically parses each line of the file into a hashref with column names as keys. You need to define your column names and their from and to positions in the line, or alternatively specify the width directly.
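For illustration only, here is a minimal sketch of how that per-line parsing might work with substr. The helper name parse_fixed_width is hypothetical and is not the proposed module's code; it assumes 'from' and 'to' are 1-based, inclusive column numbers and that trailing padding should be trimmed.

```perl
use strict;
use warnings;

# Hypothetical helper, not the proposed module's actual implementation.
# Each field spec has a name plus 1-based, inclusive 'from' and 'to' columns.
sub parse_fixed_width {
    my ($fields, @lines) = @_;
    my @records;
    for my $line (@lines) {
        chomp $line;
        my %record;
        for my $f (@$fields) {
            my $width = $f->{to} - $f->{from} + 1;
            # substr is 0-based, so shift 'from' down by one.
            # Short lines yield an empty string instead of a warning.
            my $value = $f->{from} <= length($line)
                ? substr($line, $f->{from} - 1, $width)
                : '';
            $value =~ s/\s+\z//;    # trim trailing padding (an assumption)
            $record{ $f->{field} } = $value;
        }
        push @records, \%record;
    }
    return \@records;
}

my @fields = (
    { field => 'name',  from => 1, to => 5  },
    { field => 'email', from => 8, to => 25 },
);

my $data_ref = parse_fixed_width(
    \@fields,
    "Bob    bob\@email.com\n",
    "Sam    sam\@mail.com\n",
);

print "$_->{name} <$_->{email}>\n" for @$data_ref;
```

A width-based spec could be handled the same way by computing 'to' as from + width - 1 before the loop.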

If you're interested, I can supply more info, but the above code mostly says what's in my head right now.

Thanks!

Re: RFC: Text::FixedWidth
by friedo (Prior) on Jul 25, 2005 at 22:39 UTC
Re: RFC: Text::FixedWidth
by runrig (Abbot) on Jul 25, 2005 at 23:22 UTC

    I tried to list all the existing redundant modules at the end of the P::FL documentation. Your module does basically the same thing as these; the only differences are that it accepts 'from' and 'to' arguments and returns an arrayref of all the records. This could be done with small additions to the existing modules (e.g. wrapping the 'new' method in P::FL for the from/to, and wrapping the 'parse' method to open a file and return an arrayref).

    And re your comment above about the P::FL docs, yeah, there's a lot of it, but most can just be ignored, and if you just stick to the basic stuff, then it's pretty straightforward IMHO. Patches welcome though, for making things more clear, or adding more methods. :-) I can't say that you absolutely shouldn't upload your module, but if I had known about what already existed, I don't think I would have bothered to take over and rewrite P::FL.

      This could be done with small additions to the existing modules (e.g. wrapping the 'new' method in P::FL for the from/to, and wrapping the 'parse' method to open a file and return an arrayref)

      Maybe a Parse::FixedLength::Simple module would be apt here. :-)

Re: RFC: Text::FixedWidth
by NetWallah (Canon) on Jul 26, 2005 at 17:06 UTC
    One of the things I detest about fixed-length field parsers is that they require me to count characters (either field width or, worse, start and end columns).

    What I'd like to see is something that lets me paste in a data sample, mark field boundaries using special characters, and then name the fields. Something like:

    # Sample Data ----
    # FirstName Lastname YearBorn LastHaircutDate
    # I'd like to specify the above as:
    my $fields = " [irstName ][astname ][earBorn ][astHaircutDate ]";
    my @fieldnames = qw( FirstName Lastname YearBorn LastHaircutDate );
    # Alternatively, allow optional TYPE specification with mixed hashrefs and strings:
    my @fieldnames = ( 'FirstName', 'Lastname',
                       { name => 'YearBorn',        type => 'integer' },
                       { name => 'LastHaircutDate', type => 'date' } );
    I also like the idea of specifying the info using XML.

         "Income tax returns are the most imaginative fiction being written today." -- Herman Wouk

      My experience has been that most of the time, I would be given a document that specified the format in a "name length" or "name length start end" style. I would cut and paste the document into Vim, do a couple of transformations, and have my module-ready format, which is why the format in the existing modules has worked for me 99% of the time. I can see the usefulness of your method, though, and it would be easy enough to preprocess your format into something suitable for one of the modules, or just generate a pack/unpack format string. The only problem with your first idea that I can see is with fields that have names longer than their lengths.
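      As a rough sketch of the preprocessing idea, a bracket-marked template could be turned into an unpack format string like this. The helper template_to_unpack is hypothetical, and the sketch assumes '[' sits on a field's first column and ']' on its last. Field names are supplied separately, which sidesteps the problem of names longer than their fields.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a bracket-marked template into an unpack format.
# Assumption: '[' marks a field's first column and ']' its last column.
sub template_to_unpack {
    my ($template) = @_;
    my @parts;
    while ($template =~ /\[[^\[\]]*\]/g) {
        my $start = $-[0];            # 0-based offset of the '['
        my $width = $+[0] - $-[0];    # both brackets count toward the width
        # '@N' jumps to absolute offset N; 'A' strips trailing spaces
        push @parts, "\@${start}A${width}";
    }
    return join ' ', @parts;
}

# Illustrative template and data, made up for this sketch
my $template   = "[irst   ] [ast     ][ear]";
my @fieldnames = qw(FirstName Lastname YearBorn);
my $line       = "John      Smith     1970";

my $format = template_to_unpack($template);

my %record;
@record{@fieldnames} = unpack($format, $line);

print "$_ => '$record{$_}'\n" for @fieldnames;
```

      The same start/width pairs could just as easily be fed to one of the existing modules instead of unpack.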
Re: RFC: Text::FixedWidth
by pg (Canon) on Jul 26, 2005 at 07:22 UTC

    I would find it very useful and would like to have it, if you do two things:

    1. Allow me to specify file format through XML, not coding.
    2. Provide a simpler interface than those existing modules. All I expect is really simple: to be able to parse the file automatically into a certain data structure. Don't make it too complex. Excessive functionality is not always better, and might easily break modularization.
      Allow me to specify file format through XML, not coding.

      You mean something like:

      use XML::Simple;
      use Parse::FixedLength;

      my $x = XMLin(<<XMLEND);
      <format>
        <field> <fname>fld1</fname> <length>10</length> </field>
        <field> <fname>fld2</fname> <length>20</length> </field>
        <field> <fname>fld3</fname> <length>30</length> </field>
      </format>
      XMLEND

      my $fmt = [ map { "$_->{fname}:$_->{length}" } @{$x->{field}} ];
      my $p = Parse::FixedLength->new($fmt);

      Somehow I doubt you mean this, but you gotta start somewhere...(and I'm not clear on how XML makes this any better)