does anyone else use Parse::FixedLength;?

by cmilfo (Hermit)
on Jun 21, 2001 at 21:52 UTC ( [id://90474] )

cmilfo has asked for the wisdom of the Perl Monks concerning the following question:

I am using the parse() function from the Parse::FixedLength module inside a while loop. After running through about 80,000 records, the process dies; it has used every bit of the system's 1 GB of RAM. I used Devel::Leak's NoteSV and CheckSV to pinpoint the memory leak, and it is indeed in the parse() function from Parse::FixedLength. If I stub in substr calls to do the parsing instead, the leak goes away. Has anyone else run into this problem? Is the best course of action to write the author? (I've never really found a problem in any of the modules I've used.)

Thank you,
Casey

Edit: chipmunk 2001-06-21

Re: does anyone else use Parse::FixedLength;?
by runrig (Abbot) on Jun 21, 2001 at 22:44 UTC
    This module does seem to leak. Beyond that, it also uses substr when it would be a lot more efficient to use pack/unpack. It wouldn't be hard to set up a similar data structure of field names and lengths (I've done it this way in the past), convert that to one array of field names plus a format string that unpack can use, and then just do a hash slice assignment like:
    # Say we have an array of field names
    # and a corresponding one of lengths
    # (Decide for yourself if you want 'A' or 'a' here)
    my $format_str = join '', map { "A$_" } @lengths;
    while (<>) {
        my %hash;
        @hash{@names} = unpack($format_str, $_);
    }
    Update: The source of the 'leak' is that there is a package global array '@parse_record' which gets data pushed to it for every field in every record. I think it was meant to save only one record, and should have been cleared on every new call to parse(). I've /msg'd a bug report to princepawn :-)
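    Purely as a hypothetical sketch of that pattern (this is not the module's actual source, and the field names are invented), a package global that gets pushed to for every field of every record, and is never emptied, will grow without bound across calls:

        use strict;
        use warnings;

        our @parse_record;                      # package global, never cleared
        my @names   = qw(cust_id name order_date);
        my @lengths = (10, 30, 8);

        sub parse {
            my ($line) = @_;
            my %rec;
            my $offset = 0;
            for my $i (0 .. $#names) {
                my $val = substr $line, $offset, $lengths[$i];
                push @parse_record, $val;       # accumulates on every call: the leak
                $rec{ $names[$i] } = $val;
                $offset += $lengths[$i];
            }
            # Clearing @parse_record here (or making it a lexical inside the sub)
            # would keep memory flat no matter how many records are parsed.
            return \%rec;
        }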

    Another update: I mentioned this further down the thread somewhere, but I thought I'd also mention it up here: I took princepawn's suggestion to take over the module, rewrote it, and it should be appearing soon at a CPAN near you :-)

      Thank you!
Re (tilly) 1: does anyone else use Parse::FixedLength;?
by tilly (Archbishop) on Jun 21, 2001 at 22:50 UTC
    The author of this module is princepawn. Yes, you should contact the author if you have a bug report (and this is definitely a bug report.)

    But I don't happen to believe that you should use a module just because someone saw fit to write it. I am also not really a fan of fixed data records, but that is another story. But if fixed data formats are your need, then I recommend becoming friends with Perl's built-in utilities for that, tools like pack and unpack. Also davorg has a book about this kind of data manipulation which I have not read, but a lot of people seem to like.

    Anyway, I would suggest a bug report so the author can fix the module, and in the meantime either debug the module yourself or work around it without the module.

      I switched to unpack. It solves the memory leak and speeds the program up. Sometimes I am in such a reuse mode that I forget the basics.

      Thank you!
Parse::FixedLength - better late than never (tilly read)
by princepawn (Parson) on Jun 22, 2001 at 00:56 UTC
    Imagine my surprise when I saw a reference to my module in Seekers. I read my CPAN mail every day, but I have been busy at work, so I missed this whole thread.

    Thank you for the bug report, and yes, in light of Dave Cross' "Data Munging with Perl", a highly useful book, this module should be rewritten to use pack/unpack.

    ++'s and credit due

    • bitwise for using Devel::Leak to track down the error
    • runrig for tracking down the inefficiency and the source of the bug
    • tilly for being my devil's advocate
    tilly wrote above: "But if fixed data formats are your need, then I recommend becoming friends with Perl's built-in utilities for that, tools like pack and unpack. Also davorg has a book about this kind of data manipulation which I have not read, but a lot of people seem to like."
    The purpose of this module is to decouple the description of the information being parsed from the actual process of parsing it. pack/unpack are for the actual process of parsing. If the description and process are bound together, then it becomes more difficult for an external parse description to be used at will.

    For example, what if you wanted to have data entry operators enter a huge collection of field names and field widths? It is much easier for them to enter these sans Perl syntax.
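    To sketch that idea (the file name and field layout below are invented for the example), the description can live in a plain text file that the data entry people maintain without any Perl syntax, and the parsing code can build its field list and unpack format from it:

        # fields.txt, one "name width" pair per line, no Perl syntax:
        #   cust_id     10
        #   name        30
        #   order_date   8
        use strict;
        use warnings;

        open my $spec, '<', 'fields.txt' or die "fields.txt: $!";
        my (@names, @lengths);
        while (<$spec>) {
            next if /^\s*(?:#|$)/;              # skip comments and blank lines
            my ($name, $len) = split ' ';
            push @names,   $name;
            push @lengths, $len;
        }
        my $format = join '', map { "A$_" } @lengths;

        while (my $line = <STDIN>) {
            my %record;
            @record{@names} = unpack $format, $line;
            # ... hand %record to the munge/output stages ...
        }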

    Also, certain industry vendors do use fixed-length data. Valley Media, the fulfillment house for amazon.com, cdnow.com, and several other major .coms, only receives (for this, see Text::FixedLength) and transmits (for this, see Parse::FixedLength) fixed-length data. Their major competitor, global fulfillment, was using XML, but all of their high-techery did not save them from going out of business.

    So, to summarize:

    • fixed-length data may not be desirable (who likes counting whitespace fields in a big file), but it is probably here to stay, just like VAX computers.
    • When processing data with Perl, I emphasize the following steps (a minimal skeleton of these stages is sketched after this list):
      1. Input process: create a representation of what is to be parsed, in a form both readable and enterable by non-technical types.
      2. Munge process: create something which takes the data to be parsed plus this representation of what is to be parsed, and does the parsing.
      3. Output process: feed this general data (usually a Perl nested data structure) into something which generates output files or SQL statements for data re-storage.
      4. In short, this is very similar to the edict decreed in the table of contents of Cross' "Data Munging with Perl": decouple the input, munging, and output processes. So I make a strong invitation for you to refute this process of data munging and to offer a superior means of data processing in Perl.
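    Purely as a hypothetical skeleton of those three stages (the layout and output format here are invented for the example; this is not the module's actual API), each stage can live in its own subroutine so any one of them can be swapped out:

        use strict;
        use warnings;

        # 1. Input process: obtain the record description (hardcoded here; in
        #    practice it would come from a file maintained by non-programmers).
        my @names   = qw(cust_id name order_date);
        my @lengths = (10, 30, 8);
        my $format  = join '', map { "A$_" } @lengths;

        # 2. Munge process: turn one fixed-length line into a hash reference.
        sub munge_line {
            my ($line) = @_;
            my %rec;
            @rec{@names} = unpack $format, $line;
            return \%rec;
        }

        # 3. Output process: write the munged record somewhere else (CSV here).
        sub emit {
            my ($rec) = @_;
            print join(',', @{$rec}{@names}), "\n";
        }

        emit(munge_line($_)) while <STDIN>;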
      Why do you always state everything as a challenge?

      And why do your challenges always sound like they missed the point?

      But since you go to the effort of putting my name in the title, I will respond honestly.

      The fact that a given kind of decoupling is desirable does not mean that all attempts at it are created equal. In particular, the outline of a design that runrig gave looks a lot better to me. It should be very much faster than having to dynamically rebind and rebuild your understanding of the record structure on each record. (OK, so Perl does a dynamic rebinding and building, but that is done at the C level from a much simpler structure.) This is an inherent inefficiency that the API you created gives no way to address.

      Furthermore, I don't like having code that I don't trust around. With many module authors I am willing to accept that they will produce code that I will trust. However, I don't have that faith in your code, and the fact that - yet again - you were bitten by using global variables where there was no good reason to use a global does nothing to change that.

      So we agree that the decoupling that you and Dave Cross talk about is important. Your invitation for me to refute the obvious is wasted. You spend energy trying to say that there is a problem, that a lot of people need to handle fixed data, that you don't want the documentation of your formats to be embedded in low-level parsing routines. Fine. Agreed. Accepted.

      However, you say nothing about why I or anyone else should want to use your solution, and that is the real problem here. Granted, it is not a problem I take personally, because I don't encounter it and would never create anything new in that format. But if I needed to solve the problem, I would want a solution I could trust that was simple, fast, and flexible. And frankly, you have failed to convince me.

      Oh. Just saw my name being used in this discussion and thought I should clear up a couple of things.

      It's true that Data Munging with Perl promotes the decoupling of the input/munge/output stages of a data munging program. I really hope that no-one here would argue against that being a good idea.

      I'm not sure, however, how that concept suddenly becomes an endorsement for Parse::FixedLength. I admit that I didn't know about the module when I was writing the book, but having now looked at the module I think that it makes things more complex than using plain pack and unpack.

      So, yes, people do still have to deal with fixed-width data (I'm currently working with data from IBM, so I should know!) but in my opinion using Perl built-ins is a better way of dealing with it than adding a new module into the process.

      --
      <http://www.dave.org.uk>

      Perl Training in the UK <http://www.iterative-software.com>

      I agree with PrincePawn regarding separation of description and process here. In fact, I wrote a similar module before his was posted, so I use it instead :) The goals are similar, while the features are skewed toward handling text going in/out of mainframe COBOL programs.

      Oh, and it uses pack/unpack. But it's not as complete or as well documented. Guess I'll have to pretty mine up now!

      I actually wouldn't mind using a module like Parse::FixedLength, because in my experience I have had to deal a lot with fixed-length records, but, like tilly says, I wouldn't rebind everything on every record.

      I'd probably use an OO approach: pass in the info and create the necessary shortcut data structures in (maybe) a new() method, then use a parse() method for every record, with unpack behind the scenes as in my earlier example.
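      Something along these lines, purely as a sketch (the class name and interface are invented here, not Parse::FixedLength's actual API): the format string is built once in new() and reused for every record in parse().

        package FixedLengthParser;    # hypothetical name for the sketch
        use strict;
        use warnings;

        # Build the field list and the unpack format once, up front.
        sub new {
            my ($class, %args) = @_;
            my $self = {
                names  => [ @{ $args{names} } ],
                format => join('', map { "A$_" } @{ $args{lengths} }),
            };
            return bless $self, $class;
        }

        # Reuse the prebuilt format for every record.
        sub parse {
            my ($self, $line) = @_;
            my %rec;
            @rec{ @{ $self->{names} } } = unpack $self->{format}, $line;
            return \%rec;
        }

        1;

        # Usage:
        #   my $p = FixedLengthParser->new(
        #       names   => [qw(cust_id name order_date)],
        #       lengths => [10, 30, 8],
        #   );
        #   my $rec = $p->parse($line);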
