cmilfo has asked for the wisdom of the Perl Monks concerning the following question:
I am using the parse() function from the Parse::FixedLength module inside a while loop. After running through about 80,000 records, the process dies; it has used every bit of the system's 1 GB of RAM. I used the Devel::Leak module's NoteSV and CheckSV to pinpoint the memory leak. It is indeed in the parse() function from the Parse::FixedLength module. If I stub in substr calls to do the parsing instead, the leak goes away. Has anyone else run into this problem? Is the best course of action to write the author? (I've never really found a problem in any of the modules I've used.)
Thank you,
Casey
Edit: chipmunk 2001-06-21
Re: does anyone else use Parse::FixedLength?
by runrig (Abbot) on Jun 21, 2001 at 22:44 UTC
This module does seem to leak. Other than that, it also uses substr when it would be a lot more efficient to use pack/unpack. It wouldn't be hard to set up a similar data structure of field names and lengths (I've done it this way in the past), convert that to one array of field names plus a format string that unpack can use, and then just do a hash slice assignment like:

    # Say we have an array of field names
    # and a corresponding one of lengths
    # (Decide for yourself if you want 'A' or 'a' here)
    my $format_str = join '', map { "A$_" } @lengths;
    while (<>) {
        my %hash;
        @hash{@names} = unpack($format_str, $_);
    }
Update: The source of the 'leak' is that there is a package global array '@parse_record' which gets data pushed to it for every field in every record. I think it was meant to save only one record, and should have been cleared on every new call to parse(). I've /msg'd a bug report to princepawn :-)
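The bug pattern described above can be sketched in a few lines. The subroutine names and field widths here are illustrative, not the module's actual code; only the global's name, @parse_record, comes from the report:

```perl
use strict;
use warnings;

our @parse_record;    # package global, as in the bug report

# Leaky version: the global is never cleared, so it grows by
# one entry per field for every record parsed.
sub leaky_parse {
    my ($line) = @_;
    push @parse_record, substr($line, 0, 5), substr($line, 5, 10);
    return @parse_record;
}

# Fixed version: reset the array at the top of every call.
sub fixed_parse {
    my ($line) = @_;
    @parse_record = ();
    push @parse_record, substr($line, 0, 5), substr($line, 5, 10);
    return @parse_record;
}
```

Run leaky_parse over 80,000 two-field records and @parse_record ends up holding 160,000 scalars; fixed_parse never holds more than one record's worth.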
Another update: I mentioned this further down the thread somewhere, but I thought I'd mention it further up here also, that I took princepawn's suggestion to take over the module, rewrote it, and it should be appearing soon at a CPAN near you :-)
Re (tilly) 1: does anyone else use Parse::FixedLength?
by tilly (Archbishop) on Jun 21, 2001 at 22:50 UTC
The author of this module is princepawn. Yes, you should contact the author if you have a bug report (and this is definitely a bug report.)
But I don't happen to believe that you should use a module just because someone saw fit to write it. I am also not really a fan of fixed data records, but that is another story. But if fixed data formats are your need, then I recommend becoming friends with Perl's built-in utilities for that, tools like pack and unpack. Also, davorg has a book about this kind of data manipulation which I have not read, but which a lot of people seem to like.
Anyway, I would suggest a bug report so the author can fix the module, and in the meantime either debug the module yourself or work around it.
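To make the pack/unpack pointer concrete, here is a minimal sketch of the two built-ins working together, assuming a simple hypothetical two-field layout ('A' means a space-padded ASCII field):

```perl
use strict;
use warnings;

my $format = 'A5 A10';   # two fixed-width, space-padded fields

# pack writes a fixed-length record...
my $record = pack $format, 'AB', 'hello';

# ...and unpack reads it back, stripping trailing padding.
my ($id, $msg) = unpack $format, $record;
```

pack is the write side and unpack the read side of the same template, which is why the format string only needs to be defined once.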
I switched to unpack. It solves the memory leak and speeds the program up. Sometimes I am in such a reuse mode that I forget the basics.
Thank you!
Parse::FixedLength - better late than never (tilly read)
by princepawn (Parson) on Jun 22, 2001 at 00:56 UTC
Imagine my surprise when I saw a reference to my module in Seekers. I read my CPAN mail every day, but have been busy at work, so I missed this whole thread.
Thank you for the bug report, and yes, in light of Dave Cross's "Data Munging with Perl", a highly useful book, this module should be rewritten to use pack/unpack.
++'s and credit due:
- bitwise, for using Devel::Leak to track down the error
- runrig, for tracking down the inefficiency and the source of the bug
- tilly, for being my devil's advocate
> But if fixed data formats are your need, then I recommend becoming friends with Perl's built-in utilities for that, tools like pack and unpack. Also davorg has a book about this kind of data manipulation which I have not read, but a lot of people seem to like.
The purpose of this module is to decouple the description of the information parsed from the actual process of parsing it. pack/unpack are for the actual process of parsing. If the description and the process are bound together, then it becomes more difficult for an external parse description to be used at will.
For example, what if you wanted to have data entry operators enter a huge collection of field names and field widths? It is much easier for them to enter these sans Perl syntax.
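A minimal sketch of that decoupling, assuming a hypothetical "name width" plain-text layout that an operator could maintain without knowing any Perl (the spec format and field names here are invented for illustration):

```perl
use strict;
use warnings;

# Hypothetical layout file a data entry operator might maintain:
my $spec = <<'SPEC';
id     5
name  10
zip    5
SPEC

# Turn the description into parallel name/length arrays...
my (@names, @lengths);
for my $line (split /\n/, $spec) {
    my ($name, $len) = split ' ', $line;
    push @names,   $name;
    push @lengths, $len;
}

# ...then build an unpack template from the widths alone.
my $format = join '', map { "A$_" } @lengths;   # "A5A10A5"

my %rec;
@rec{@names} = unpack $format, "00042Casey     94086";
```

The description lives in plain text, and only the last two lines are parsing machinery, which is the decoupling being argued for.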
Also, certain industry vendors do use fixed-length data. Valley Media, the fulfillment house for amazon.com, cdnow.com, and several other major .coms, only receives (for this see Text::FixedLength) and transmits (for this see Parse::FixedLength) fixed-length data. Their major competitor, Global Fulfillment, was using XML, but all of their high-techery did not save them from going out of business.
So, to summarize:
- Fixed-length data may not be desirable (who likes counting whitespace fields in a big file?), but it is probably here to stay, just like VAX computers.
- When data processing with Perl, I emphasize the following steps:
- input process: create a representation of what is to be parsed, in a form both readable and enterable by non-technical types
- munge process: create something which takes the data to be parsed and this representation of what is to be parsed, and does the parsing
- output process: feed this general data (usually a Perl nested data structure) into something which generates output files or SQL statements for data re-storage.
- In short, this is very similar to the edict in "Data Munging with Perl": decouple the input, munging, and output processes, as decreed in the table of contents of Cross's book. So, I make a strong invitation for you to refute this process of data munging and provide support for another, superior means of data processing in Perl.
Why do you always state everything as a challenge?
And why do your challenges always sound like they missed the point?
But since you go to the effort of putting my name in the title, I will respond honestly.
The fact that a given kind of decoupling is desirable does not mean that all attempts to do it are created equal. In particular the outline of a design that runrig gave is one that looks a lot better to me. It should be very much faster than, on each record, having to dynamically rebind and rebuild your understanding of the record structure. (OK, so Perl does a dynamic rebinding and building, but that is done at the C level from a much simpler structure.) This is an inherent inefficiency that the API you created does not allow to be addressed.
Furthermore, I don't like having code that I don't trust around. With many module authors I am willing to accept that they will produce code that I can trust. However, I don't have that faith in your code, and the fact that, yet again, you were bitten by using global variables in an instance where there is no good reason to use a global does nothing to change that.
So we agree that the decoupling that you and Dave Cross talk about is important. Your invitation for me to refute the obvious is wasted. You spend energy trying to say that there is a problem, that a lot of people need to handle fixed data, that you don't want the documentation of your formats to be embedded in low-level parsing routines. Fine. Agreed. Accepted.
However, you say nothing about why I or anyone else should want to use your solution. And that is the real problem here. Granted, the problem is not one I take personally, because I don't encounter it and would never create anything new with that issue. But if I needed to solve the problem, I would want a solution I can trust that was simple, fast, and flexible. And frankly, you have failed to convince me.
Oh. Just saw my name being used in this discussion and thought I should clear up a couple of things. It's true that Data Munging with Perl promotes the decoupling of the input/munge/output stages of a data munging program. I really hope that no-one here would argue against that being a good idea.
I'm not sure, however, how that concept suddenly becomes an endorsement for Parse::FixedLength. I admit that I didn't know about the module when I was writing the book, but having now looked at it, I think that it makes things more complex than using plain pack and unpack.
So, yes, people do still have to deal with fixed-width data (I'm currently working with data from IBM, so I should know!), but in my opinion using Perl built-ins is a better way of dealing with it than adding a new module into the process.
--
<http://www.dave.org.uk>
Perl Training in the UK <http://www.iterative-software.com>
I actually wouldn't mind using a module like Parse::FixedLength, because in my experience I have had to deal a lot with fixed-length records. But, like tilly says, I wouldn't rebind everything on every record. I'd probably use an OO approach: pass in the info and create the necessary shortcut data structures in (maybe) a new() method, then use a parse() method for every record, with unpack behind the scenes like in my earlier example.
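That OO shape can be roughed out as follows; the class and method names are hypothetical and this is not the rewritten module's actual API, just the "compile once in new(), unpack per record in parse()" design being described:

```perl
use strict;
use warnings;

package FixedParser;    # hypothetical name, not a real CPAN API

# Build the name list and unpack template once, in the constructor.
sub new {
    my ($class, @pairs) = @_;    # (name => width, name => width, ...)
    my (@names, $format);
    while (@pairs) {
        my ($name, $width) = splice @pairs, 0, 2;
        push @names, $name;
        $format .= "A$width";
    }
    return bless { names => \@names, format => $format }, $class;
}

# Per record: one unpack and a hash slice, no per-record rebinding.
sub parse {
    my ($self, $line) = @_;
    my %rec;
    @rec{ @{ $self->{names} } } = unpack $self->{format}, $line;
    return \%rec;
}

package main;

my $p   = FixedParser->new(id => 5, name => 10);
my $rec = $p->parse("00042Casey     ");
```

The point of the design is that the field-layout work happens once per program rather than once per record, which is exactly the inefficiency tilly objected to above.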