http://qs1969.pair.com?node_id=247806

As I make my daily treks to the gates to learn and help where I can, I have noticed something: the majority of Perl users and developers use Perl for just about everything except what the name stands for, Practical Extraction and Report Language. We have entire websites run by Perl, database applications, automation sequences, obfuscation, data munging, whatever. I say all of that to raise this thought: because Perl is so flexible, can do just about anything we need it to, and a lot of us are most comfortable coding in it, does that necessarily make Perl the end-all, be-all language? What forces us NOT to use Perl?

Another note: many of us have 'standard' stuff we do every day in Perl, such as a standard way of reading in files or of doing this or that, and some of it is probably a LOT more efficient than the way I currently do things. It's stuff you can't necessarily put in a module to share with the world; it's just part of your coding nature. So as not to open flame wars, Pandora's box, or any other bad, nasty thing, I thought maybe we could share our 'best practices' with each other. Obviously this couldn't be done at the level of the whole language, otherwise it would get really hairy and wouldn't be of much use... So how about data munging?

Take me, for example: when reading in a fixed-width file, I ALWAYS use pack and unpack instead of substr. Why? Because substr is SLOW, and with unpack I can parse the entire record at once instead of one field at a time.
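A minimal sketch of the unpack approach, assuming a hypothetical layout of a 10-character name, a 5-character id and an 8-character YYYYMMDD date (the template, field names and sample records are made up for illustration). It reads from an in-memory file so it runs on its own; whether unpack actually beats substr on your data is worth checking with the core Benchmark module rather than taking on faith.

```perl
use strict;
use warnings;

# Hypothetical layout: name (10 chars), id (5 chars), date as YYYYMMDD (8 chars).
# 'A' fields are space-padded text; unpack strips the trailing padding.
my $template = 'A10 A5 A8';

# Two sample records, built with sprintf so the widths are exact.
my $data = join '', map { sprintf("%-10s%-5s%-8s\n", @$_) }
    ['Smith', '00042', '20030403'],
    ['Jones', '00107', '20030404'];

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @parsed;
while (my $line = <$fh>) {
    chomp $line;
    # One unpack call splits the whole record into its fields.
    my ($name, $id, $date) = unpack $template, $line;
    push @parsed, { name => $name, id => $id, date => $date };
}
close $fh;

# The substr equivalent needs one call per field with hand-kept offsets,
# and it keeps the padding, which then has to be stripped separately.
my $first = sprintf '%-10s%-5s%-8s', 'Smith', '00042', '20030403';
(my $name_by_substr = substr($first, 0, 10)) =~ s/\s+\z//;
my $id_by_substr = substr($first, 10, 5);
```

If the layout changes, only the template string has to change, rather than a set of substr offsets.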


When tying hashes to a file with DB_File, I always use $DB_BTREE instead of $DB_HASH, because lookups seem to be faster when dealing with multi-million-record files. I could be totally wrong and not know it, though.
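A sketch of tying a hash with $DB_BTREE, assuming DB_File is installed with Berkeley DB support and using a throwaway file in a temporary directory (the file name and the sample keys are hypothetical). One side effect worth knowing: a BTREE keeps its keys sorted (byte-wise by default), so keys() comes back in order, which $DB_HASH does not promise.

```perl
use strict;
use warnings;
use DB_File;                      # requires Berkeley DB support in this perl
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/records.db";     # hypothetical file name

# Tie the hash to an on-disk BTREE instead of the default hash layout.
my %index;
tie %index, 'DB_File', $file, O_RDWR | O_CREAT, 0644, $DB_BTREE
    or die "Cannot tie $file: $!";

$index{$_} = length $_ for qw(cherry apple banana);

# With a BTREE, keys() walks the tree in sorted key order.
my @ordered = keys %index;

untie %index;
```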

What else...
Is for faster than foreach? I don't know; I've never benchmarked it. Anyone know?
Is

my $num_elements = @array;

faster than

my $num_elements = $#array;


As you can see, I am looking for simple stuff here: nothing funky that is hard to interpret, just Perl basics that we sometimes take for granted, assuming everyone already knows about them.


Thanks in advance, Robert

Re: Reporting
by dragonchild (Archbishop) on Apr 03, 2003 at 18:13 UTC
    The big reason, in my mind, why we use Perl for so many things other than Reporting is that we are still doing reporting. Every single Perl application I've ever written did the following:
    1. Read in A
    2. Transform A -> B
    3. Write out B
    I might have 1 or 10 data instreams and 1 or 10 data outstreams. These streams might be files, sockets (like Apache), databases, or whatever. In the process of transformation, I might hide a lot of the details in objects and/or modules. But, every application I have ever worked on is a glorified munger, and Perl is the best language for munging.
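    The read/transform/write shape above can be sketched as a small filter sub that takes an input handle, an output handle and a transform. The handles are in-memory strings here so the example is self-contained; the munge name and the comma-separated sample data are hypothetical, and real code would pass files, sockets or whatever else the I/O layer provides.

```perl
use strict;
use warnings;

# Read A -> transform A to B -> write B, one line at a time.
sub munge {
    my ($in, $out, $transform) = @_;
    while (my $line = <$in>) {
        chomp $line;
        print {$out} $transform->($line), "\n";
    }
}

# Hypothetical input: "name,age" records.
my $input = "alice,30\nbob,25\n";
open my $in,  '<', \$input     or die "Cannot open input: $!";
open my $out, '>', \my $output or die "Cannot open output: $!";

# Placeholder transform: upper-case the name column.
munge($in, $out, sub {
    my ($name, $age) = split /,/, shift;
    return join ',', uc($name), $age;
});
close $out;

print $output;
```

Swapping the input or output for a different source only changes the open calls; munge itself stays the same.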

    This is why Perl has the easiest time adding new I/O formats to existing apps. DBI, CGI, HTML::Template or Mason, PDF::Template, Spreadsheet::, XML::, and on and on. All these I/O layers mean we can focus on the munging part without having to worry about how the stuff we're munging goes in or out.

    As for best practices ... I dunno. Just don't be stupid and you'll be fine. It doesn't have to be perfect - good enough is, well, good enough.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

    Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.

      So you've never written a JAPH, then? :-)
Re: Reporting
by perrin (Chancellor) on Apr 03, 2003 at 17:47 UTC
    Actually, for is exactly the same as foreach; they are synonyms. However, C-style for loops with an iterator variable are slower than for/foreach looping over a list.
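    A sketch for checking this on your own machine with the core Benchmark module: the first two subs differ only in the keyword (for vs foreach), and the third is a C-style loop over an index. The array size and sub names are arbitrary, and the printed rates will vary by machine and perl version, so no particular result is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @data = (1 .. 10_000);

# 'for' and 'foreach' are the same keyword; the C-style loop indexes by hand.
my %loops = (
    foreach_list => sub { my $t = 0; foreach my $x (@data) { $t += $x } return $t },
    for_list     => sub { my $t = 0; for my $x (@data) { $t += $x } return $t },
    c_style      => sub {
        my $t = 0;
        for (my $i = 0; $i < @data; $i++) { $t += $data[$i] }
        return $t;
    },
);

# Sanity check: every variant must compute the same total before we time it.
my %totals = map { $_ => $loops{$_}->() } keys %loops;

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%loops);
```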

    In general, you'll find that most of the time in most programs is spent on I/O tasks like reading the disk or talking to a database. Small differences in other areas tend to get lost in the noise.

Re: Reporting
by chromatic (Archbishop) on Apr 03, 2003 at 20:47 UTC

    My current favorite practice is to do only the most important thing now. That's usually "make it work", not "make it fast". It's much easier, for me, to optimize a program that produces the correct output than to fix a very fast program that produces incorrect output.

Re: Reporting
by dga (Hermit) on Apr 03, 2003 at 18:16 UTC

    I use perl for reporting every day.

    It reads from the database, pops the data into HTML::Template and then fires it off to the waiting screen.

    It is used for bunches of other stuff, too...

      I don't know whether every Perl program that produces output can be called "reporting". I think that in the original context, the author meant the long-forgotten (by most Perl programmers, AFAIK) use of formats.
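      For anyone who has not seen them, a minimal sketch of the format/write mechanism. The report layout, variable names and sample rows (borrowed from the /etc/passwd lines further down this thread) are illustrative. Output goes to an in-memory handle so it can be inspected, but write to STDOUT works the same way. The picture lines refer to package variables declared with our, which is the simplest way to keep them visible to the format.

```perl
use strict;
use warnings;

# Package variables, because the format picture lines refer to them.
our ($name, $uid, $shell);

format REPORT_TOP =
User       UID  Shell
---------- ---- ----------------
.

format REPORT =
@<<<<<<<<< @>>> @<<<<<<<<<<<<<<<
$name,     $uid, $shell
.

# Sample rows modelled on the /etc/passwd lines quoted later in the thread.
my @rows = (
    ['root',    0,  '/sbin/sh'],
    ['someone', 45, '/bin/csh'],
);

# Attach both formats to an in-memory handle.
open my $out, '>', \my $report or die "Cannot open in-memory handle: $!";
my $previous = select $out;
$~ = 'REPORT';        # body format for the selected handle
$^ = 'REPORT_TOP';    # top-of-page format for the selected handle
select $previous;

# Each write fills the picture from the current variable values;
# the header is printed automatically before the first line of the page.
for my $row (@rows) {
    ($name, $uid, $shell) = @$row;
    write $out;
}
close $out;

print $report;
```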
Re: Reporting
by kelan (Deacon) on Apr 03, 2003 at 18:28 UTC

    my $num_elements = @array; and my $num_elements = $#array; are not the same. The second gives you one less than the length, because $#array returns the index of the last element, not the number of elements.
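    A quick sketch of the difference, using a made-up four-element array and assuming the default array base of 0:

```perl
use strict;
use warnings;

my @array = qw(a b c d);

my $count      = @array;          # array in scalar context: number of elements
my $last_index = $#array;         # index of the last element
my $also_count = scalar(@array);  # same as $count, with the context made explicit

# The two always differ by one: an empty array gives 0 and -1.
my @empty;
my ($empty_count, $empty_last) = (scalar(@empty), $#empty);

print "count=$count last_index=$last_index\n";
```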

    kelan


    Perl6 Grammar Student

      I think the question was more about which of the following is faster:
      if ($#array >= 0) { ... }
      # or
      if (@array > 0) { ... }

        Both of those are better written as if (@array) { ... }. It's likely to compile to fewer ops as well.
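        A minimal illustration of the boolean-context check. describe is a hypothetical helper; the point is only that an array is false when empty and true otherwise, so no comparison against 0 or -1 is needed.

```perl
use strict;
use warnings;

# An array in boolean context is true when it has at least one element.
sub describe {
    my @items = @_;
    if (@items) {
        return scalar(@items) . " item(s)";
    }
    return "empty";
}

my $none_desc = describe();
my $some_desc = describe(1, 2, 3);

print "$none_desc / $some_desc\n";
```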

        I'm actually thinking the question was: which is faster?
        $num = scalar(@array);
        # or
        $num = $#array;
        And as someone pointed out, $#array isn't the actual number of elements in the array, so I guess you could simply compare the op trees of the two, or perform three benchmarks:
        1) plain
        2) scalar(@array) - 1
        3) $#array + 1;
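        A sketch of those benchmarks with the core Benchmark module. The candidate names and the 1,000-element array are arbitrary, and each normalised pair is checked for agreement before timing. For operations this small, sub-call overhead will probably dominate, so expect the differences to be in the noise. Comparing op trees (for example with perl -MO=Concise) is another option, not shown here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @array = (1 .. 1_000);

# Each pair is normalised so both sides produce the same number.
my %candidates = (
    count_scalar    => sub { scalar(@array) },       # element count
    count_via_index => sub { $#array + 1 },          # element count from last index
    index_via_count => sub { scalar(@array) - 1 },   # last index from element count
    index_last      => sub { $#array },              # last index
);

# Sanity check before timing: the normalised forms must agree.
die "count mismatch"
    unless $candidates{count_scalar}->() == $candidates{count_via_index}->();
die "index mismatch"
    unless $candidates{index_via_count}->() == $candidates{index_last}->();

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%candidates);
```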

        /* And the Creator, against his better judgement, wrote man.c */
Re: Reporting
by Anonymous Monk on Apr 03, 2003 at 22:48 UTC

    I hate posts I have to read twice, so I'll just comment on the very few things I picked up the first time through.

    1. It doesn't matter what the language stands for. What does "Python" or "Ruby" stand for? Why aren't they really a python and a ruby respectively instead of programming languages?!? Read up on the history of naming Perl if you actually care for some strange reason.
    2. Perl does extract and report. Take your database driven app. It extracts data from a database and reports it in HTML form. This can be applied to everything.
    3. Who cares about speed? Optimize once you need it, not before. If Perl's too slow, use something else. There is no debate here.
    4. If you can efficiently do almost everything you want to do in Perl you should acquire higher goals. Perl isn't anywhere near the perfect language for everything, nor will it ever be. <insert type="drivel" category="useBestToolForJob" />

    yeah, that's all.

Re: Reporting
by Anonymous Monk on Apr 04, 2003 at 03:49 UTC
    According to the authors of Berkeley DB, a BTree is usually better than hashing for large datasets because you get better locality of reference. Hashing is horrible on caches and virtually guarantees that you hit disk. With a BTree, in most applications, the data you need is in cache most of the time.

    So you don't seem to be totally wrong on that.

Re: Reporting
by aquarium (Curate) on Apr 04, 2003 at 11:27 UTC
    Who said that "reporting" is the most important word? Could "extraction" be as important, or more? I know that I'd much rather use Perl to "extract" from data than use C's standard string library. As for the ideology of input, munge, output: isn't that what just about every useful CPU instruction does? So perhaps "practical" is the most important word! Which language is best for what? There are two schools of students: those who see useful things and apply them to make their work easier and/or more fun, and those who bag every language they will never bother to study because they're stuck coding in language X for the rest of their lives. I would like to think that I'm in the first school, having recently been learning Tcl/Tk and Perl. I will still use a bash or shell command instead of, or inside, a Perl script if it solves the problem faster (coding-wise). Chris
Re: Reporting
by l2kashe (Deacon) on Apr 04, 2003 at 05:50 UTC
    The only speed-up I can think of right now is this.

    If I know I'm going to be sorting a large dataset later on, and I want that sort loop to be a little tighter, I prepackage the elements I actually want to output, something along the lines of:
    # A sample file; let's say a *really* long /etc/passwd file
    # with lines like:
    #   root:x:0:0:Super User:/:/sbin/sh
    #   someone:x:45:14:Some One:/home/someone:/bin/csh
    #   sometwo:x:46:14:Some Two:/home/sometwo:/bin/ksh
    # Let's say I'm going to sort on uid, then gid:
    my @tosort;
    while (<IN>) {
        chomp;
        my ($uid, $gid) = (split /:/)[2, 3];   # keep only the fields I need
        push @tosort, [$uid, $gid, $_];        # sort keys first, original line last
    }
    ...
    for (sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @tosort) {
        print "$_->[2]\n";
    }
    Now, I haven't actually benchmarked it, so I'm not sure about a speed increase, but in the actual flow of the program, provided the comments are there to describe what's going on, it makes it easier for me to follow later on, and the for (sort) {} block is just that little bit tighter without needing a split(/:/) in there...
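    The same idea can be folded into a single statement, commonly called the Schwartzian Transform: decorate each line with its sort keys, sort on the keys, then strip them off again. This is a more compact, swapped-in variant of the loop above rather than the poster's own code; the sample lines are the ones from the comment, held in an array instead of read from a file.

```perl
use strict;
use warnings;

my @lines = (
    'sometwo:x:46:14:Some Two:/home/sometwo:/bin/ksh',
    'root:x:0:0:Super User:/:/sbin/sh',
    'someone:x:45:14:Some One:/home/someone:/bin/csh',
);

# Read the pipeline bottom-up.
my @sorted =
    map  { $_->[2] }                                     # 3. undecorate: keep the original line
    sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] }  # 2. sort by uid, then gid
    map  { [ (split /:/)[2, 3], $_ ] }                   # 1. decorate: [uid, gid, line]
    @lines;

print "$_\n" for @sorted;
```

Each line is split exactly once, in the decorate step, so the sort comparator only does numeric comparisons.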

    P.S. After proofreading my post for typos, I noticed another idiom I use, again without any thought as to performance: I only take the values I want/need from a split. I also don't know if that is faster (it's a little late to be benchmarking, but I'll probably get around to it tomorrow, uh... today), but it's something I do consistently. That way, a few lines later, I'm not pulling arbitrary values out of some array that may or may not be labeled appropriately for that chunk of code...
