in reply to Samples of big projects done in Perl

It really depends on what you mean by BIG. Based on GrandFather's definition, I'd have to say that the code we have at work is medium-large: 33,000 lines (using a simple "find . -name '*.p[ml]' -print0 | xargs -0 cat | wc -l", so really much less as actual code), and it has taken about 8 person-years to get here (approximately 5 years, with 1 person full-time on it, and myself part-time on it). Two of three requirements met, and the third requirement nearly met. (If they hadn't done such a bang-up job at changing requirements on us, it would have taken a bit less time to get here.)

That said, by the time it is doing everything we want, it'll be at least 37,000-40,000 lines, and take another year or so.

Is that "big"?

Re^2: Samples of big projects done in Perl
by GrandFather (Saint) on Aug 07, 2006 at 03:47 UTC

    Thinking about it a bit more - BIG is probably a function of language. 20,000 lines is getting BIG for Perl, but is smallish for C++. Our main code base is about 1,000,000 lines of C++, written by 3 - 6 people over about 10 years. I'd consider that to be BIG, but it is still manageable and maintainable. I suspect a Perl project of that size would be getting to be something of a handful!

    Bigness could also be measured in terms of functionality provided. A Perl equivalent of the 1e6 line C++ program (presuming it were practical) may be half the number of lines of code, but I doubt it would be anywhere near as manageable as the C++. Different rules apply as code gets larger, so scaling a size/functionality ratio for Perl compared with other languages is unlikely to be meaningful as projects get BIG.


    DWIM is Perl's answer to Gödel

      My rough estimate (informed by Peter Scott and more or less confirmed by Nicholas Clark) is that there's about an order of magnitude difference between C and Perl SLOC. C++ can be a little better, but I doubt it's as good as two to one. I expect a million line C++ project to take perhaps 200-250k lines of Perl. It depends on the domain though.

        Two to one was pretty much a guess, but based on the sort of C++ code I'm involved in which tends to be interacting with an instrument over USB and data analysis. I'd guess in those areas Perl wouldn't have so much advantage.

        On the other hand I have a modest size C++ app (maybe 2000 lines) that I'm slowly rewriting in Perl. In that case I expect to get about a five to one advantage, but it is processing email and I can use MIME::Lite to replace a swag of C++. That particular application is made for Perl and I regret not knowing Perl when I chose C++ to write the first version!


        DWIM is Perl's answer to Gödel
      I always wondered: Why would a 500,000-line project in Perl be harder to maintain than a project with 1,000,000 lines of C++?

      CountZero

      "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

        Named types and strict compile time type checking are a large part of it. Member access protection is a modest part.


        DWIM is Perl's answer to Gödel
Re^2: Samples of big projects done in Perl
by ForgotPasswordAgain (Vicar) on Aug 07, 2006 at 14:11 UTC

    I don't think that's "big" big. You're probably not running a space shuttle or emacs or anything. It's beyond shell script size, though.

    I'm aware of the "order of magnitude" argument mentioned elsewhere in this thread, saying that 1 line of Perl is like 10 lines of C. It sounds nice, but I'm not sure I really buy it. Since 1 line of Perl is 10 lines of C, assuming that's true in the first place, then in the Perl code you're freed to think at a higher level of abstraction. It's not as if you're juggling 10 times as much in each line of Perl code.

    I think that argument is more for motivating people to write cleaner Perl code in the first place. "Should I put these 100 lines in a module?" "Would you put 1000 lines of C in a .c file?" "Sure." "Put your 100 Perl lines in a module, then." It seems to me to be more a way of determining at what size you want to start thinking about things at a higher level. But.. I mean if you're using a C library or a Perl module, it doesn't really matter how much code is inside them. They have (ideally) some external interface that you use, and that's it. Using 10 C libraries or 10 Perl modules, neither is substantially more complicated than the other, is it?

    How many lines of code is Firefox, for example? How many lines would the equivalent implemented in Perl be? Would the Perl version be 10 times simpler? What if you included also the Perl modules it uses? Perl itself?

    Using your shell command I found 190,000 lines in the latest release of Bricolage. But a lot of that is POD. Has anyone written a line counter with POD sections stripped out? (Maybe comments should also be removed; I'm not sure what exactly the definition of "line of code" is. On the other hand, Bricolage is a web application, so we should probably also include JavaScript, CSS, HTML, etc. as code.)

      Has anyone written a line counter with POD sections stripped out?

      Here's the somewhat rough-and-ready code I use to count lines in my work application:

      #!/usr/bin/perl -w
      use strict;
      use File::Find;

      my($d, $c, $h, $q, $s, $x) = (0, 0, 0, 0, 0, 0);
      for my $dir (qw/ cgi lib util /) {
          find(sub {
              return if /^\.#/ || /,v$/ || /\.(swp|gif|jpg|png|ps|tr|o|a)$/
                  || /~$/ || /^core$/;
              $File::Find::prune = 1, return if $File::Find::dir =~ /\bCVS\b/;
              return unless -f && -T;
              local *F;
              my $file = $_;
              open F, $file or warn "$File::Find::dir/$file: $!\n";
              my $cut = 0;
              my $here = undef;
              my $where;
              while (<F>) {
                  if ($cut) {
                      ++$$where;
                      $cut = 0 if $cut > 0 && /^=cut/;
                  } elsif ($here) {
                      ++$$where;
                      $here = undef if /$here/;
                  } elsif (/^=/) {
                      ++$d;
                      $cut = 1;
                      $where = \$d;
                  } elsif (!/\S/) {
                      ++$s;
                  } elsif (/^\s*#/) {
                      ++$c;
                  } elsif (/<<(?:'(\w+)'|(\w+))/) {
                      my $style = $1 || $2;
                      $where = ($style eq 'SQL') ? \$q : \$h;
                      $here = qr/^$style$/;
                      ++$x;
                  } elsif (/^__(DATA|END)__$/) {
                      $cut = -1;
                      $where = \$h;
                  } else {
                      ++$x;
                  }
              }
          }, $dir);
      }
      print "doc $d, comment $c, SQL $q, text $h, space $s, code $x\n";

      This currently reports: doc 5699, comment 1685, SQL 675, text 4295, space 4382, code 37030.

      Hugo
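
For a quick estimate without a dedicated script, the "find ... | wc -l" pipeline from the top of the thread can be tightened to skip POD, blank lines and comment-only lines. What follows is only a rough sketch under stated assumptions: the function name count_perl_code is made up for illustration, it assumes a POSIX find and awk, and unlike the script above it does not treat heredocs or __END__/__DATA__ sections specially, so the figure it gives is approximate.

```shell
# Hypothetical helper (not from the thread): count_perl_code DIR
# Prints the number of lines in *.pl / *.pm files under DIR that are
# not POD, not blank, and not comment-only. Approximate: heredocs and
# __END__/__DATA__ sections are counted as ordinary code.
count_perl_code() {
    find "$1" -type f -name '*.p[ml]' -exec awk '
        FNR == 1        { pod = 0 }          # new file: POD state does not carry over
        /^=cut/         { pod = 0; next }    # end of a POD block
        /^=[A-Za-z]/    { pod = 1; next }    # start of a POD block
        pod             { next }             # line inside POD
        /^[ \t]*$/      { next }             # blank line
        /^[ \t]*#/      { next }             # comment-only line (including #!)
                        { n++ }              # anything else counts as code
        END             { print n + 0 }
    ' {} + |
    # find may run awk more than once for very many files; sum the partial counts
    awk '{ total += $1 } END { print total + 0 }'
}
```

Calling it as "count_perl_code ." from the top of a source tree prints a single number that can be compared against the raw "wc -l" total.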