One of my applications was written half in perl and half in Java. The perl portion parses a file (around 100 MB in size) and processes each line with a bunch of regexes. The Java portion accesses an Oracle database.

Today I ported the perl piece to Java. My experience with the porting: Java regex is just as easy as perl's, and I didn't see any obvious speed difference.

I am not comparing Java and perl, but merely saying that, if regex is a key point of using perl in a particular application, that's no longer a very convincing reason. Now there are many good choices, including perl, of course.

Re: regex in perl and Java
by jds17 (Pilgrim) on Jul 05, 2008 at 11:04 UTC
    I program a lot in both Perl and Java, and my experience with regular expressions in Java is that they are always verbose and clunky to set up. Of course the regexes are very similar, as the other monks already commented, but a regex always feels like a stranger in Java-Land.

    The speed I am mainly interested in (and where the real difference usually lies) is how quickly and naturally I can express what I want, and for parsing and matching Perl will therefore always be a lot better. With Perl, I can often write a one-liner on the command line, and I am done.

    Of course, if you have a big project where regexes play only a minor role, it may feel different, but then the focus lies elsewhere.

      Fair thought.
Re: regex in perl and Java
by Your Mother (Archbishop) on Jul 05, 2008 at 03:30 UTC

    I'm not a Java guy so someone might need to correct me but I think this is not at all surprising since many modern regular expression libraries are based at least loosely on Perl's syntax (Java's java.util.regex and PHP's for example). Perl and Java both have pluggable regular expression engines now too so you can probably get whatever you want in either.

    What might be more interesting, around here anyway, is how easy it would be to port the DB portion of your code from Java to Perl, say with DBIx::Class or Rose::DB or even raw DBI. I suspect you'd end up with a speed gain and more maintainable, flexible, extensible code. Regular expressions are not likely to be the reason any given hacker chooses Perl or Java for a big project.
Re: regex in perl and Java
by dragonchild (Archbishop) on Jul 05, 2008 at 04:33 UTC
    Given that the pcre (Perl-Compatible Regular Expressions) library is the primary standard for regular expressions throughout the programming world, this isn't surprising. :-)

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

      I personally believe that this is fair enough, but one point worth stressing, which may be of interest to the OP even if (s)he does not need it right now or very often, is that Perl regexen in perl support extensions allowing (mostly) arbitrary code. I don't know about pcre, but I doubt that these were ported there too, since that would mean allowing constructs that involve code in some other language. So this should definitely be a point in favour of (strictly) Perl regexen.

      (Apologies for replying so late.)

      --
      If you can't understand the incipit, then please check the IPB Campaign.
        This is true enough in non-dynamic languages such as C. If I recall, Ruby and Python support the arbitrary code bit in their own fashion, so this isn't a point strictly in favour of Perl.
Re: regex in perl and Java
by moritz (Cardinal) on Jul 06, 2008 at 17:42 UTC
    I am not comparing Java and perl, but merely saying that, if regex is a key point of using perl in a particular application, that's no longer a very convincing reason
    I disagree.

    Both Perl and Java are Turing complete, so you can use any feature in any of those languages, provided you invest enough time to get the infrastructure right.

    But normally you don't want to set up your infrastructure or load modules/libraries (and look up their names first); you just want to write something like this:

    while (<>) { print if m/^\d/ && m/\d$/; }

    If you do that in Java, how many classes do you have to load first? How many classes and objects do you have to declare, and does all that extra line noise make your program more readable?

    It's not only important if you can do something, it's also important how you can do it.
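
    For comparison, here is a rough sketch of what the one-liner above might look like in Java. The class name DigitLines and the helper keep() are made up for illustration. It reads only standard input, whereas Perl's <> also walks through files named on the command line, so it is an approximation rather than an exact port.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Pattern;

// Hypothetical class name; approximates: while (<>) { print if m/^\d/ && m/\d$/; }
public class DigitLines {
    // Compile each pattern once and reuse it for every line.
    private static final Pattern STARTS_WITH_DIGIT = Pattern.compile("^\\d");
    private static final Pattern ENDS_WITH_DIGIT = Pattern.compile("\\d$");

    // True if the line both starts and ends with a digit.
    static boolean keep(String line) {
        return STARTS_WITH_DIGIT.matcher(line).find()
            && ENDS_WITH_DIGIT.matcher(line).find();
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        // readLine() strips the line terminator, so println() puts one back.
        while ((line = in.readLine()) != null) {
            if (keep(line)) {
                System.out.println(line);
            }
        }
    }
}
```

    The regexes themselves carry over almost unchanged (apart from the doubled backslashes in string literals); the extra lines are the imports, the class wrapper and the explicit read loop.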

      It's not only important if you can do something, it's also important how you can do it.

      Exactly, expressive power. The reason we use higher-level languages instead of assembly -- and analogously why scripting languages like Perl seem to be starting to eat up traditional languages like Java...

Re: regex in perl and Java
by perrin (Chancellor) on Jul 05, 2008 at 17:29 UTC
    This is true if your regex needs are fairly basic, but Perl's regex capabilities go far beyond Java's. If you look at some of the extended regex syntax for Perl, you'll find features that are simply not supported in other languages.
Re: regex in perl and Java
by Anonymous Monk on Jul 05, 2008 at 03:32 UTC
      However, talking about speed alone, the links you provided are way too old. I'm not sure whether anyone has compared Java 6 and perl 5.10.

      First of all, you misunderstood my OP; I wasn't comparing speed. For what I am doing, and for lots of other things, as long as the speed is reasonable, either language is viable.

      My point is that, since Java supports regex with no performance drawback, perl cannot win the regex argument any more, meaning perl loses one of its strengths. Or, to be more precise, what used to be perl's strength alone is now shared by many other languages, including Java.

      Don't send me old stuff from 3 or 4 years ago, will you? Both Java and perl have advanced.
Re: regex in perl and Java
by nikosv (Deacon) on Jul 06, 2008 at 11:35 UTC
    There are POSIX-compatible regexes, which are used by tools such as awk, sed, etc., and the rest are Perl-compatible regexes.
    So Perl has set the standard, and any other language has Perl regex compatibility. The other thing is that Perl evolves the standard with new additions (such as named captures, possessive quantifiers, etc.) and optimizations, and the others follow.
    So it is still a key point of using Perl and I would certainly choose Perl for its regex support.
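
    For readers who want to see the two additions named above in the other language under discussion, here is a small sketch in Java. java.util.regex accepts both syntaxes (named groups need Java 7 or later). The class name RegexFeatures, the helper names and the sample strings are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; demonstrates named captures and possessive quantifiers.
public class RegexFeatures {
    // Named captures use (?<name>...), the same syntax as in Perl 5.10.
    private static final Pattern DATE =
        Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // Returns the year of the first yyyy-mm-dd date in the text, or null if none.
    static String yearOf(String text) {
        Matcher m = DATE.matcher(text);
        return m.find() ? m.group("year") : null;
    }

    // Greedy \d+ can give back one digit so that the trailing \d matches.
    static boolean greedyMatches(String s) {
        return Pattern.matches("\\d+\\d", s);
    }

    // Possessive \d++ never gives digits back, so the trailing \d has nothing left.
    static boolean possessiveMatches(String s) {
        return Pattern.matches("\\d++\\d", s);
    }

    public static void main(String[] args) {
        System.out.println(yearOf("posted on 2008-07-06"));  // prints 2008
        System.out.println(greedyMatches("12345"));          // prints true
        System.out.println(possessiveMatches("12345"));      // prints false
    }
}
```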