I have had this thought for a while, and finally got a bit of free time to implement it.

I think everybody agrees that one of Perl's key strengths is its regexp support, which is something most other languages lack. My idea is to run a server written in Perl in our company. The server accepts TCP requests to do m// or s///, and returns the results to the clients. The clients can be written in any language, as long as that language supports something like sockets. That language then effectively gains regexp power through Perl.

I would like to support two ways of making requests: plain TCP calls and SOAP.

I would also like to hear your input at this early stage. Here is the prototype I have so far:

The server in Perl: (modifiers will be considered later)

use strict;
use warnings;
use IO::Socket::INET;

my $s = IO::Socket::INET->new(
    Proto     => "tcp",
    LocalHost => "localhost",
    LocalPort => 3000,
    Timeout   => 60,
    Listen    => 10,
    Reuse     => 1,
) or die "failed to start\n";
print "regexp service started\n";

while (1) {
    if (my $c = $s->accept()) {
        print "Got one service request\n";
        # First line names the operation; the following lines are its arguments.
        my $func = readLine($c);
        if ($func eq "match") {
            my $string  = readLine($c);
            my $pattern = readLine($c);
            if ($string =~ m/$pattern/) {
                print $c "1\n";
            }
            else {
                print $c "0\n";
            }
        }
        elsif ($func eq "substitute") {
            my $string      = readLine($c);
            my $pattern     = readLine($c);
            my $replacement = readLine($c);
            $string =~ s/$pattern/$replacement/;
            print $c "$string\n";
        }
        close($c);
    }
}

# Read one line from the client and strip the line ending.
# Accepts both "\r\n" and "\n", since println() in the Java client
# uses the platform's line separator.
sub readLine {
    my $c    = shift;
    my $line = <$c>;
    return "" unless defined $line;    # client closed the connection early
    $line =~ s/\r?\n\z//;
    return $line;
}

The Java API (the wrapper):

import java.io.*;
import java.net.*;

class RegExp {
    // Sends a "match" request; returns true if the server answers "1".
    public static boolean match(String string, String pattern) {
        // try-with-resources closes the socket when the request is done
        try (Socket s = new Socket("localhost", 3000)) {
            BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
            PrintWriter out = new PrintWriter(new BufferedOutputStream(s.getOutputStream()), true);
            out.println("match");
            out.println(string);
            out.println(pattern);
            return "1".equals(in.readLine());
        } catch (IOException e) {
            System.out.print(e);
            return false;
        }
    }

    // Sends a "substitute" request; returns the modified string, or null on I/O failure.
    public static String substitute(String string, String pattern, String replacement) {
        try (Socket s = new Socket("localhost", 3000)) {
            BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
            PrintWriter out = new PrintWriter(new BufferedOutputStream(s.getOutputStream()), true);
            out.println("substitute");
            out.println(string);
            out.println(pattern);
            out.println(replacement);
            return in.readLine();
        } catch (IOException e) {
            System.out.print(e);
            return null;
        }
    }
}

The Java Tester:

class RegExpTest {
    public static void main(String[] argv) {
        if (argv[0].equals("m")) {
            boolean result = RegExp.match(argv[1], argv[2]);
            System.out.print(result);
        } else if (argv[0].equals("s")) {
            String result = RegExp.substitute(argv[1], argv[2], argv[3]);
            System.out.print(result);
        }
    }
}

You test like this: java RegExpTest "s" "123abc456" "\d" "-"

Re: A regexp server in Perl
by itub (Priest) on Nov 10, 2004 at 23:06 UTC
    Looks like an interesting idea, but I don't think it will be very good for performance. And I would say that most recent languages now come with regular expressions (they may have learned from Perl!); you can even use them in C if you link to the proper library. I don't know why you need a regular expression server; isn't it possible to use a language or a library that supports regular expressions?
      "And I would say that most recent languages now come with regular expressions (they may have learned from Perl!); "

      Yes, they have learned, and are still learning. I cannot speak for other languages, but with Perl in mind, when I looked at Java's regexps, I was far from satisfied.

      This came to mind because there are things I would not like to implement in Perl for various reasons, but for which I still want regexps as powerful as Perl's...

      You are right about the performance... Obviously it would be faster to use the native regexp ability of the language than to make a socket call, unless the regexp implementation in that language performs very poorly. But if the regexps in other languages cannot deliver what you want, then performance becomes secondary, and you first want to be able to do it at all.

      Also, I expect calls to the server to be limited. Put it this way: if a program uses regexps heavily, I would rather write it in Perl than in something like Java or .Net, so no call is needed. For the applications left to Java or .Net, regexps are usually not the main part, so you expect only a limited number of calls and the performance impact should be low.

Re: A regexp server in Perl
by diotalevi (Canon) on Nov 10, 2004 at 23:21 UTC
    When I match in other languages, I care about related results like $1 ... $n, $`, $&, $'. I also care about things like case sensitivity and whether the /g flag plays a role. I'd suggest you look at the existing Java or .Net Regexp objects and model your API on theirs. Your existing API exposes only one style of regexp usage and I'm sure your other users are going to care about more than just this one way.
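
    For reference, here is a minimal sketch of the feature surface that java.util.regex already exposes and that a Perl-backed API would probably need to mirror: capture groups, a case-insensitivity flag, and first-match versus global substitution. The class and method names (NativeRegexDemo, captures, substitute) are hypothetical and not part of the prototype above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original prototype. It only uses the
// standard java.util.regex API to show what callers typically expect.
class NativeRegexDemo {

    // Returns the captured groups of the first match ($1..$n in Perl terms),
    // or an empty list if the pattern does not match.
    static List<String> captures(String input, String regex, boolean ignoreCase) {
        int flags = ignoreCase ? Pattern.CASE_INSENSITIVE : 0; // like Perl's /i
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        List<String> groups = new ArrayList<>();
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    // Replaces only the first match (like s///) or every match (like s///g).
    static String substitute(String input, String regex, String replacement, boolean global) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return global ? m.replaceAll(replacement) : m.replaceFirst(replacement);
    }
}
```

    A remote API modeled on this would need the protocol to carry the flags with the request and the captured groups back with the response, not just a 1 or 0.
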
      "I'd suggest you look at the existing Java or .Net Regexp objects and model your API on theirs."

      Good point. I know Java supports regexps now, but they are pretty weak compared to Perl's. That's where my idea came in. However, it would be a good idea to make the API close to Java's native one.

      It would be even better to carefully compare Java's regexps with Perl's, and see in detail what Perl can do to strengthen Java's. Maybe calls to the server should only be made when Java's native regexps cannot handle the situation (this would in a way take care of itub's performance concern)...
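
      One way to read "only call the server when Java cannot handle it" is sketched below: try to compile the pattern natively, and fall back to the remote call only if Java rejects the syntax. This is an assumption about how the fallback could work, not something from the prototype; HybridMatcher is a hypothetical name, and the remote call is passed in as a plain function so the sketch stays independent of the socket code.

```java
import java.util.function.BiPredicate;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical sketch: prefer the native engine, fall back to a remote one.
class HybridMatcher {
    // (input, regex) -> matched?  Stands in for a call to the Perl server.
    private final BiPredicate<String, String> remote;

    HybridMatcher(BiPredicate<String, String> remote) {
        this.remote = remote;
    }

    boolean match(String input, String regex) {
        try {
            // find() is unanchored, like Perl's $string =~ m/$pattern/
            return Pattern.compile(regex).matcher(input).find();
        } catch (PatternSyntaxException e) {
            // Java cannot compile this pattern (for example Perl-only syntax),
            // so hand it to the remote engine instead.
            return remote.test(input, regex);
        }
    }
}
```

      The obvious weakness is that a pattern which compiles in both engines but behaves differently would silently get Java semantics, so this only helps with syntax Java rejects outright.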

        Apart from Java's native regular expressions, there are Jakarta ORO and Jakarta Regexp.

        ORO is interesting because it implements Perl syntax, with some success. I used it a lot before Java had its own regular expressions.

        But I like the server idea. It could also be used with Perl clients.

        Update: How about a PM regexserver nodelet? :)

Re: A regexp server in Perl
by Jenda (Abbot) on Nov 11, 2004 at 10:07 UTC

    While it might very well be a good idea to make Perl regexps available to other languages via your own API, I don't think the implementation is reasonable. Not only is it going to be slow, it's going to be dangerous as well. Keep in mind that Perl regexps may contain embedded Perl code. So may the replacement string, unless you are very careful. This would allow anyone with access to the server to run any Perl code on the server with the permissions of the server process. Not too good, I'd say.

    You should find some other way to connect Perl with the other languages. I did not like the Regexp object provided with VBScript, so I wrote a COM wrapper around the Perl regexps that provides an API I like. You could try to do something similar for Java, or use mine if you happen to need this only under Windows.

    Jenda
    We'd like to help you learn to help yourself
    Look around you, all you see are sympathetic eyes
    Stroll around the grounds until you feel at home
       -- P. Simon in Mrs. Robinson

      Even if the regexes are sanitized to remove Perl code, they are still dangerous. It is fairly easy to produce pathological regular expressions that won't finish within the age of the universe and that take all the CPU while they run.

      This results in a denial of service attack. The server would need to have some way to kill off matches if they run for too long.
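
      The same problem exists in any backtracking regex engine, so here is a minimal sketch of one known way to bound a match in Java: wrap the input in a CharSequence that checks a deadline on every character access and throws once it has passed. This illustrates the idea of killing long-running matches and is not the Perl server's mechanism; the Perl side would need its own, for example running each match in a child process that can be killed. TimedRegex and RegexTimeoutException are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: abort a java.util.regex match after a time budget.
class TimedRegex {

    // Thrown when a match exceeds its time budget.
    static class RegexTimeoutException extends RuntimeException {
        RegexTimeoutException(String message) {
            super(message);
        }
    }

    // The regex engine reads its input through charAt(), so checking the
    // clock there gives the match a cancellation point.
    private static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;

        DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override
        public char charAt(int index) {
            if (System.nanoTime() - deadlineNanos > 0) {
                throw new RegexTimeoutException("regex match timed out");
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    // Unanchored match that throws RegexTimeoutException once the budget is spent.
    static boolean find(String input, String regex, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        return Pattern.compile(regex).matcher(new DeadlineCharSequence(input, deadline)).find();
    }
}
```

      A server would catch the exception per request and report an error to the client instead of letting one pattern monopolize the CPU.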

Re: A regexp server in Perl
by Anonymous Monk on Nov 11, 2004 at 11:26 UTC
    Another good thing about Perl is that it can be embedded in anything. Take a look at perlembed and at CPAN modules that have code for that, like PAR, LibZp, PLJava, PLDelphi. That way you will be able to call Perl regexps directly from your app.

    You can also take a look at http://www.pcre.org/, the home of "PCRE - Perl Compatible Regular Expressions", a C library that implements only Perl-style regexps so that they can be embedded in other applications (this is what PHP uses).

    By gmpassos.

Re: A regexp server in Perl
by stvn (Monsignor) on Nov 11, 2004 at 15:32 UTC

    Seems kind of excessive, but an interesting start. I have a few suggestions though.

    Don't make the RegExp methods static class methods; then you can manage a server connection per instance, which might help with the performance issues. Chances are that if you are going to use this, you are going to use it for larger reg-exp problems and not just simple ones, so it seems to me there is a good chance you will send multiple reg-exps to the server.
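
    A minimal sketch of what a per-instance connection could look like follows. It assumes the server is changed to keep the connection open and answer several requests in a row, which the prototype does not do (it closes the socket after one request). RegExpClient is a hypothetical name; the second constructor exists only so the class can be exercised without a live server.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

// Hypothetical sketch: one connection per instance, reused across requests.
class RegExpClient implements Closeable {
    private final Socket socket; // null when built from raw streams
    private final BufferedReader in;
    private final PrintWriter out;

    RegExpClient(String host, int port) throws IOException {
        this.socket = new Socket(host, port);
        this.in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
        this.out = new PrintWriter(socket.getOutputStream(), false);
    }

    // Stream-based constructor, useful for testing without a server.
    RegExpClient(BufferedReader in, PrintWriter out) {
        this.socket = null;
        this.in = in;
        this.out = out;
    }

    boolean match(String string, String pattern) throws IOException {
        send("match", string, pattern);
        return "1".equals(in.readLine());
    }

    String substitute(String string, String pattern, String replacement) throws IOException {
        send("substitute", string, pattern, replacement);
        return in.readLine();
    }

    // Writes one protocol line per argument, always terminated by "\n",
    // then flushes so the server sees the whole request at once.
    private void send(String... lines) {
        for (String line : lines) {
            out.print(line);
            out.print('\n');
        }
        out.flush();
    }

    @Override
    public void close() throws IOException {
        if (socket != null) {
            socket.close();
        }
    }
}
```

    Typical use would be to create one instance, run all the matches through it, and close it at the end, so the TCP connection cost is paid once.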

    Even though it could be dangerous, I would allow a full reg-ex to be sent to the server. Like this:

    boolean result = myRegExpInstance.match("/(.*?)somethings/i");
    Then you don't need to worry about all the possible modifiers and how to handle them in your protocol.

    I would also suggest then parsing the reg-exp a little: you could count the number of () you have and then make those matches available. You should also test for any embedded Perl code, and disallow it. Obviously this can get really messy, but I think its complexity can easily be managed over time (only implement a little at a time, as the need arises maybe).
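
    The parsing suggested above could start out as small as the sketch below: split a "/pattern/modifiers" literal, count capturing parentheses naively, and flag the (?{ ... }) and (??{ ... }) code constructs. All names are hypothetical, the accepted modifier set (g, i, m, s, x) is an assumption, and the checks are deliberately simple: the group count ignores character classes and named groups, and the embedded-code test is not a complete security check.

```java
// Hypothetical sketch of light client-side parsing of a Perl-style literal.
class RegexLiteral {
    final String pattern;
    final String modifiers;

    private RegexLiteral(String pattern, String modifiers) {
        this.pattern = pattern;
        this.modifiers = modifiers;
    }

    // Parses "/pattern/modifiers". Only "/" is accepted as the delimiter.
    static RegexLiteral parse(String literal) {
        int last = literal.lastIndexOf('/');
        if (!literal.startsWith("/") || last == 0) {
            throw new IllegalArgumentException("expected /pattern/modifiers: " + literal);
        }
        String pattern = literal.substring(1, last);
        String modifiers = literal.substring(last + 1);
        if (!modifiers.matches("[gimsx]*")) { // assumed whitelist
            throw new IllegalArgumentException("unsupported modifiers: " + modifiers);
        }
        return new RegexLiteral(pattern, modifiers);
    }

    // True if the pattern contains Perl's embedded-code constructs.
    boolean containsEmbeddedCode() {
        return pattern.contains("(?{") || pattern.contains("(??{");
    }

    // Naive count: every unescaped "(" that is not followed by "?".
    int captureGroupCount() {
        int count = 0;
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character
            } else if (c == '('
                    && !(i + 1 < pattern.length() && pattern.charAt(i + 1) == '?')) {
                count++;
            }
        }
        return count;
    }
}
```

    With this, the client could reject a literal before it is ever sent, and would know how many captured values to expect back from the server.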

    Cool idea :)

    -stvn