onelesd has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I am faced with the common task of parsing log files that are in no particular format at all. The only thing I can count on is that each entry in the log will be on its own line. In the past I have simply created as many regular expressions as were necessary to parse 100% (or close to it) of the log file.
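For reference, the hand-written approach described above amounts to something like the following minimal sketch. It is in Python purely for illustration; the thread itself is about Perl. The log formats, `PATTERNS`, and `parse_log` are all hypothetical. Each pattern handles one kind of entry, and lines that no pattern matches are collected so you can see how close you are to full coverage.

```python
import re

# Hypothetical line formats: one pattern per kind of entry found in the log,
# added by hand until (nearly) every line is covered.
PATTERNS = {
    "login": re.compile(r"^(?P<date>\S+) (?P<time>\S+) LOGIN user=(?P<user>\w+)$"),
    "error": re.compile(r"^(?P<date>\S+) (?P<time>\S+) ERROR (?P<msg>.*)$"),
}

def parse_log(lines):
    """Try each pattern in turn on every line.

    Returns (parsed, unmatched): parsed is a list of (kind, fields) pairs;
    unmatched holds the lines no pattern recognised, i.e. what is still
    left to write a pattern for.
    """
    parsed, unmatched = [], []
    for line in lines:
        line = line.rstrip("\n")
        for kind, pattern in PATTERNS.items():
            m = pattern.match(line)
            if m:
                parsed.append((kind, m.groupdict()))
                break
        else:  # no pattern matched this line
            unmatched.append(line)
    return parsed, unmatched


sample_log = [
    "2011-08-01 23:25 LOGIN user=alice\n",
    "2011-08-01 23:26 ERROR disk full\n",
    "something else entirely\n",
]
parsed, unmatched = parse_log(sample_log)
print(f"coverage: {len(parsed)}/{len(sample_log)} lines")  # coverage: 2/3 lines
```

The `unmatched` list is the useful part in practice. It tells you which pattern to write next.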

Before writing all of those expressions by hand again, I am curious: have any of you experimented with automatically generating a regular expression from several lines of source data? Did it end up working for you or not?

I have looked around a little and came across this website, which appears to tackle part of the problem, but it derives its expression from a single line of input rather than from many: http://www.txt2re.com/.
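One naive way to derive a pattern from many lines instead of one is sketched below, again in Python for illustration only. This is not what txt2re or any existing module does. `infer_pattern` is a hypothetical helper, and it assumes every sample line splits into the same number of whitespace-separated fields. Fields that are identical in all samples are kept as literal text. Fields that vary become named capture groups: `\d+` if every sample value is numeric, otherwise `\S+`.

```python
import re

def infer_pattern(samples):
    """Build one regex from several sample lines (naive, hypothetical helper).

    Assumes every sample splits into the same number of whitespace-separated
    fields. Fields identical in all samples are matched literally; fields that
    vary become named capture groups (f0, f1, ... by field position).
    """
    rows = [line.split() for line in samples]
    if not rows or len({len(row) for row in rows}) != 1:
        raise ValueError("samples must all have the same number of fields")
    parts = []
    for i, column in enumerate(zip(*rows)):
        if len(set(column)) == 1:
            parts.append(re.escape(column[0]))   # constant field: literal text
        elif all(tok.isdigit() for tok in column):
            parts.append(rf"(?P<f{i}>\d+)")      # varying, always numeric
        else:
            parts.append(rf"(?P<f{i}>\S+)")      # varying, anything non-blank
    return re.compile(r"^" + r"\s+".join(parts) + r"$")


samples = [
    "2011-08-01 23:25 INFO alice 120",
    "2011-08-02 05:48 INFO bob 87",
]
pattern = infer_pattern(samples)
print(pattern.pattern)
# ^(?P<f0>\S+)\s+(?P<f1>\S+)\s+INFO\s+(?P<f3>\S+)\s+(?P<f4>\d+)$
```

The result is only as good as the samples. Anything that happens to be constant in every sample (`INFO` here) is hard-coded, so the output is a starting point to edit by hand, not a finished parser.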

I appreciate any advice, even if it's to drop the idea altogether because it Just Won't Work (tm).


Re: Automatically generate regular expression from source data
by planetscape (Chancellor) on Aug 01, 2011 at 23:25 UTC

    I can't endorse this approach as a Good Idea™, but you may want to take a look at Regexp::Assemble if you insist on a regex-based solution.

    HTH,

    planetscape
Re: Automatically generate regular expression from source data
by james2vegas (Chaplain) on Aug 01, 2011 at 23:16 UTC
Re: Automatically generate regular expression from source data
by JavaFan (Canon) on Aug 02, 2011 at 05:48 UTC
    If they have no particular format at all, what is it you want to parse? If it's just "oh, match everything", I can offer you /.*/s, but I'm not sure that's very helpful.

    Regular expressions are used for two things: validating and extracting. For the former, you need to know what you want to match and what you want to reject; for the latter, you need to know what you want to extract.
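    A small Python sketch of that distinction, written against a made-up line format. `LINE`, `is_valid` and `extract` are illustrative names, not from the thread. Both functions use the same pattern. Validation only answers yes or no for the whole line. Extraction also needs capture groups around the parts you actually want.

```python
import re

# Hypothetical line format: "<YYYY-MM-DD> <LEVEL> <message>"
LINE = re.compile(r"(\d{4}-\d{2}-\d{2}) (INFO|WARN|ERROR) (.+)")

def is_valid(line):
    """Validation: accept the line only if the whole of it fits the format."""
    return LINE.fullmatch(line) is not None

def extract(line):
    """Extraction: return the captured (date, level, message), or None."""
    m = LINE.fullmatch(line)
    return m.groups() if m else None


print(is_valid("2011-08-02 ERROR disk full"))  # True
print(is_valid("2011-08-02 DEBUG disk full"))  # False
print(extract("2011-08-02 WARN low memory"))   # ('2011-08-02', 'WARN', 'low memory')
```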

    You haven't given us any useful information.

Re: Automatically generate regular expression from source data
by onelesd (Pilgrim) on Aug 02, 2011 at 20:55 UTC

    Thank you for the module suggestions - they offer some approaches I hadn't considered.

    I got this idea from dbicdump, which generates DBIx::Class schema classes from an existing database. It saved me a lot of time, and I thought there might be something in the same vein for regexes: a tool that produces boilerplate you can then tweak.

    At this point it's just an idea and not something I am going to actually try to do in production code. Thanks again for your advice!