Hi, I'm trying to solve a problem with some code I've inherited. The code uses Regexp::Assemble to build a compiled regex from some 400 words. The problem is that I need to be able to split the input record based on the matches found by the regex. Due to various issues (which I won't go into) I cannot change the way the regex is built via Regexp::Assemble.
Here is a sample of the input records
insert newtab values(1) drop table newtab create table XXXX (field1 int null) insert XXXX values(1) grant select on XXXX to sa drop table XXXX create table XXXX (field1 int null) insert XXXX values(1) grant select on XXXX to sa drop table XXXX rollback tran create table XXXX (field1 int null) sp_help XXXX insert XXXX values(1) grant select on XXXX to sa drop table XXXX rollback tran lock table cDsnJbgnd..smfJwDlwb in share mode
For this example the regex is built from this file
create table ([a-z]|[_])+
insert \b([a-z]|[_])+\b 
lock table
grant
This is what the regex produces
(?-xism:(?ig:(?:create table ([a-z]|[_])+|insert \b([a-z]|[_])+\b |lock table|grant)))
Each record ($rec) is read in from a database via DBI::DBD and interrogated like this
# Build regex list
open KEYS, "patterns" or die "Can't open the pattern file: $!";
my $exp = Regexp::Assemble
    ->new(flags => 'ig', chomp => 1)
    ->add( <KEYS> );
close KEYS;
$exp->track( 1 );

while (get the database records) {
    if ($exp->match($rec) ) {
        # populate a csv file for a spreadsheet
    }
}
The problem is that some of the input records are multi-statement commands (i.e. they contain more than one matched pattern). I need to split each record into its constituent commands, where each split falls at the start of a matched regex pattern, and process each piece separately before printing to the csv file.
e.g. this record
create table XXXX (field1 int null) insert XXXX values(1)
Needs to be processed as
create table XXXX (field1 int null)
insert XXXX values(1)
Can somebody explain to me how to do this, assuming I have to use Regexp::Assemble to do the pattern match (and the SQL that extracts the records cannot be changed either)?
Ta
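For what it's worth, one possible approach is to record the offset at which each match starts and then cut the record at those offsets. The sketch below is only that, a sketch under assumptions: `split_on_matches` is a made-up helper name, and `$re` is a hand-written stand-in for the four sample patterns. With Regexp::Assemble the compiled regex would presumably come from `$exp->re`. A plain `split` on the pattern is avoided here because the sample patterns contain capturing groups, and `split` returns captured text as extra fields.

```perl
use strict;
use warnings;

# Stand-in for the assembled pattern (assumption: case-insensitive, built
# from the four sample lines). With Regexp::Assemble, $exp->re would
# presumably be used here instead.
my $re = qr/(?:create table ([a-z]|[_])+|insert \b([a-z]|[_])+\b |lock table|grant)/i;

# Hypothetical helper: cut $rec at the start of every match of $re.
sub split_on_matches {
    my ($rec, $re) = @_;

    # $-[0] holds the start offset of the most recent successful match.
    my @starts;
    while ($rec =~ /$re/g) {
        push @starts, $-[0];
    }
    return ($rec) unless @starts;

    my @parts;
    # Text before the first match is kept as its own piece.
    push @parts, substr($rec, 0, $starts[0]) if $starts[0] > 0;
    for my $i (0 .. $#starts) {
        my $end = $i < $#starts ? $starts[$i + 1] : length $rec;
        push @parts, substr($rec, $starts[$i], $end - $starts[$i]);
    }

    # Trim surrounding whitespace and drop any empty pieces.
    s/^\s+|\s+$//g for @parts;
    return grep { length } @parts;
}

my $rec = 'create table XXXX (field1 int null) insert XXXX values(1)';
print "$_\n" for split_on_matches($rec, $re);
```

Text that matches none of the patterns (for example `drop table XXXX` with the sample pattern file) stays attached to the preceding piece, so whether that is acceptable depends on what the full 400-word list contains.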

In reply to Splitting a string based on a regex by Anonymous Monk
