Re^5: regex causing segmentation fault (core dump)

Well, that certainly is a lot of detail -- most of which doesn't really shed light on the basic problem. We do at least get to see that you really are pulling a regex out of a database, and applying it to the full content of a data file. I presume the regex at the top of the thread indicates what comes from the database, but you haven't shown us what is in the file, or how you built the regex that went into the database. That might matter.

> Splitting it into lines is a major design change.

I assume this is because of how the regexes are being loaded into the database. So what is the point of trying to store and use monster regexes this way? Is that really necessary? There is so much redundant stuff in that big regex that, if you really do need the match to extend over the entire data file (which I doubt), it would make more sense to construct most of the regex on the fly in the Perl script rather than storing it all in a table (many times over, presumably, with minor, systematic variations). Maybe a different design would actually be better.
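To illustrate building the regex in the script instead of storing it whole, here is a minimal sketch. The field patterns (`%field_re`), the field order (`@layout`) and the sample lines are all hypothetical, since the actual file format hasn't been shown in this thread. The idea is that each repeated piece is written once, the pieces are assembled with `join`, the result is compiled once with `qr//`, and it is applied one line at a time rather than to the whole file in a single match.

```perl
use strict;
use warnings;

# Hypothetical building blocks; the real file format was not shown.
my %field_re = (
    date   => qr/\d{4}-\d{2}-\d{2}/,
    id     => qr/[A-Z]{3}\d{3}/,
    amount => qr/-?\d+(?:\.\d+)?/,
);

# Hypothetical record layout: the order of the fields on each line.
my @layout = qw( date id amount );

# Assemble one line-level pattern from the pieces (a named capture per
# field, separated by whitespace) and compile it once.
my $fields    = join '\s+', map { "(?<$_>$field_re{$_})" } @layout;
my $record_re = qr/^$fields\s*$/;

# Apply the compiled pattern one line at a time.
sub parse_lines {
    my @records;
    for my $line (@_) {
        next unless $line =~ $record_re;
        push @records, +{ %+ };    # copy this line's named captures
    }
    return @records;
}

my @rows = parse_lines(
    '2011-03-04 ABC123 19.99',
    'garbage that does not match',
    '2011-03-05 XYZ789 -5',
);
printf "%s %s %s\n", @{$_}{@layout} for @rows;
```

Each line gets its own small match, so the regex engine never has to backtrack across the entire file, and a variation in the format means changing one entry in `%field_re` rather than editing every stored copy of a huge pattern.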

As for simplifying the code, you still have further to go. Preparing the DBI statement handles in advance and using placeholders would be a good start. Consider an idiom like this, especially for the queries that you run over and over again:

# Assumes $dbh is an already-connected DBI database handle.
my @map_cols = qw( a.mapping a.path_prefix a.parent_mapping
                   a.parent_element a.rule a.data_nature
                   a.table_suffix b.element b.column_name
                   b.priority_local b.priority_global );

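# Note the leading space before FROM; without it the last column name
# runs straight into the keyword.
my $map_sql = 'SELECT ' . join( ',', @map_cols ) .
  ' FROM cor_ekl_map a, cor_ekl_map_dfn b
  WHERE a.mapping = ? AND a.mapping = b.mapping';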

my $map_sth = $dbh->prepare( $map_sql );

my @proc_cols = qw(a.process_id a.process_ts a.process_stage
                   b.ekl_set b.mapping);

my $proc_sql = 'SELECT ' . join( ',', @proc_cols ) .
  ' FROM sys_ekl_ipt_001 a, cor_ekl_set_dfn b
  WHERE a.ekl_set = b.ekl_set';

my $proc_sth = $dbh->prepare( $proc_sql );

s/^[ab]\.// for ( @map_cols, @proc_cols );  # don't need table prefixes now

my $regex_sth = $dbh->prepare( 'SELECT regex FROM cor_ekl_rul WHERE rule = ?' );
my $cdata_sth = $dbh->prepare( 'SELECT column_name, column_type
  FROM cor_dat_col WHERE data_nature = ? AND table_suffix = ?' );

$proc_sth->execute;

while ( my $proc_row = $proc_sth->fetchrow_arrayref ) 
{
    my %procdata;
    @procdata{@proc_cols} = @$proc_row;

    $map_sth->execute( $procdata{"mapping"} );

    while ( my $map_row = $map_sth->fetchrow_arrayref ) 
    {
        my %mapdata;
        @mapdata{@map_cols} = @$map_row;

        my $table_name = join( '_', 'dat',
                               $mapdata{data_nature},
                               $mapdata{table_suffix} );

        $regex_sth->execute( $mapdata{rule} );
        my ( $regex ) = $regex_sth->fetchrow_array;

        # and so on...
    }
}
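The `@procdata{@proc_cols} = @$proc_row;` line relies on a hash slice: the same column list that builds the SELECT also supplies the hash keys, so the two can't drift apart. Here is a self-contained sketch of just that part, with a hard-coded array reference standing in for `fetchrow_arrayref` so no database is needed; the row values are made up.

```perl
use strict;
use warnings;

# Same shape as @proc_cols above, including the table prefixes.
my @proc_cols = qw( a.process_id a.process_ts a.process_stage
                    b.ekl_set b.mapping );

# The prefixed names go into the SELECT list.
my $select_list = join ', ', @proc_cols;

# Strip the "a." / "b." prefixes so the names can serve as hash keys.
s/^[ab]\.// for @proc_cols;

# Stand-in for $proc_sth->fetchrow_arrayref (values are made up).
my $proc_row = [ 42, '2011-03-04 10:00:00', 'load', 'SET01', 'MAP01' ];

# Hash slice: each value goes to the key at the same position.
my %procdata;
@procdata{@proc_cols} = @$proc_row;

print "SELECT $select_list\n";
print "$_ => $procdata{$_}\n" for @proc_cols;
```

DBI's own `fetchrow_hashref` does something similar, keyed by the column names the driver reports; the explicit slice keeps the key names entirely under your control.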

I think that going in this direction will make your code a lot shorter, simpler, and easier to maintain. Boiling the regexes down to just the parts that matter will help too. I suspect that you don't really need to store regexes in the database at all.
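If the rules really only differ in a few systematic ways, one alternative to the `cor_ekl_rul` table is a hash of precompiled patterns in the script itself. A minimal sketch, with hypothetical rule names and patterns since the real rules weren't posted: each pattern is compiled once with `qr//` and looked up by the same kind of rule key that `$mapdata{rule}` carries.

```perl
use strict;
use warnings;

# Hypothetical rule names and patterns, compiled once at startup.
my %rule_re = (
    date_line  => qr/^\d{4}-\d{2}-\d{2}\b/,
    total_line => qr/^TOTAL\s+(-?\d+(?:\.\d+)?)\s*$/,
);

# Look up a rule by name and return the lines it matches; an unknown
# rule dies loudly instead of silently matching nothing.
sub lines_matching {
    my ( $rule, @lines ) = @_;
    my $re = $rule_re{$rule}
        or die "No regex defined for rule '$rule'\n";
    return grep { $_ =~ $re } @lines;
}

my @data = ( '2011-03-04 start', 'TOTAL 19.99', 'noise' );
my @hits = lines_matching( 'total_line', @data );
print "$_\n" for @hits;
```

Keeping the patterns in code also means they are versioned alongside the logic that uses them, and a typo in a pattern shows up when the script compiles rather than at match time.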
