greatshots has asked for the wisdom of the Perl Monks concerning the following question:

monks

create table "metrica".vodafone_capacity_cell_count_week ( nc_id integer, cell_id varchar(50) not null , week_of integer not null , busy_hour "informix".utime, gsm_tch_cong_pc_range_1 integer, gsm_tch_cong_pc_range_2 integer, gsm_tch_cong_pc_range_3 integer, gsm_sdcch_cong_pc_range_1 integer, gsm_sdcch_cong_pc_range_2 integer, gsm_sdcch_cong_pc_range_3 integer, gsm_sdcch_cong_pc_range_4 integer, gsm_hr_tch_util_pc_range_1 integer, gsm_hr_tch_util_pc_range_2 integer, gsm_hr_tch_util_pc_range_3 integer, gsm_hr_tch_util_pc_range_4 integer, gsm_hr_tch_util_pc_range_5 integer, gprs_cong_pc_range_1 integer, gprs_cong_pc_range_2 integer, gprs_cong_pc_range_3 integer, config_tch integer, config_fpdch integer, config_trx integer, config_layer integer, cell_type varchar(25) default null );
From the above input I would like to take table_name.column_name format output. but default null is not the column name. keeping that in mind I would like to get a regex to do this. I have written a perl code which comes for 20 lines. to do the above task. But I would like to get the better solution.
The out put should be as follows
vodafone_capacity_cell_count_week vodafone_capacity_cell_count_week.nc_id vodafone_capacity_cell_count_week.cell_id vodafone_capacity_cell_count_week.week_of vodafone_capacity_cell_count_week.busy_hour vodafone_capacity_cell_count_week.gsm_tch_cong_pc_range_1 vodafone_capacity_cell_count_week.gsm_tch_cong_pc_range_2 vodafone_capacity_cell_count_week.gsm_tch_cong_pc_range_3 vodafone_capacity_cell_count_week.gsm_sdcch_cong_pc_range_1 vodafone_capacity_cell_count_week.gsm_sdcch_cong_pc_range_2 vodafone_capacity_cell_count_week.gsm_sdcch_cong_pc_range_3 vodafone_capacity_cell_count_week.gsm_sdcch_cong_pc_range_4 vodafone_capacity_cell_count_week.gsm_hr_tch_util_pc_range_1 vodafone_capacity_cell_count_week.gsm_hr_tch_util_pc_range_2 vodafone_capacity_cell_count_week.gsm_hr_tch_util_pc_range_3 vodafone_capacity_cell_count_week.gsm_hr_tch_util_pc_range_4 vodafone_capacity_cell_count_week.gsm_hr_tch_util_pc_range_5 vodafone_capacity_cell_count_week.gprs_cong_pc_range_1 vodafone_capacity_cell_count_week.gprs_cong_pc_range_2 vodafone_capacity_cell_count_week.gprs_cong_pc_range_3 vodafone_capacity_cell_count_week.config_tch vodafone_capacity_cell_count_week.config_fpdch vodafone_capacity_cell_count_week.config_trx vodafone_capacity_cell_count_week.config_layer vodafone_capacity_cell_count_week.cell_type
Above output is from my script.

Replies are listed 'Best First'.
Re: removing unwanted lines from the pattern match
by davorg (Chancellor) on Nov 16, 2006 at 10:32 UTC

    The formatting of that last "default" line makes it harder than it should be. This solution relies on the amount of whitespace at the start of column definitions.

    #!/usr/bin/perl use strict; use warnings; my $tab; while (<DATA>) { if (/create table.+\.(\w+)/) { $tab = $1; } if (/^ (\w+)/) { print "$tab.$1\n"; } } __DATA__ create table "metrica".vodafone_capacity_cell_count_week ( nc_id integer, cell_id varchar(50) not null , week_of integer not null , busy_hour "informix".utime, gsm_tch_cong_pc_range_1 integer, gsm_tch_cong_pc_range_2 integer, gsm_tch_cong_pc_range_3 integer, gsm_sdcch_cong_pc_range_1 integer, gsm_sdcch_cong_pc_range_2 integer, gsm_sdcch_cong_pc_range_3 integer, gsm_sdcch_cong_pc_range_4 integer, gsm_hr_tch_util_pc_range_1 integer, gsm_hr_tch_util_pc_range_2 integer, gsm_hr_tch_util_pc_range_3 integer, gsm_hr_tch_util_pc_range_4 integer, gsm_hr_tch_util_pc_range_5 integer, gprs_cong_pc_range_1 integer, gprs_cong_pc_range_2 integer, gprs_cong_pc_range_3 integer, config_tch integer, config_fpdch integer, config_trx integer, config_layer integer, cell_type varchar(25) default null );
    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

Re: removing unwanted lines from the pattern match
by Corion (Patriarch) on Nov 16, 2006 at 10:36 UTC

    Have you looked at SQL::Statement? It can parse many types of SQL statements and then return information about these statements. You sound as if you don't have any control over the format that your CREATE statement is sent to you, so parsing with regular expressions without prior knowledge of the formatting will be guesswork. If SQL::Statement works for you, use it instead.

Re: removing unwanted lines from the pattern match
by BrowserUk (Patriarch) on Nov 16, 2006 at 11:39 UTC

    This seems to produce the desired output regardless of how the input is formatted.

    #! perl -slw use strict; $/ = ';'; while( <DATA> ) { m[ create\s+table\s+ (?:"[^"]+"\.)* #" ( \S+ ) \s* \( \s* ( .+ ) \s* \) ]smx or warn "*** No match '$_'" and next; print "$1.$_" for map{ m[(\S+)] and $1 } split ',\s*', $2; print "\n------\n"; } __DATA__ create table "metrica".vodafone_capacity_cell_count_week ( nc_id integer, cell_id varchar(50) not null , week_of integer not nu +ll , busy_hour "informix".utime, gsm_tch_cong_pc_range_1 integer, gsm_tch_cong_pc_range_2 integer, gsm_tch_cong_pc_range_3 integer, +gsm_sdcch_cong_pc_range_1 integer, gsm_sdcch_cong_pc_range_2 integer, gsm_sdcch_cong_pc_range_3 integer, gsm_sdcch_cong_pc_range_4 integer, gsm_hr_tch_util_pc_range_1 integer, gsm_hr_tch_util_pc_range_2 integer, gsm_hr_tch_util_pc_range_3 int +eger, gsm_hr_tch_util_pc_range_4 integer, gsm_hr_tch_util_pc_range_5 integer, gprs_cong_pc_range_1 integer, gprs_cong_pc_range_2 integer, gprs_c +ong_pc_range_3 integer, config_tch integer, config_fpdch integer, config_trx integer, config_layer integer, cell_type varchar(25) default null );

    Produces

    vodafone_capacity_cell_count_week.nc_id vodafone_capacity_cell_count_week.cell_id vodafone_capacity_cell_count_week.week_of vodafone_capacity_cell_count_week.busy_hour vodafone_capacity_cell_count_week.gsm_tch_cong_pc_range_1 vodafone_capacity_cell_count_week.gsm_tch_cong_pc_range_2 vodafone_capacity_cell_count_week.gsm_tch_cong_pc_range_3 vodafone_capacity_cell_count_week.gsm_sdcch_cong_pc_range_1 vodafone_capacity_cell_count_week.gsm_sdcch_cong_pc_range_2 vodafone_capacity_cell_count_week.gsm_sdcch_cong_pc_range_3 vodafone_capacity_cell_count_week.gsm_sdcch_cong_pc_range_4 vodafone_capacity_cell_count_week.gsm_hr_tch_util_pc_range_1 vodafone_capacity_cell_count_week.gsm_hr_tch_util_pc_range_2 vodafone_capacity_cell_count_week.gsm_hr_tch_util_pc_range_3 vodafone_capacity_cell_count_week.gsm_hr_tch_util_pc_range_4 vodafone_capacity_cell_count_week.gsm_hr_tch_util_pc_range_5 vodafone_capacity_cell_count_week.gprs_cong_pc_range_1 vodafone_capacity_cell_count_week.gprs_cong_pc_range_2 vodafone_capacity_cell_count_week.gprs_cong_pc_range_3 vodafone_capacity_cell_count_week.config_tch vodafone_capacity_cell_count_week.config_fpdch vodafone_capacity_cell_count_week.config_trx vodafone_capacity_cell_count_week.config_layer vodafone_capacity_cell_count_week.cell_type ------ *** No match ' ' at c:\test\junk3.pl line 6, <DATA> chunk 3.

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: removing unwanted lines from the pattern match
by jhourcle (Prior) on Nov 16, 2006 at 13:51 UTC

    Why use a regex? It'd be better to use a real parser, and I'm probbably going to fail on selecting the fields, but once you get that, just split the list on commas, and choose the first word from each:

    local $/ = undef; my $create_sql = <DATA>; my ($tablename, $fields) = ($create_sql =~ m/create table ".+?"[.](\w+ +)\s+\((.*?)\);/s); print "$tablename\n"; foreach my $fieldinfo (split /,/, $fields) { my ($fieldname) = ( $fieldinfo =~ m/^\s*(\w+)/ ); print "$tablename.$fieldname\n"; } __DATA__ create table "metrica".vodafone_capacity_cell_count_week ( nc_id integer, cell_id varchar(50) not null , week_of integer not null , busy_hour "informix".utime, gsm_tch_cong_pc_range_1 integer, gsm_tch_cong_pc_range_2 integer, gsm_tch_cong_pc_range_3 integer, gsm_sdcch_cong_pc_range_1 integer, gsm_sdcch_cong_pc_range_2 integer, gsm_sdcch_cong_pc_range_3 integer, gsm_sdcch_cong_pc_range_4 integer, gsm_hr_tch_util_pc_range_1 integer, gsm_hr_tch_util_pc_range_2 integer, gsm_hr_tch_util_pc_range_3 integer, gsm_hr_tch_util_pc_range_4 integer, gsm_hr_tch_util_pc_range_5 integer, gprs_cong_pc_range_1 integer, gprs_cong_pc_range_2 integer, gprs_cong_pc_range_3 integer, config_tch integer, config_fpdch integer, config_trx integer, config_layer integer, cell_type varchar(25) default null );