regular expression to get the tablename.column_name

greatshots has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

TABLE;nokia_sgsn_tot_int_util_month
COLUMN;nc_id;integer
COLUMN;sgsn_id;varchar(50) not null
COLUMN;month_of;integer not null
COLUMN;nokia_sgsn_interface_utilisation_busy_hour;utime
COLUMN;data_coverage_pc;float
COLUMN;tot_measurement_seconds;integer
COLUMN;avg_measurement_seconds;integer
COLUMN;tot_ifinbroadcastpkts;int8
COLUMN;avg_ifinbroadcastpkts;int8
COLUMN;min_ifinbroadcastpkts;int8
COLUMN;max_ifinbroadcastpkts;int8
COLUMN;nsiubh_ifinbroadcastpkts;int8
COLUMN;tot_ifindiscards;int8
COLUMN;avg_ifindiscards;int8
COLUMN;min_ifindiscards;int8
COLUMN;max_ifindiscards;int8
COLUMN;nsiubh_ifindiscards;int8
COLUMN;tot_ifinerrors;int8
COLUMN;avg_ifinerrors;int8
COLUMN;min_ifinerrors;int8
COLUMN;max_ifinerrors;int8
COLUMN;nsiubh_ifinerrors;int8
COLUMN;tot_ifinmulticastpkts;int8
COLUMN;avg_ifinmulticastpkts;int8
COLUMN;min_ifinmulticastpkts;int8
COLUMN;max_ifinmulticastpkts;int8
COLUMN;nsiubh_ifinmulticastpkts;int8
COLUMN;tot_ifinnucastpkts;int8
COLUMN;avg_ifinnucastpkts;int8

I have 450 files in a directory. Each file contains table definitions in the format shown above. From these files I want to produce output in the format shown below.
Using the above input:

nokia_sgsn_tot_int_util_month.nc_id
nokia_sgsn_tot_int_util_month.sgsn_id
.
.
.
TABLE.COLUMN

Update: code added

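#!/usr/bin/perl

# Read table definitions from STDIN and print "table.column" for each column.
while ( <STDIN> ) {
    # Remember the most recent table name.
    if ( $_ =~ /TABLE;(.*$)/ ) {
        $table_name = $1;
    }
    # Print each column name qualified by that table name.
    if ( $_ =~ /COLUMN;(.*);/ ) {
        print $table_name.'.'.$1."\n";
    }
}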

I am passing a single file at a time to the above script, but I have to do this for 450 files. Is it possible to do the above task in a single line for all 450 files?
UPDATE: I have 10 lines of Perl script that do this, but I am interested in doing it in a single line.
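
One way to cover all 450 files without changing the logic is to read from `<>` instead of `<STDIN>`, since `<>` walks through every file named on the command line. The following is only a minimal sketch under that assumption; the helper name `extract_columns` is made up for illustration and does not come from the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn TABLE;/COLUMN; definition lines into
# "table.column" strings. Lines of any other kind are ignored.
sub extract_columns {
    my @lines = @_;
    my $table = '';
    my @out;
    for my $line (@lines) {
        if ($line =~ /^TABLE;([^;\s]+)/) {
            $table = $1;                  # remember the current table name
        }
        elsif ($line =~ /^COLUMN;([^;]+);/) {
            push @out, "$table.$1";       # qualify the column with it
        }
    }
    return @out;
}

# <> reads every file named in @ARGV, one after another.
# The guard means nothing is read when no file names are given.
if (@ARGV) {
    print "$_\n" for extract_columns(<>);
}
```

Saved under an assumed name such as `cols.pl`, it could be run from the directory as `perl cols.pl *`, letting the shell expand `*` to the file names.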

Replies are listed 'Best First'.
Re: regular expression to get the tablename.column_name by greatshots (Pilgrim) on Oct 18, 2006 at 04:12 UTC

time awk -F ';' '/^TABLE/ {table=$2} /^COLUMN/ {printf("%s.%s\n",table,$2) }' *
real 0m0.35s
user 0m0.29s
sys 0m0.05s

time perl -ne '/^TABLE;([^;\n]+)/ && ($table = $1) or /^COLUMN;([^;\n]+)/ && print "$table.$1\n"' *
real 0m0.24s
user 0m0.21s
sys 0m0.03s

time perl -nle '/TABLE;(.*)/?($t=$1):/COLUMN;([^;]*)/&&print(qq{$t.$1})' *
real 0m4.88s
user 0m0.34s
sys 0m0.05s

Re^2: regular expression to get the tablename.column_name by imp (Priest) on Oct 18, 2006 at 05:16 UTC
To get a better idea of how each strategy performs you need to observe the results over many invocations. Here is a benchmark of a few strategies:

use Benchmark qw(cmpthese);
use strict;

my @data = <DATA>;
my $data = join "\n",@data;

#open OUT, '>&STDOUT';
open OUT, '>', '/dev/null';

cmpthese(1000, {
    table_col => sub {open my $fh, '<', \$data; table_col($fh)},
    col_table => sub {open my $fh, '<', \$data; col_table($fh)},
    multi     => sub {open my $fh, '<', \$data; multi($fh)},
    regex     => sub {open my $fh, '<', \$data; regex($fh)},
});

sub multi {
    my $fh = shift;
    my $table;
    while(<$fh>) {
        if (/^COLUMN;(.+?);/) {
            print OUT $table,$1,"\n";
        }
        elsif (/^TABLE;(.+)$/) {
            $table = $1 . '.';
        }
    }
}

sub col_table {
    my $fh = shift;
    my $table;
    while(<$fh>) {
        /^COLUMN;(.+?);/ && print OUT ($table,$1,"\n") or /^TABLE;(.+)$/ && ($table = $1 . '.')
    }
}

sub table_col {
    my $fh = shift;
    my $table;
    while(<$fh>) {
        /^TABLE;(.+)$/ && ($table = $1 . '.') or /^COLUMN;(.+?);/ && print OUT ($table,$1,"\n")
    }
}

sub regex {
    my $fh = shift;
    local $/;
    my $data = <$fh>;
    my $table;
    $data =~ s{
        ^TABLE;(.+?)\s*\n
        |
        ^COLUMN;(.+?);.*\n
    }{$1 ? (($table=$1),"") : "$table.$2"}mexg;
    print OUT $data;
}

__DATA__
TABLE;nokia_sgsn_tot_int_util_month
COLUMN;nc_id;integer
COLUMN;sgsn_id;varchar(50) not null
COLUMN;month_of;integer not null
COLUMN;nokia_sgsn_interface_utilisation_busy_hour;utime
COLUMN;data_coverage_pc;float
COLUMN;tot_measurement_seconds;integer
COLUMN;avg_measurement_seconds;integer
COLUMN;tot_ifinbroadcastpkts;int8
COLUMN;avg_ifinbroadcastpkts;int8
COLUMN;min_ifinbroadcastpkts;int8
COLUMN;max_ifinbroadcastpkts;int8
COLUMN;nsiubh_ifinbroadcastpkts;int8
COLUMN;tot_ifindiscards;int8
COLUMN;avg_ifindiscards;int8
COLUMN;min_ifindiscards;int8
COLUMN;max_ifindiscards;int8
COLUMN;nsiubh_ifindiscards;int8
COLUMN;tot_ifinerrors;int8
COLUMN;avg_ifinerrors;int8
COLUMN;min_ifinerrors;int8
COLUMN;max_ifinerrors;int8
COLUMN;nsiubh_ifinerrors;int8
COLUMN;tot_ifinmulticastpkts;int8
COLUMN;avg_ifinmulticastpkts;int8
COLUMN;min_ifinmulticastpkts;int8
COLUMN;max_ifinmulticastpkts;int8
COLUMN;nsiubh_ifinmulticastpkts;int8
COLUMN;tot_ifinnucastpkts;int8
COLUMN;avg_ifinnucastpkts;int8

Results:

            Rate  col_table  multi  table_col  regex
col_table  565/s         --   -10%       -14%   -36%
multi      625/s        11%     --        -5%   -29%
table_col  658/s        16%     5%         --   -26%
regex      885/s        57%    42%        35%     --

With a different set of test data you will of course get different results. I was expecting the col_table and multi strategies to do better than table_col, but they consistently perform worse for me.
Re^2: regular expression to get the tablename.column_name by ikegami (Patriarch) on Oct 18, 2006 at 04:16 UTC
Do you want fast or short? Those are conflicting goals. Adding `^` makes a big difference.
Re^3: regular expression to get the tablename.column_name by greatshots (Pilgrim) on Oct 18, 2006 at 04:31 UTC
Obviously... both
Re: regular expression to get the tablename.column_name - ugly and golfed by imp (Priest) on Oct 18, 2006 at 04:19 UTC
I offer the following monstrosity up, as a sacrifice to Timmy Toady (TMTOWTDI):
`$/ = undef or $data = <> and $data =~ s{ ^TABLE;([^;\n]+).*$ | ^COLUMN;(.+?);.*$ }{$1 ? (($t=$1),"") : "$t.$2"}mexg; print $data;`
Or condensed to an uglier form:
`perl -e '$/=undef or $_=<> and s{^TABLE;([^;\n]+).*$|^COLUMN;(.+?);.*$}{$1?(($t=$1),""):"$t.$2"}mexg and print;' data.txt`
Update - golfed:
`perl -e'local$/or$_=<>,s{^TABLE;(.+?)$|^COLUMN;(.+?);.*$}{$1?(($t=$1),""):"$t.$2"}mexg and print;' data.txt`
Wait.. was this in the SoPW or obfu section? :)
Re: regular expression to get the tablename.column_name by ikegami (Patriarch) on Oct 18, 2006 at 03:56 UTC
Hardly the shortest, but below is a one-line version obtained using very basic simplifications of your code: `$_ =~` eliminated, `while (<STDIN>)` replaced with the `-n` command-line option, and `$` and `\n` replaced with the `-l` command-line option.
`perl -nle "/TABLE;(.*)/?($t=$1):/COLUMN;([^;]*)/&&print qq{$t.$1}"`
Re^2: regular expression to get the tablename.column_name by greatshots (Pilgrim) on Oct 18, 2006 at 04:04 UTC
`perl -nle "/TABLE;(.*)/?($t=$1):/COLUMN;([^;]*)/&&print(qq{$t.$1})" file1.txt`
Compilation error:
`syntax error at -e line 1, near "(="`
`Execution of -e aborted due to compilation errors.`
Re^3: regular expression to get the tablename.column_name by chargrill (Parson) on Oct 18, 2006 at 04:08 UTC
If you're on Windows, use double quotes. On a *nix variant, use single quotes. You obviously want:
`perl -nle '/TABLE;(.*)/?($t=$1):/COLUMN;([^;]*)/&&print(qq{$t.$1})' file1.txt`
--chargrill
`slil; $=join'',sort split q; s;.;grr; &&s+(.(.)).+$2$1+; $; = qq-$_-;s,.,ahc,;$,.=chop for split q,,,reverse;print for($,,$;,$,$/)`
Re: regular expression to get the tablename.column_name by imp (Priest) on Oct 18, 2006 at 03:51 UTC
Trying to condense the code to a single line is really only useful for satisfying curiosity, as code that is condensed like that is harder to maintain. That said, here's one line of Perl for it:
`perl -ne '/^TABLE;([^;\n]+)/ && ($table = $1) or /^COLUMN;([^;\n]+)/ && print "$table.$1\n"' data.txt`
And here's an awk solution:
`awk -F ';' '/^TABLE/ {table=$2} /^COLUMN/ {printf("%s.%s\n",table,$2) }' data.txt`
Re: regular expression to get the tablename.column_name by Cristoforo (Curate) on Oct 18, 2006 at 04:04 UTC
I was able to do this with `and` and `or`, but it wouldn't permit using `&&` or `||`. It complained: `Can't modify logical and (&&) in scalar assignment at -e line 1, at EOF.` I don't understand this.
`perl -F/;/ -lane "/TABLE;(.+)/ and $t=$1 or print qq($t.$F[1])" o33.txt`
Chris
Update: Thanks for the explanation, ikegami.
Re^2: regular expression to get the tablename.column_name by ikegami (Patriarch) on Oct 18, 2006 at 04:11 UTC
Operator precedence:
`a && b=c || d === (a && b) = (c || d)`
`a and b=c or d === (a and (b=c)) or d`
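
The precedence difference can be tried out with a small sketch. The helper names `capture_with_and` and `capture_with_andand` are hypothetical and exist only for this illustration: the low-precedence `and` lets the assignment bind first, while `&&` needs explicit parentheses around the assignment.

```perl
use strict;
use warnings;

# Hypothetical helpers, named only for this illustration.

# Low-precedence "and": parsed as (match) and ($t = $1).
sub capture_with_and {
    my ($line) = @_;
    my $t;
    $line =~ /^TABLE;(.+)/ and $t = $1;
    return $t;
}

# High-precedence "&&": the assignment needs its own parentheses.
# Without them Perl reads the statement as (match && $t) = $1, which is
# the "Can't modify logical and (&&) in scalar assignment" error.
sub capture_with_andand {
    my ($line) = @_;
    my $t;
    $line =~ /^TABLE;(.+)/ && ($t = $1);
    return $t;
}
```

Both helpers return the table name for a `TABLE;` line and `undef` for anything else; only the spelling of the operator and the parentheses differ.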
Re^2: regular expression to get the tablename.column_name by greatshots (Pilgrim) on Oct 18, 2006 at 04:08 UTC
`perl -F/;/ -lane "/TABLE;(.+)/ and $t=$1 or print qq($t.$F[1])" o33.txt`
Oops! I still did not get the prompt back. It keeps on running.
Re^3: regular expression to get the tablename.column_name by ikegami (Patriarch) on Oct 18, 2006 at 04:14 UTC
Unlike the OP's version, your version won't handle blank lines or lines other than TABLE and COLUMN properly. If that's acceptable, then the following is even shorter:
`perl -nle "/TABLE;(.*)/?($t=$1):/;(.*);/&&print qq($t.$1)" o33.txt`
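
To make the blank-line caveat concrete, here is a rough sketch comparing a split-based approach that prints for every non-TABLE line with one that checks for `COLUMN` explicitly. Both helper names are hypothetical, and both take already-chomped lines; this is an illustration of the failure mode, not a reproduction of either one-liner.

```perl
use strict;
use warnings;

# Hypothetical helpers; each takes chomped lines and returns output lines.

# Split on ";" and emit something for every line that is not a TABLE line,
# so blank or unrelated lines produce a dangling "table." entry.
sub split_unguarded {
    my ($t, @out) = ('');
    for my $line (@_) {
        my @F = split /;/, $line;
        if ($line =~ /TABLE;(.+)/) {
            $t = $1;
        }
        else {
            push @out, "$t." . (defined $F[1] ? $F[1] : '');
        }
    }
    return @out;
}

# Only emit output for lines that really are COLUMN lines.
sub regex_guarded {
    my ($t, @out) = ('');
    for my $line (@_) {
        if ($line =~ /^TABLE;(.+)/) {
            $t = $1;
        }
        elsif ($line =~ /^COLUMN;([^;]+);/) {
            push @out, "$t.$1";
        }
    }
    return @out;
}
```

Whether the extra entries matter depends on the input: if every file holds nothing but TABLE and COLUMN lines, the two behave the same.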
Re: regular expression to get the tablename.column_name by graff (Chancellor) on Oct 19, 2006 at 01:51 UTC
"I have 10 lines of perl script to do that. but, interested to do it in sigle line."
I haven't tried comparing the number of characters per script, but here's how I would do the Perl one-liner:
`perl -pe 's/TABLE;(.*)\n// and $t=$1;s/COLUMN;(\w+);.*/$t.$1/'`
Given your 30 lines of sample input for one file, that produces 29 lines of output, i.e. all the lines that start with "COLUMN" come out with the table name in place of "COLUMN", and the data type of the field stripped off. To do that on a directory of 450 files, I'll guess that it would be okay to make a sister directory called "fld-lists", and make a copy of each file using a shell command sequence like this:

mkdir ../fld-lists
for i in *; do
  perl -pe 's/TABLE;(.*)\n// and $t=$1;s/COLUMN;(\w+);.*/$t.$1/' $i >../fld-lists/$i
done
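
The same per-file conversion could also be written in Perl alone, for cases where a shell loop is not available. This is a minimal sketch under that assumption: `convert_file` is a hypothetical helper name, and the input and output paths are whatever the caller passes in.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_path to $out_path, dropping the TABLE line
# and rewriting each "COLUMN;name;type" line as "table.name".
# Lines of any other kind are copied through unchanged.
sub convert_file {
    my ($in_path, $out_path) = @_;
    open my $in,  '<', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    my $table = '';
    while (my $line = <$in>) {
        if ($line =~ /^TABLE;(.*)$/) {
            $table = $1;     # remember the table name, print nothing
            next;
        }
        $line =~ s/^COLUMN;(\w+);.*/$table.$1/;
        print {$out} $line;
    }
    close $out or die "Cannot close $out_path: $!";
    close $in;
    return;
}
```

A caller would loop over the directory listing and invoke `convert_file` once per file, choosing the output directory as it sees fit.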