greatshots has asked for the wisdom of the Perl Monks concerning the following question:

monks,

TABLE;nokia_sgsn_tot_int_util_month
COLUMN;nc_id;integer
COLUMN;sgsn_id;varchar(50) not null
COLUMN;month_of;integer not null
COLUMN;nokia_sgsn_interface_utilisation_busy_hour;utime
COLUMN;data_coverage_pc;float
COLUMN;tot_measurement_seconds;integer
COLUMN;avg_measurement_seconds;integer
COLUMN;tot_ifinbroadcastpkts;int8
COLUMN;avg_ifinbroadcastpkts;int8
COLUMN;min_ifinbroadcastpkts;int8
COLUMN;max_ifinbroadcastpkts;int8
COLUMN;nsiubh_ifinbroadcastpkts;int8
COLUMN;tot_ifindiscards;int8
COLUMN;avg_ifindiscards;int8
COLUMN;min_ifindiscards;int8
COLUMN;max_ifindiscards;int8
COLUMN;nsiubh_ifindiscards;int8
COLUMN;tot_ifinerrors;int8
COLUMN;avg_ifinerrors;int8
COLUMN;min_ifinerrors;int8
COLUMN;max_ifinerrors;int8
COLUMN;nsiubh_ifinerrors;int8
COLUMN;tot_ifinmulticastpkts;int8
COLUMN;avg_ifinmulticastpkts;int8
COLUMN;min_ifinmulticastpkts;int8
COLUMN;max_ifinmulticastpkts;int8
COLUMN;nsiubh_ifinmulticastpkts;int8
COLUMN;tot_ifinnucastpkts;int8
COLUMN;avg_ifinnucastpkts;int8
In a directory I have 450 files. Each file contains table definitions like the one above. From these files I want to get the output specified below.
Using the above input:
nokia_sgsn_tot_int_util_month.nc_id
nokia_sgsn_tot_int_util_month.sgsn_id
.
.
.
TABLE.COLUMN

Update: Code added
#!/usr/bin/perl
while ( <STDIN> ) {
    if ( $_ =~ /TABLE;(.*$)/ ) {
        $table_name = $1;
    }
    if ( $_ =~ /COLUMN;(.*);/ ) {
        print $table_name.'.'.$1."\n";
    }
}
For the above script I am passing a single file at a time, but I have to do this for 450 files. Is it possible to do the above task in a single line for 450 files?
UPDATE: I have 10 lines of Perl script to do that, but I am interested in doing it in a single line.

Replies are listed 'Best First'.
Re: regular expression to get the tablename.column_name
by greatshots (Pilgrim) on Oct 18, 2006 at 04:12 UTC
    time awk -F ';' '/^TABLE/ {table=$2} /^COLUMN/ {printf("%s.%s\n",table,$2) }' *
    real 0m0.35s
    user 0m0.29s
    sys  0m0.05s

    time perl -ne '/^TABLE;([^;\n]+)/ && ($table = $1) or /^COLUMN;([^;\n]+)/ && print "$table.$1\n"' *
    real 0m0.24s
    user 0m0.21s
    sys  0m0.03s

    time perl -nle '/TABLE;(.*)/?($t=$1):/COLUMN;([^;]*)/&&print(qq{$t.$1})' *
    real 0m4.88s
    user 0m0.34s
    sys  0m0.05s
      To get a better idea of how each strategy performs you need to observe the results over many invocations. Here is a benchmark of a few strategies:
      use Benchmark qw(cmpthese);
      use strict;

      my @data = <DATA>;
      my $data = join "\n",@data;

      #open OUT, '>&STDOUT';
      open OUT, '>', '/dev/null';

      cmpthese(1000, {
          table_col => sub {open my $fh, '<', \$data; table_col($fh)},
          col_table => sub {open my $fh, '<', \$data; col_table($fh)},
          multi     => sub {open my $fh, '<', \$data; multi($fh)},
          regex     => sub {open my $fh, '<', \$data; regex($fh)},
      });

      sub multi {
          my $fh = shift;
          my $table;
          while(<$fh>) {
              if (/^COLUMN;(.+?);/) {
                  print OUT $table,$1,"\n";
              } elsif (/^TABLE;(.+)$/) {
                  $table = $1 . '.';
              }
          }
      }

      sub col_table {
          my $fh = shift;
          my $table;
          while(<$fh>) {
              /^COLUMN;(.+?);/ && print OUT ($table,$1,"\n")
                  or /^TABLE;(.+)$/ && ($table = $1 . '.')
          }
      }

      sub table_col {
          my $fh = shift;
          my $table;
          while(<$fh>) {
              /^TABLE;(.+)$/ && ($table = $1 . '.')
                  or /^COLUMN;(.+?);/ && print OUT ($table,$1,"\n")
          }
      }

      sub regex {
          my $fh = shift;
          local $/;
          my $data = <$fh>;
          my $table;
          $data =~ s{
              ^TABLE;(.+?)\s*\n
              |
              ^COLUMN;(.+?);.*\n
          }{$1 ? (($table=$1),"") : "$table.$2"}mexg;
          print OUT $data;
      }
      __DATA__
      TABLE;nokia_sgsn_tot_int_util_month
      COLUMN;nc_id;integer
      COLUMN;sgsn_id;varchar(50) not null
      COLUMN;month_of;integer not null
      COLUMN;nokia_sgsn_interface_utilisation_busy_hour;utime
      COLUMN;data_coverage_pc;float
      COLUMN;tot_measurement_seconds;integer
      COLUMN;avg_measurement_seconds;integer
      COLUMN;tot_ifinbroadcastpkts;int8
      COLUMN;avg_ifinbroadcastpkts;int8
      COLUMN;min_ifinbroadcastpkts;int8
      COLUMN;max_ifinbroadcastpkts;int8
      COLUMN;nsiubh_ifinbroadcastpkts;int8
      COLUMN;tot_ifindiscards;int8
      COLUMN;avg_ifindiscards;int8
      COLUMN;min_ifindiscards;int8
      COLUMN;max_ifindiscards;int8
      COLUMN;nsiubh_ifindiscards;int8
      COLUMN;tot_ifinerrors;int8
      COLUMN;avg_ifinerrors;int8
      COLUMN;min_ifinerrors;int8
      COLUMN;max_ifinerrors;int8
      COLUMN;nsiubh_ifinerrors;int8
      COLUMN;tot_ifinmulticastpkts;int8
      COLUMN;avg_ifinmulticastpkts;int8
      COLUMN;min_ifinmulticastpkts;int8
      COLUMN;max_ifinmulticastpkts;int8
      COLUMN;nsiubh_ifinmulticastpkts;int8
      COLUMN;tot_ifinnucastpkts;int8
      COLUMN;avg_ifinnucastpkts;int8
      Results:
                 Rate col_table multi table_col regex
      col_table 565/s        --  -10%      -14%  -36%
      multi     625/s       11%    --       -5%  -29%
      table_col 658/s       16%    5%        --  -26%
      regex     885/s       57%   42%       35%    --
      With a different set of test data you will of course get different results.

      I was expecting the col_table and multi strategies to do better than table_col, but they consistently perform worse for me.

      Do you want fast or short? Those are conflicting goals. Adding ^ makes a big difference.
        Obviously.. both
Re: regular expression to get the tablename.column_name - ugly and golfed
by imp (Priest) on Oct 18, 2006 at 04:19 UTC
    I offer the following monstrosity up, as a sacrifice to Timmy Toady (TMTOWTDI):
    $/ = undef or $data = <> and $data =~ s{
        ^TABLE;([^;\n]+).*$
        |
        ^COLUMN;(.+?);.*$
    }{$1 ? (($t=$1),"") : "$t.$2"}mexg;
    print $data;
    Or condensed to uglier form:
    perl -e '$/=undef or $_=<> and s{^TABLE;([^;\n]+).*$|^COLUMN;(.+?);.*$}{$1?(($t=$1),""):"$t.$2"}mexg and print;' data.txt
    Update - golfed
    perl -e'local$/or$_=<>,s{^TABLE;(.+?)$|^COLUMN;(.+?);.*$}{$1?(($t=$1),""):"$t.$2"}mexg and print;' data.txt
    Wait.. was this in the SoPW or obfu section? :)
Re: regular expression to get the tablename.column_name
by ikegami (Patriarch) on Oct 18, 2006 at 03:56 UTC
    Hardly the shortest, but below is a one-line version obtained using very basic simplifications of your code ($_ =~ eliminated; while (<STDIN>) replaced with the -n command-line option; $ and \n replaced with the -l command-line option.)
    perl -nle "/TABLE;(.*)/?($t=$1):/COLUMN;([^;]*)/&&print qq{$t.$1}"
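For readers unfamiliar with the switches, here is a rough longhand sketch of what the one-liner above does once -n and -l are spelled out. The sub name table_columns and the in-memory sample are made up for illustration, and the real switches work on <> and $_ rather than on an explicit filehandle.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "table.column" strings read from a filehandle.
sub table_columns {
    my ($fh) = @_;
    my ( $t, @out );
    while ( my $line = <$fh> ) {    # -n supplies this loop (over <>)
        chomp $line;                # -l chomps each input line
        if ( $line =~ /TABLE;(.*)/ ) {
            $t = $1;                # remember the current table name
        }
        elsif ( $line =~ /COLUMN;([^;]*)/ ) {
            push @out, "$t.$1";     # with -l, print would append "\n"
        }
    }
    return @out;
}

# Made-up two-table sample in the format shown in the question.
my $sample = "TABLE;t1\nCOLUMN;a;integer\nCOLUMN;b;varchar(50) not null\n"
           . "TABLE;t2\nCOLUMN;c;int8\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
print "$_\n" for table_columns($fh);
```

Because the table name is simply overwritten at each TABLE line, the same loop works unchanged when several files are read one after another.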

      perl -nle "/TABLE;(.*)/?($t=$1):/COLUMN;([^;]*)/&&print(qq{$t.$1})" file1.txt
      Compile error:
      syntax error at -e line 1, near "(="
      Execution of -e aborted due to compilation errors.

        If you're on Windows, use double quotes. On a *nix variant, use single quotes. You obviously want:

        perl -nle '/TABLE;(.*)/?($t=$1):/COLUMN;([^;]*)/&&print(qq{$t.$1})' file1.txt
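A likely reason for the near "(=" message, assuming a Bourne-style shell: inside double quotes the shell expands $t and $1 itself (normally to empty strings) before perl ever sees the code. The sketch below only imitates that expansion with a substitution and then compiles the result; $code and $mangled are illustrative names, not anything from the thread.

```perl
use strict;
use warnings;

# The one-liner's code as perl is supposed to receive it.
my $code = '/TABLE;(.*)/?($t=$1):/COLUMN;([^;]*)/&&print(qq{$t.$1})';

# Roughly imitate a shell expanding the unset $t and $1 inside double quotes.
( my $mangled = $code ) =~ s/\$\w+//g;
print "perl would see: $mangled\n";

# Try to compile the mangled code (it is wrapped in a sub, so nothing runs).
my $ok = eval "sub { $mangled }";
print "compile failed: $@" unless $ok;
```

With single quotes the shell passes the text through untouched, which is why the corrected command compiles.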


        --chargrill
        s**lil*; $*=join'',sort split q**; s;.*;grr; &&s+(.(.)).+$2$1+; $; = qq-$_-;s,.*,ahc,;$,.=chop for split q,,,reverse;print for($,,$;,$*,$/)
Re: regular expression to get the tablename.column_name
by imp (Priest) on Oct 18, 2006 at 03:51 UTC
    Trying to condense the code to a single line is really only useful for satisfying curiosity, as code that is condensed like that is harder to maintain.

    That said, here's one line of perl for it:

    perl -ne '/^TABLE;([^;\n]+)/ && ($table = $1) or /^COLUMN;([^;\n]+)/ && print "$table.$1\n"' data.txt
    And here's an awk solution:
    awk -F ';' '/^TABLE/ {table=$2} /^COLUMN/ {printf("%s.%s\n",table,$2) }' data.txt
Re: regular expression to get the tablename.column_name
by Cristoforo (Curate) on Oct 18, 2006 at 04:04 UTC
    I was able to do this with "and" and "or", but it wouldn't permit using && or ||. It complained: Can't modify logical and (&&) in scalar assignment at -e line 1, at EOF. I don't understand this.

    perl -F/;/ -lane "/TABLE;(.+)/ and $t=$1 or print qq($t.$F[1])" o33.txt
    Chris
    Update: Thanks for the explanation, ikegami.
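For anyone hitting the same message: in Perl's precedence table, = binds more loosely than && and ||, but more tightly than "and" and "or". So with the symbolic operators the assignment ends up with a logical expression on its left-hand side. A minimal sketch of the failure and one possible fix (parenthesising the assignment); the sub name is hypothetical:

```perl
use strict;
use warnings;

# Without parentheses, perl parses
#   /.../ && $t = $1 || ...   as   (/.../ && $t) = ($1 || ...)
# and refuses to compile it.
my $broken = eval q{ sub { my $t; /TABLE;(.+)/ && $t = $1 || print "x" } };
print "compile error: $@" unless $broken;

# Hypothetical helper: with the assignment parenthesised, && and || work.
sub table_columns_symbolic {
    my @lines = @_;
    my ( $t, @out );
    for my $line (@lines) {
        my @F = split /;/, $line;    # what -a with -F/;/ puts in @F
        $line =~ /TABLE;(.+)/ && ( $t = $1 ) || push @out, "$t.$F[1]";
    }
    return @out;
}

print "$_\n" for table_columns_symbolic( 'TABLE;t1', 'COLUMN;a;integer' );
```

Like the one-liner it mirrors, this treats every line that is not a TABLE line as a column line.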

      perl -F/;/ -lane "/TABLE;(.+)/ and $t=$1 or print qq($t.$F[1])" o33.txt
      Oops! I still did not get the prompt back; it keeps on running.

        Unlike the OP's version, your version won't handle blank lines or lines other than TABLE and COLUMN properly. If that's acceptable, then the following is even shorter:
        perl -nle "/TABLE;(.*)/?($t=$1):/;(.*);/&&print qq($t.$1)" o33.txt
Re: regular expression to get the tablename.column_name
by graff (Chancellor) on Oct 19, 2006 at 01:51 UTC
    I have 10 lines of perl script to do that. but, interested to do it in single line.

    I haven't tried comparing the number of characters per script, but here's how I would do the perl one-liner:

    perl -pe 's/TABLE;(.*)\n// and $t=$1;s/COLUMN;(\w+);.*/$t.$1/'
    Given your 30 lines of sample input for one file, that produces 29 lines of output -- i.e. all the lines that start with "COLUMN" come out with the table name in place of "COLUMN", and the data-type of the field stripped off.

    To do that on a directory of 450 files, I'll guess that it would be okay to make a sister directory called "fld-lists", and make a copy of each file using a shell command sequence like this:

    mkdir ../fld-lists
    for i in *; do
      perl -pe 's/TABLE;(.*)\n// and $t=$1;s/COLUMN;(\w+);.*/$t.$1/' $i >../fld-lists/$i
    done
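If perl is preferred over the shell loop, the same per-file split could be sketched roughly as below. The script name fld_lists.pl and the sub write_field_lists are hypothetical, and unlike the -p one-liner this version writes only the table.column lines (anything that is neither a TABLE nor a COLUMN line is dropped).

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: for each input file, write a same-named file of
# "table.column" lines into $out_dir.
sub write_field_lists {
    my ( $out_dir, @files ) = @_;
    make_path($out_dir);    # no error if the directory already exists
    for my $file (@files) {
        my $out = File::Spec->catfile( $out_dir, basename($file) );
        open my $in,  '<', $file or die "Cannot read $file: $!";
        open my $ofh, '>', $out  or die "Cannot write $out: $!";
        my $t;              # table name, reset for every file
        while ( my $line = <$in> ) {
            if    ( $line =~ /^TABLE;(.*)$/ )   { $t = $1 }
            elsif ( $line =~ /^COLUMN;(\w+);/ ) { print {$ofh} "$t.$1\n" }
        }
        close $ofh or die "Cannot close $out: $!";
    }
    return;
}

# Assumed invocation: perl fld_lists.pl ../fld-lists *
write_field_lists(@ARGV) if @ARGV >= 2;
```

Run from the directory holding the 450 files, this does in one perl process what the loop does with one process per file.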