comment on

I have a collegue who is a proficient programmer but new to Perl and has been charged with producing a simple cross-tab from an SQL query in HTML (and potentially in future also as an Excel workbook or other format).

I decided to have a go at this myself to see how it would look. And it looks ghastly! I would dearly love a bit of a code critique before I present this as a good solution. I can't help feeling I must be missing some shortcut or CPAN module. (Note: I've considered and largely rejected DBIx::Xtab and it would take quite a stong case to make me consider it again).

The raw output of the SQL query looks a bit like this:

A B C D E F P Q

A1 B1 C1 D1 E1 F1 One Two

A1 B1 C1 D1 E1 F2 Five Six

A1 B1 C1 D2 E1 F1 Three Four

A1 B1 C2 D1 E1 F1 Seven Eight

A1 B1 C2 D1 E2 F1 Nine Ten

A1 B1 C3 D2 E3 F2 Eleven Twelve

A1 B2 C1 D1 E1 F1 Foo Bar

A1 B2 C1 D1 E1 F1 Floob Blab

A1 B2 C2 D2 E1 F1 Thirteen Fourteen

A1 B2 C2 D2 E2 F1 Fifteen Sixteen

A	B	C	D	E	F	P	Q
A1	B1	C1	D1	E1	F1	One	Two
A1	B1	C1	D1	E1	F2	Five	Six
A1	B1	C1	D2	E1	F1	Three	Four
A1	B1	C2	D1	E1	F1	Seven	Eight
A1	B1	C2	D1	E2	F1	Nine	Ten
A1	B1	C3	D2	E3	F2	Eleven	Twelve
A1	B2	C1	D1	E1	F1	Foo	Bar
A1	B2	C1	D1	E1	F1	Floob	Blab
A1	B2	C2	D2	E1	F1	Thirteen	Fourteen
A1	B2	C2	D2	E2	F1	Fifteen	Sixteen

And I'd like output a bit like

A: A1

B: B1

E E1 E2 E3

F F1 F2 F1 F2

C D P Q P Q P Q P Q

C1 D1 One Two

C1 D1 Five Six

D2 Three Four

C2 D1 Seven Eight Nine Ten

A: A1

B: B2

E E1 E2

F F1 F1

C D P Q P Q

C1 D1 Foo Floob Bar Blab

C2 D2 Thirteen Fourteen Fifteen Sixteen

My code looks like this

use strict;
use warnings;

sub first_diff {
    my ($a1,$a2) = @_;
    # Hmmm... do I really want to destinguish undef and '' here?
    # I don't do so in terms of the column headers.
    no warnings 'uninitialized';
    for ( 0..$#$a1 ) {
    return $_ if $_ > $#$a2 ||
        $a1->[$_] ne $a2->[$_] ||
        defined $a1->[$_] != defined $a2->[$_];
    }
    return;
}

sub quote_ids {
    join ',' => map qq{"$_"}, @_;
}

sub make_xtab {
    my %args = @_;
    my ( $db, $page_headers, $row_headers, $col_headers,
     $values ) =
         @args{'DB','PAGES','ROWS','COLS','VALUES'};

    my $page_info;
    {
    my $c = quote_ids(@$page_headers,@$col_headers);
    my $sth = $db->prepare
        ("SELECT DISTINCT $c\nFROM $args{QUERY} AS Q\nORDER BY $c");
    $sth->execute;

    my @counts;
    while ( my @data = $sth->fetchrow_array ) {
        # First find the page...
        my $r = \$page_info;
        $r = \$$r->{shift @data} for 1 .. @$page_headers;
        # ... then within page analyse the columns
        my $xtab_data_cols = \$$r->{COL_COUNT};
        my $headers = \@{$$r->{HEADERS}};
        $r = \$$r->{COL_INDEX}; # Index of headers to column
        my $i = 0;
        for ( @data ) {
        $r = \$$r->{$_};
        if ( $$r ) {
            # We've seen this before
            $headers->[$i][-1]{SPAN}++;
        } else {
            # First time we've seen this combination
            push @{$headers->[$i]} => {
            VALUE => $_,
            SPAN => 1,
            };
        }
        $i++;
        }
        $$r = $$xtab_data_cols++;
    }
    }

    my $sth = $db->prepare
    ("SELECT ".quote_ids(@$page_headers,@$row_headers,@$col_headers,@$
+values)."\n".
     "FROM $args{QUERY} AS Q\n".
     "ORDER BY ".quote_ids(@$page_headers,@$row_headers));
    $sth->execute;

    my ($xtab_data,@prev_row_header_values,$row_change_depth);
    my $this_page_info = 0;

    my $flush_xtab_data = sub {
    $args{EMIT_ROW}->(\%args,
             $row_change_depth,
             \@prev_row_header_values,
             $xtab_data)
        if $xtab_data;
    undef $xtab_data;
    };
  QUERY_ROW:
    while ( my @query_data = $sth->fetchrow_array ) {
    my @page_header_values = splice @query_data, 0, @$page_headers;
    {
        my $p = $page_info;
        $p = $p->{$_} for @page_header_values; 

        # Should never happen if we perform the two queries as a
        # single SQL transaction;
        next QUERY_ROW unless $p;

        unless ( $p == $this_page_info ) {
        $args{EMIT_FOOTER}->(\%args,$this_page_info)
            if $this_page_info;
        $this_page_info = $p;
        $flush_xtab_data->();
        @prev_row_header_values = ();
        $row_change_depth = 0;
        $args{EMIT_HEADER}->(\%args,$this_page_info,\@page_header_valu
+es);
        }
    }
    my @row_header_values = splice @query_data, 0, @$row_headers;

    if ( defined ( my $d = first_diff(\@row_header_values,\@prev_row_h
+eader_values))) {
        $flush_xtab_data->();
        $row_change_depth = $d;
        # print STDERR "@row_header_values - @prev_row_header_values\n
+";
        @prev_row_header_values = @row_header_values;
    }

    my $xtab_col = $this_page_info->{COL_INDEX};
    $xtab_col = $xtab_col->{shift @query_data} for 1..@$col_headers;

    # Should never happen if we perform the two queries as a
    # single SQL transaction;
    next QUERY_ROW unless defined $xtab_col;

    $xtab_data ||= [ map { [] } 1 .. $this_page_info->{COL_COUNT} * @$
+values ];

    $xtab_col *= @$values;
    for ( @$xtab_data[ $xtab_col .. $xtab_col+$#$values ] ) {
        push @$_ => shift @query_data;
    }
    } 
    $flush_xtab_data->();
    $args{EMIT_FOOTER}->(\%args,$this_page_info) 
    if $this_page_info;
}

sub emit_html_header {
    my ($args,$page_info,$header_values) = @_;
    $args->{ROW_NO}=0;
    $args->{HTML_BUFFER} = [];    
    require CGI;
    for (@{$args->{PAGES}}) {
    print "<div>$_: ",CGI->escapeHTML(shift @$header_values),"</div>\n
+";
    }
    print qq{<table border="1">\n};
    my ($row_headers,$col_headers,$values) = 
    @$args{'ROWS','COLS','VALUES'};
    my $print_values = @$values > 1;
    for ( 0 .. $#$col_headers ) {
    my $rowspan='';
    if ( !$print_values && $_ == $#$col_headers ) {
        $rowspan = ' rowspan="2"';
    }
    print '<tr><td colspan="'.
        scalar(@$row_headers).
        '">'.
        $col_headers->[$_].
        '</td>';
    for ( @{$page_info->{HEADERS}[$_]} ) {
        print '<td colspan="' . 
        $_->{SPAN} * @$values .
        '"'. $rowspan .'>' .
        CGI->escapeHTML($_->{VALUE}) . '</td>';
    }
    print "</tr>\n";
    }
    print '<tr>';
    for ( @$row_headers ) {
    print "<td>$_</td>";
    }
    if ( $print_values ) {
    for ( @{$page_info->{HEADERS}[-1]} ) {
        for ( @$values ) {
        print "<td>$_</td>";
        }
    }
    }
    print "</tr>\n";
}

sub emit_html_row {
    my ($args,$depth,$header_values,$data) = @_;
    my $buffer = $args->{HTML_BUFFER};
    my $row_no = ++$args->{ROW_NO};

    # @$buffer holds an array of partially built HTML at each level of
    # row header so that we can go back and fill in the rowspans after
    # the fact.  Yes this is ugly but avoids lots of large, complex
    # intermediate structures.

    if (@$buffer) {

    # I wish I could think of something more elegant than putting
    # the <tr> on the front then ripping it off again.

    my $html = '';
    for ( reverse splice @$buffer, $depth ) {
        $html = '<tr><td rowspan="' .
        ($row_no - $_->{START_ROW}) .
        '">'.
        CGI->escapeHTML($_->{HEADER_VALUE}).
        '</td>'.
        substr($_->{HTML} . $html ,4);
    }
    if ( $depth ) {
        $buffer->[-1]{HTML} .= $html;
    } else {
        print $html;
    }
    }
    
    return unless $header_values; # If called from emit_html_footer

    for ( @$header_values[@$buffer .. $#$header_values] ) {
    push @$buffer => {
        START_ROW => $row_no,
        HEADER_VALUE => $_,
        HTML => '',
    };
    }
    no warnings 'uninitialized';
    $buffer->[-1]{HTML} =
    '<tr>'.
    join('', 
         map { "<td>$_</td>" }
         map { length() ? $_ : '&nbsp;' }
         map { CGI->escapeHTML("@$_") } @$data
         ).
         "</tr>\n";
}

sub emit_html_footer {
    my ($args) = @_;
    emit_html_row($args,0);
    print "</table>\n";

}

require DBI;
# Assume $ENV{DBI_DSN} et al are set...
my $db = DBI->connect or die $DBI::errstr;
$db->{RaiseError} = 1;

make_xtab(
      DB => $db,
      QUERY => 'TEST_DATA',
      PAGES => ['A','B'],
      ROWS => ['C','D'],
      COLS => ['E','F'],
      VALUES => ['P','Q'],
      EMIT_ROW => \&emit_html_row,
      EMIT_HEADER => \&emit_html_header,
      EMIT_FOOTER => \&emit_html_footer,
      );
[download]

In reply to SQL to HTML cross-table code critique by nobull

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.

E		E1		E2
F		F1		F1
C	D	P	Q	P	Q
C1	D1	Foo Floob	Bar Blab
C2	D2	Thirteen	Fourteen	Fifteen	Sixteen