Win has asked for the wisdom of the Perl Monks concerning the following question:

Could someone please offer me an alternative Perl solution to the following T-SQL code? I ask because the attempted solution below does not quite work as expected. I am trying to fill the Random_region_lookup_table_TEMP table. This table holds random subsets of geographic regions, each subset of a fixed size, and the draw is repeated x times; max(x) is the total number of generations. Each generation has the same number of regions, and every subset is drawn from the same full set of regions. No region appears more than once within a single generation, but a region can appear in several generations.
    DECLARE @Number_of_government_regions INT
    SET @Number_of_government_regions = (SELECT 354)

    if exists(select 1 from INFORMATION_SCHEMA.tables where table_name = 'MyNumbers')
        DROP TABLE MyNumbers;

    --MyNumbers
    --===== Create and populate the Tally table on the fly
    SELECT TOP 1000 IDENTITY(INT,1,1) AS Nums
      INTO dbo.MyNumbers
      FROM Master.dbo.SysColumns sc1,
           Master.dbo.SysColumns sc2

    --===== Add a Primary Key to maximize performance
    ALTER TABLE dbo.MyNumbers
      ADD CONSTRAINT PK_MyNumbers_N PRIMARY KEY CLUSTERED (Nums) WITH FILLFACTOR = 100

    INSERT INTO Random_region_lookup_table_TEMP (Generation_number, Place_key)
    SELECT n.Nums, r.Number_count
      FROM MyNumbers n
     CROSS JOIN Region_lookup r
     WHERE n.Nums <= @Number_of_repeats
     ORDER BY n.Nums, NEWID()

    UPDATE Random_region_lookup_table_TEMP
       SET Place_key = (Place_key)%354 + 1

    INSERT INTO Random_region_lookup_table_TEMP (Generation_number, Place_key)
    SELECT n.Nums, r.Number_count
      FROM MyNumbers n
     CROSS JOIN Region_lookup r
     ORDER BY n.Nums, NEWID()
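For comparison, here is a minimal pure-Perl sketch of the requirement as described above; it is not the asker's code. It assumes regions are numbered 1..354 and treats the subset size and the number of generations as free parameters; the names $subset_size and $generations, the values 20 and 5, and the helper random_generation_rows are all hypothetical. Each generation shuffles the full region list with List::Util's shuffle and keeps the first k entries, i.e. a random ordering per generation (what ORDER BY n.Nums, NEWID() aims for) followed by a cut to the fixed subset size. That way no region repeats within a generation, while generations are drawn independently so regions may recur across them.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical parameters: 354 regions as in the question; subset size and
# number of generations are placeholders.
my $n_regions   = 354;
my $subset_size = 20;
my $generations = 5;

# Returns a list of [generation_number, place_key] rows.
sub random_generation_rows {
    my ($n, $k, $g) = @_;
    die "subset size $k exceeds region count $n\n" if $k > $n;
    my @rows;
    for my $generation (1 .. $g) {
        # shuffle returns a random permutation of 1..$n, so its first $k
        # entries are $k distinct regions; each generation is drawn afresh.
        my @subset = (shuffle(1 .. $n))[0 .. $k - 1];
        push @rows, [ $generation, $_ ] for sort { $a <=> $b } @subset;
    }
    return @rows;
}

my @rows = random_generation_rows($n_regions, $subset_size, $generations);
printf "%d rows\n", scalar @rows;
```

Each [generation, place_key] pair can then be bound to a single prepared INSERT ... VALUES (?,?) statement.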

Re: Dealing with random subsets
by roboticus (Chancellor) on Nov 29, 2007 at 18:43 UTC
    Win:

    OK, here's a quick way to do it in Perl ... feel free to edit it into shape:

        #!/usr/bin/perl -w
        use strict;
        use warnings;
        use DBI;

        my $DBH = DBI->connect("dbi:ODBC:driver={SQL Server};"
                             . "SERVER=XXXX; DATABASE=YYYY;", "UID", "PWD")
            or die $DBI::errstr;

        # q|...| rather than qq|...|, so Perl does not try to interpolate
        # the T-SQL variables (@Number_of_...) as Perl arrays.
        $DBH->do(q|
            DECLARE @Number_of_government_regions INT
            SET @Number_of_government_regions = (SELECT 354)

            if exists(select 1 from INFORMATION_SCHEMA.tables where table_name = 'MyNumbers')
                DROP TABLE MyNumbers;

            --MyNumbers
            --===== Create and populate the Tally table on the fly
            SELECT TOP 1000 IDENTITY(INT,1,1) AS Nums
              INTO dbo.MyNumbers
              FROM Master.dbo.SysColumns sc1,
                   Master.dbo.SysColumns sc2

            --===== Add a Primary Key to maximize performance
            ALTER TABLE dbo.MyNumbers
              ADD CONSTRAINT PK_MyNumbers_N PRIMARY KEY CLUSTERED (Nums) WITH FILLFACTOR = 100

            INSERT INTO Random_region_lookup_table_TEMP (Generation_number, Place_key)
            SELECT n.Nums, r.Number_count
              FROM MyNumbers n
             CROSS JOIN Region_lookup r
             WHERE n.Nums <= @Number_of_repeats
             ORDER BY n.Nums, NEWID()

            UPDATE Random_region_lookup_table_TEMP
               SET Place_key = (Place_key)%354 + 1

            INSERT INTO Random_region_lookup_table_TEMP (Generation_number, Place_key)
            SELECT n.Nums, r.Number_count
              FROM MyNumbers n
             CROSS JOIN Region_lookup r
             ORDER BY n.Nums, NEWID()
        |) or die $DBI::errstr;
    Heh ... hope this helps!

    </snarky_mode>

    ...roboticus

Re: Dealing with random subsets
by pc88mxer (Vicar) on Nov 29, 2007 at 22:30 UTC
    Is this what you're looking for? This will generate the subsets on the Perl side and populate the table Random_region_lookup_table_TEMP. Of course, this code is totally untested.
        use strict;
        use warnings;
        use DBI;

        my $nregions = 354;

        sub random_subset {
            my ($n, $k) = @_;    # $k member subset of 1..$n.
            die "cannot pick $k distinct members from 1..$n\n" if $k > $n;
            my %member;
            while ($k > 0) {
                my $x = int(rand()*$n)+1;    # generates random number 1..$n.
                redo if $member{$x};         # already picked: draw again, $k is not decremented
                $member{$x} = 1;
            } continue {
                $k--;
            }
            return sort { $a <=> $b } keys %member;    # numeric, not string, order
        }

        sub insert_subset {
            my ($dbh, $generation, @members) = @_;
            my $sth = $dbh->prepare("INSERT INTO Random_region_lookup_table_TEMP (Generation_number, Place_key) VALUES (?,?)");
            for my $x (@members) {
                $sth->execute($generation, $x);
            }
        }

        ...
        my $dbh = DBI->connect(...);    # fill in your connection parameters here
        ...
        my %seen;
        for my $generation (1..10) {
            my @members = random_subset($nregions, 5);    # 5 element subsets, e.g.
            my $key = join(" ", @members);    # note: members already sorted
            redo if $seen{$key};              # make sure the subsets are unique
            $seen{$key} = 1;
            insert_subset($dbh, $generation, @members);
        }
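The redo loop in random_subset rejects draws that hit an already-chosen region, so it needs more and more retries as $k approaches $n (354 regions here). A swapped-in alternative, a partial Fisher-Yates shuffle (not the reply's method), needs exactly $k swaps and no retries. The sketch below keeps the reply's calling convention random_subset($n, $k) and returns the members in numeric order so the join(" ", ...) de-duplication key stays canonical.

```perl
use strict;
use warnings;

# Partial Fisher-Yates: pick $k distinct members of 1..$n using exactly $k swaps.
sub random_subset {
    my ($n, $k) = @_;
    die "cannot pick $k distinct members from 1..$n\n" if $k > $n;
    my @pool = (1 .. $n);
    for my $i (0 .. $k - 1) {
        # choose a random index from the not-yet-picked tail [$i, $n - 1]
        my $j = $i + int(rand($n - $i));
        @pool[$i, $j] = @pool[$j, $i];
    }
    # the first $k slots now hold the sample; sort numerically for a stable key
    return sort { $a <=> $b } @pool[0 .. $k - 1];
}

my @members = random_subset(354, 5);
print "@members\n";
```

Building @pool is still O($n), but the sampling itself no longer depends on how often rand() lands on a region that was already taken.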
      Why prepare the same SQL statement 10 times (or however many generations the loop runs)?
          sub insert_subset {
              my ($sth, $generation, @members) = @_;
              for my $x (@members) {
                  $sth->execute($generation, $x);
              }
          }

          my $dbh = DBI->connect(...);    # fill in your connection parameters here
          my $sth = $dbh->prepare("INSERT INTO Random_region_lookup_table_TEMP (Generation_number, Place_key) VALUES (?,?)");
          ...
          my %seen;
          for my $generation (1..10) {
              my @members = random_subset($nregions, 5);    # 5 element subsets, e.g.
              my $key = join(" ", @members);    # note: members already sorted
              redo if $seen{$key};              # make sure the subsets are unique
              $seen{$key} = 1;
              insert_subset($sth, $generation, @members);
          }
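To see the effect of moving prepare out of insert_subset without a database, here is a small sketch using hypothetical stand-in classes (FakeDBH and FakeSTH are invented for illustration, not part of DBI). They only count prepare calls and record the bind values passed to execute; the loop prepares once and executes 50 times.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a DBI database handle: counts prepare() calls
# and collects the rows executed through its statement handles.
package FakeDBH;
sub new     { my ($class) = @_; return bless { prepares => 0, rows => [] }, $class }
sub prepare { my ($self, $sql) = @_; $self->{prepares}++; return FakeSTH->new($self) }

# Hypothetical stand-in for a statement handle: execute() records its bind values.
package FakeSTH;
sub new     { my ($class, $dbh) = @_; return bless { dbh => $dbh }, $class }
sub execute { my ($self, @bind) = @_; push @{ $self->{dbh}{rows} }, [@bind]; return 1 }

package main;

# Same shape as the reply's insert_subset: it only executes a prepared handle.
sub insert_subset {
    my ($sth, $generation, @members) = @_;
    $sth->execute($generation, $_) for @members;
    return;
}

my $dbh = FakeDBH->new;

# Prepared once, before the generation loop.
my $sth = $dbh->prepare(
    'INSERT INTO Random_region_lookup_table_TEMP (Generation_number, Place_key) VALUES (?,?)'
);

for my $generation (1 .. 10) {
    insert_subset($sth, $generation, 1 .. 5);    # fixed members, demo only
}

printf "prepare calls: %d, rows executed: %d\n", $dbh->{prepares}, scalar @{ $dbh->{rows} };
```

With a real handle from DBI->connect the calls are the same. DBI's prepare_cached is another option when the prepare has to stay inside a helper sub.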