3dbc has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

My HoH looks something like this: (exported with Data::Dumper)

'14194 ' => { 'city' => 'Denver + ' , 'id' => '14194 ', 'name' => 'Christopher ', 'state' => 'CO' }, '11347 ' => { 'city' => 'Tannersville + ' , 'id' => '11347 ', 'name' => 'Brian ', 'state' => 'NY' } };


As you can see I have trailing white spaces all over the place proliferated throughout the HoH. They're in the id, name and city, etc.. I want to remove all trailing whitespace, especially for the id, so what's the best way to clean up this messy data structure, specifically the key?

FYI, I'm fetching the data from MSSQL via DBI using the following: $data = $dbh->selectall_hashref($mssql, "id"); where $mssql is some sql that gets me the data required.

So my theory is that the data is improperly defined on the database itself, which leads to all these subtle nuances. Should I build SQL SP to clean it up before I extract it or is it easier to let perl practically extract it and list the rubbish in a readable and dereferancable format, without any obfuscation?



Thanks for your brotherly wisdom almighty Monastery!

-3dbc

Replies are listed 'Best First'.
Re: HoH keys with trailing white spaces ????
by keszler (Priest) on Dec 10, 2011 at 14:13 UTC
      I love this part "The default is false (although it is possible that the default may change)"

      So simple; $dbh->{ChopBlanks} = 1; works great. Thank you!

      No need for any more code to reinvent the wheel, just a simple & silly little DBI setting...
Re: HoH keys with trailing white spaces ????
by JavaFan (Canon) on Dec 10, 2011 at 18:36 UTC
    So my theory is that the data is improperly defined on the database itself, which leads to all these subtle nuances.
    My guess is that the fields in the table are of type CHAR(N). This pads strings that are too short. You could alter the table so the fields become VARCHAR(N). I don't think the change in schema will remove trailing spaces, but it will prevent spaces to be added the next time.
      select column_name, data_type, character_maximum_length from informati +on_schema.columns where table_name = 'grasshopper' and column_name like '%id%'

      id char 10
        You sound like a grasshopper, who prefers to be a grasshopper all his life.

        I have of course no idea whether changing the type of the column benefits your company. And with your attitude, neither will you.

        What's the worst that can happen? That you find out why the column has type CHAR(N) instead of VARCHAR(N)?

Re: HoH keys with trailing white spaces ????
by TJPride (Pilgrim) on Dec 10, 2011 at 14:42 UTC
    use strict; use warnings; use Data::Dumper; my $hash = { 'x ' => { 'x ' => ' dddd', 'y ' => 'dsds ' }, 'y ' => [ 'dsfds ', ' dsf', ], ' z' => 'dsfds' }; trimHash($hash); print Dumper($hash); sub trimHash { my $h = $_[0]; my ($k1, $k2); for $k1 (keys %$h) { if (ref $h->{$k1} eq 'HASH') { trimHash($h->{$k1}); } elsif (ref $h->{$k1} eq 'ARRAY') { trimArray($h->{$k1}); } else { $h->{$k1} = trim($h->{$k1}); } $k2 = trim($k1); if ($k2 ne $k1) { $h->{$k2} = $h->{$k1}; delete $h->{$k1}; } } } sub trimArray { for (@{$_[0]}) { if (ref $_ eq 'HASH') { trimHash($_); } elsif (ref $_ eq 'ARRAY') { trimArray($_); } else { $_ = trim($_); } } } sub trim { my $s = $_[0]; $s =~ s/\s\s+/ /g; $s =~ s/^ | $//g; return $s; }
Re: HoH keys with trailing white spaces ????
by graff (Chancellor) on Dec 13, 2011 at 04:29 UTC
    I've never worked with MSSQL, but other database engines (e.g. MySQL) provide a function called "trim", which returns the value of a given string with leading and trailing whitespace removed. So if your existing SQL statement is something like this to select three string fields:
    select foo, bar, baz from some_table
    you could phrase it like this, assuming the MSSQL provides the "trim" function:
    select trim(foo), trim(bar), trim(baz) from some_table
      Another good idea, less perl centric, but effective nonetheless, however for mssql there's only ltrim and rtrim and in this case I think it would be rtrim, which I happen to use a lot with this particular database and should've thought of that strait away. Regardless, I'm happy I learned about the chopblanks setting, albeit I still think that should be the default...