Re: (dchetlin) Data in Hash

This is a great question, definite ++. Clearly stated problem, concise code, just all around beautiful. I wish people posted questions like this on clpm.

First, the answers to your questions:

It was sugessted that I return a reference to a hash, thus conserving some memory. Would that just be, return /%hash?

return \%hash. The backslash is the reference operator. And yes, you definitely should return a reference to a hash here.

When you return a hash, you're not really returning a hash. You say return %hash and perl says "Ok, he wants to return a hash. Let's see, was this function called in list or scalar context? If it was list context, I'll just take all of the keys and values of the hash, push each of them onto the stack, and return. That way the function will return a list, and if the caller assigned something to the return of the function like %hash = function() then that hash will be populated by that list that was returned, just like %hash = ('a','b','c','d'). If the function was called in scalar context, I'll do something really bizarre and return a scalar that looks something like '3/8' which shows the number of used buckets and the total number of buckets in the hash. Because, obviously, people use that information all the time."

As you can probably imagine, pushing every key and value of a hash onto the stack is quite inefficient. You've already built the hash once, you'd like to return that same hash, not make perl build you a whole new one. So you return a reference to the hash, and call the function like my $hash_ref1 = process($file1,$list1).

In any case, what kind of print statement can I insert to test the contents of each hash, as returned?

use Data::Dumper. It's the easiest way to take a look at the contents of any data structure. Simply `use' it at the top of your program, and then do print Dumper $hash_ref1; and Dumper will give you a nicely formatted look at the hash.

I can't seem to find any information regarding reading data contained in a hash into DBI? Oh yeah, the reason that I am using a hash is that I would like the keys of the hash to become the column headers in the database table that I create.

The general answer to "How do I store a hash in a database" is "Use Storable". Storable does binary serialization of Perl structures. You'll serialize your hash, and then add that string (the Storable representation of your hash) to the database.

In terms of general advice, I have a few things to point out:

Your method of testing for the correct number of arguments is not the best way. Instead of making sure that $ARGV[3] isn't equal to the empty string, why not just test to make sure @ARGV has exactly four arguments (or at least four arguments, depending on how flexible you want to be). I.e. if (@ARGV == 4) { ... } else { Print "Usage: ... " }
Your joins on the command-line arguments aren't doing anything useful. join operates on a list of things, putting them together in one string. You're only giving join the separator and one argument, so it's basically a no-op. It looks like that's OK, though, because you're expecting the arguments to be comma-separated already.
When you print a usage statement, it's often sensible to use $0 to reflect the way that the script was actually called. Also, in case you change the name of the script later, things won't get out of sync.
It's hard for me to tell what your eventual goal here is, so it's a little hard for me to advise on what to do next, but it seems to me that a couple of things should probably be changed. Firstly, you'll probably want to choose a more robust way of getting the functionality of exfields.pl than calling it using the magic open. Either set it up so it can be required, or copy the useful functions from it, or turn it into a module, etc. Secondly, I'm not convinced that the way you're structuring your hashes is going to make for easy or even possible merging. I'd restructure somewhere, but without having a better idea of what exfields does and what you want the stuff to look like in the DB, I can't suggest more than that.

-dlc

Comment on Re: (dchetlin) Data in Hash - DBI Select or Download Code

Replies are listed 'Best First'.
RE: Re: (dchetlin) Data in Hash - DBI by Limo (Scribe) on Oct 01, 2000 at 08:23 UTC
Thank you for your kind words as well as your advice. I wouldn't even be able to write BAD code, if it were not for you folks! Rather than post 800 lines of code that make up exfields.pl, I will post a synopsis below. Possibly, that will give you and others insight into my problem. "Extracts records (selectively if requested) from a NPA [my group's typical file format]or MAGMA [db schema; irrelevant here] table dumps, producing either the entire record or (optionally) specified fields in the given order. Parameters can be in any order. Output is to STDOUT, and can be piped into this program (for further selections), or into updated versions of exfields.pl or showtable.pl. The -s option is used to select records according to whether a regexp match exists within a -f specified field or within the entire record. The -e option is used, optionally, to specify the fieldlist and output field order. One major use of this program is to create a reduced table, whose output contains only selected rows and columns. Another is to transform table format or order." [download] Now, here's an example of the gunzipped file headers: #DFD ' ' #H SrcRtr ifIndex ifPRule ifDescr ifName ifType ifSpeed ifPhysAddress ifWscModPort ifWscPortName ifAlias ifWscPortVlan ifVlanState ifVlanName TPNativeVlan TPDynStatus TPEncapOpType ifCP ifStackLL ifStackUL cdpFlg cdpCacheCnt cdpCacheInfo ipN2MediaCnt ipN2MediaInfo atmMaxVpcs atmCfgVpcs atmMaxVccs atmCfgVccs atmInfo frActCnt frInACnt frInfo IpAddress IpMask IpSubNet ospfFlags ospfAreaInfo NumIpAddresses NumIpAddrUp CfgIpAddrUp CfgIpAddrStats NumLoopbacksUp CfgRtrPollUp CfgRtrLPollUp CfgStats CfgPoll CfgErrFlg CfgWarnFlg CfgIfMapType CfgIfSysID CfgPoP MonName CfgIfDescr CfgIfSpeed CfgIfIndex SPopRegion SPopType SPopUse SPopAllow SRtrRole DstExtent DPop DPopRegion DPopType DPopUse DPopAllow DPopCloud DstDev DstDevRole MediaDstInfo CfgIfType CfgIfServRole CfgIfPurpose CfgIfVpiA CfgIfDlciA CfgIfVpiZ CfgIfDlciZ CfgIfRemL2Dev CfgIfRemL2Port CfgIfVLAN FlowType RcIfFlags RcASN RcOspfs BgpLocalAS BgpExtnlAS RcBgpCfd RcBgpCfdPeers BgpRemASInfo OspfIfInfo RcPrtChan RcPrtChanList RcIfSpeed RcVpiVci RcAtmAalEP RcAtmPkbps RcAtmAkbps RcAtmBurst RcAtmPvcMapInfo RcFrDlci RcEncap IpOspfCost IpOspfPriority CldGroup CldName CldStatus CldRatio CldThresh CfgCid CirActive CirCIR CirSpeed CirOrdered CirOnLine CirOffLine ModBy ModOn RcText #F %16s %3d %5s %16s %s %3d %11.0f %17s %5s %11s %s %s %s %s %s %s %s %s %s %s %s %2d %s %2d %s %2s %2s %2s %2s %s %2s %2s %s %15s %15s %15s %3s %s %2d %1d %1d %1d %1d %1d %1d %1s %1s %5s %5s %3s %6s %-10s %-36s %16s %11.0f %3s %4s %2s %2s %7s %2s %-3s %-10s %4s %2s %2s %7s %20s %20s %2s %s %4s %4s %4s %4s %4s %4s %4s %15s %5s %5s %4s %4s %5s %5s %5s %5s %2s %s %s %s %s %s %11s %7s %12s %6s %6s %6s %s %5s %5s %5s %5s %20s %20s %10s %4s %4s %20s %5s %11s %11s %8s %8s %8s %10s %18s %s [download] The output of exfields looks EXACTLY as shown in my original post, depending on which fields I want to look at, of course. Basically, what I envision for the database is for it to contain my selected headers from file1 and file2 as column names; each column containing the corresponding data. Column 1 in the db would contain the primary key, which in my case, is a router name. Essentially, that is what the 2 files will have in common; different sets of data related to each router. What I need to do is grab a subset of router data from file1, a subset of router data from file2, and merge both subsets into file3, which will be `<STDOUT>`. Actually, here's a sample output of exfields: ./exfields.pl -e MonName,ifSpeed,DstDev pptop.20000921.gz nyc1-br2/nyc4-br2:1.t3 45045000 nyc4-br2 nyc1-br2/nyc4-br2:2.t3 45045000 nyc4-br2 nyc1-br2/nyc4-br2:3.t3 45045000 nyc4-br2 nyc1-cr1/nyc4-br1.t3 44210000 nyc4-br1 nyc1-br1/nyc2-cr1.t3 45045000 nyc2-cr1 nyc4-br2/cleveland1-br1.2.t3 45045000 cleveland1-br1 nyc4-br2/cleveland1-br2.t3 44210000 cleveland1-br2 nyc4-br2/nyc1-br2:1.t3 45045000 nyc1-br2 nyc4-br2/nyc1-br2:2.t3 45045000 nyc1-br2 nyc4-br2/nyc1-br2:3.t3 45045000 nyc1-br2 nyc4-nbr2/nycmny1-br1.oc12 622000000 nycmny1-br1 [download] Again, it is the field "MonName" that both file1 and file2 will have in common.I hope that some of this makes my problem a bit clearer to those willing to help out.	[reply] [d/l] [select]

Replies are listed 'Best First'.

RE: Re: (dchetlin) Data in Hash - DBI
by Limo (Scribe) on Oct 01, 2000 at 08:23 UTC

"Extracts records (selectively if requested) from a NPA 
[my group's typical file format]or MAGMA [db schema; 
irrelevant here] table dumps, producing either the 
entire record or (optionally) specified fields in the 
given order.  Parameters can be in any order.

Output is to STDOUT, and can be piped into this program 
(for further selections), or into updated versions of 
exfields.pl or showtable.pl.

The -s option is used to select records according to 
whether a regexp match exists within a -f specified 
field or within the entire record.

The -e option is used, optionally, to specify the 
fieldlist and output field order.

One major use of this program is to create a 
reduced table, whose output contains only selected 
rows and columns.  Another is to transform table format 
or order."
[download]

#DFD ' '

#H SrcRtr ifIndex ifPRule ifDescr ifName ifType ifSpeed 
ifPhysAddress ifWscModPort ifWscPortName ifAlias ifWscPortVlan 
ifVlanState ifVlanName TPNativeVlan TPDynStatus TPEncapOpType 
ifCP ifStackLL ifStackUL cdpFlg cdpCacheCnt cdpCacheInfo 
ipN2MediaCnt ipN2MediaInfo atmMaxVpcs atmCfgVpcs atmMaxVccs 
atmCfgVccs atmInfo frActCnt frInACnt frInfo IpAddress IpMask 
IpSubNet ospfFlags ospfAreaInfo NumIpAddresses NumIpAddrUp 
CfgIpAddrUp CfgIpAddrStats NumLoopbacksUp CfgRtrPollUp 
CfgRtrLPollUp CfgStats CfgPoll CfgErrFlg CfgWarnFlg CfgIfMapType 
CfgIfSysID CfgPoP MonName CfgIfDescr CfgIfSpeed CfgIfIndex 
SPopRegion SPopType SPopUse SPopAllow SRtrRole DstExtent DPop 
DPopRegion DPopType DPopUse DPopAllow DPopCloud DstDev DstDevRole
MediaDstInfo CfgIfType CfgIfServRole CfgIfPurpose CfgIfVpiA 
CfgIfDlciA CfgIfVpiZ CfgIfDlciZ CfgIfRemL2Dev CfgIfRemL2Port 
CfgIfVLAN FlowType RcIfFlags RcASN RcOspfs BgpLocalAS BgpExtnlAS 
RcBgpCfd RcBgpCfdPeers BgpRemASInfo OspfIfInfo RcPrtChan 
RcPrtChanList RcIfSpeed RcVpiVci RcAtmAalEP RcAtmPkbps RcAtmAkbps 
RcAtmBurst RcAtmPvcMapInfo RcFrDlci RcEncap IpOspfCost 
IpOspfPriority CldGroup CldName CldStatus CldRatio CldThresh 
CfgCid CirActive CirCIR CirSpeed CirOrdered CirOnLine CirOffLine 
ModBy ModOn RcText

#F %16s %3d %5s %16s %s %3d %11.0f %17s %5s %11s %s %s %s %s %s %s 
%s %s %s %s %s %2d %s %2d %s %2s %2s %2s %2s %s %2s %2s %s %15s 
%15s %15s %3s %s %2d %1d %1d %1d %1d %1d %1d %1s %1s %5s %5s %3s 
%6s %-10s %-36s %16s %11.0f %3s %4s %2s %2s %7s %2s %-3s %-10s %4s 
%2s %2s %7s %20s %20s %2s %s %4s %4s %4s %4s %4s %4s %4s %15s %5s 
%5s %4s %4s %5s %5s %5s %5s %2s %s %s %s %s %s %11s %7s %12s %6s 
%6s %6s %s %5s %5s %5s %5s %20s %20s %10s %4s %4s %20s %5s %11s 
%11s %8s %8s %8s %10s %18s %s
[download]

<STDOUT>

./exfields.pl -e MonName,ifSpeed,DstDev pptop.20000921.gz 

nyc1-br2/nyc4-br2:1.t3            45045000             nyc4-br2
nyc1-br2/nyc4-br2:2.t3            45045000             nyc4-br2
nyc1-br2/nyc4-br2:3.t3            45045000             nyc4-br2
nyc1-cr1/nyc4-br1.t3              44210000             nyc4-br1
nyc1-br1/nyc2-cr1.t3              45045000             nyc2-cr1
nyc4-br2/cleveland1-br1.2.t3      45045000       cleveland1-br1
nyc4-br2/cleveland1-br2.t3        44210000       cleveland1-br2
nyc4-br2/nyc1-br2:1.t3            45045000             nyc1-br2
nyc4-br2/nyc1-br2:2.t3            45045000             nyc1-br2
nyc4-br2/nyc1-br2:3.t3            45045000             nyc1-br2
nyc4-nbr2/nycmny1-br1.oc12        622000000          nycmny1-br1
[download]

[reply]
[d/l]
[select]