http://qs1969.pair.com?node_id=235566

nasa has asked for the wisdom of the Perl Monks concerning the following question:

I can see you people are trying to help me.
In the past I have confused you with nonworking code examples.
I will try to explain the problem with as little code included as possible.
I have a flat text file database called reff.data.
The database consists of two columns of ten digit numbers.
Son numbers Fathering numbers
1045316394 1045316144
1045316407 1045316394
1045316419 1045316407
1045316438 1045316419
1045316469 1045316394
1045316492 1045316407
1045316505 1045316492

The left column is the son of the right column in that all numbers in the
Left column were bought into the database by a number in the right column.
Note that in this database left numbers can never be the same but right numbers can.

As with children must be individuals but can have the same father,
I want to know which number fathered 1045316419 so I ask.

$data_file="reff.data"; open(DAT, $data_file) || die("Could not open file!"); @raw_data=<DAT>; @results2 = grep( /1045316419/, @rawdata); foreach( @results ) { $record = $_; # strip the beginning key $record =~ s/^\d+//; print "$record"; } close(I);

When I send this to the browser it will print out 1045316407 because that is indeed the
Corresponding number.
It will print 1045316407 because that is what $record represents.
Now I need to step back again and ask what number fathered that number (1045316407). But I don’t really have the number in my hand I have it as $record.
If I use the same script and ask perl to do the same search for $record and give me the corresponding number
It will return exactly the same result ( no good).
I need to step back several times, asking for the number that fathered the number before it
but I only have one number to start with.
Nasa.

Replies are listed 'Best First'.
Re: Database problem.
by pfaut (Priest) on Feb 15, 2003 at 14:55 UTC

    Since the numbers in the left hand column are unique and are the keys to your searches, you could load the data file into a hash instead of an array. Your answer then comes from a direct hash lookup rather than having to search.

    #!/usr/bin/perl -w use strict; my %kids=map { split } <DATA>; print "Father of $_ is $kids{$_}\n" for @ARGV; __DATA__ 1045316394 1045316144 1045316407 1045316394 1045316419 1045316407 1045316438 1045316419 1045316469 1045316394 1045316492 1045316407 1045316505 1045316492

    Example usage:

    $ perl fatherson.pl 1045316419 1045316492 Father of 1045316419 is 1045316407 Father of 1045316492 is 1045316407

    Of course, you should add some code to do something sensible if the requested child is not found.

    Update: You want the full lineage. You could do something like this.

    sub print_lineage { my ($x) = @_; my @p; while (exists $kids{$x}) { $x = $kids{$x}; push @p,$x; } if (@p) { local ($") = ", "; print "Lineage of $_[0] is @p\n"; } else { print "Lineage of $_[0] is unknown\n"; } } print_lineage $_ for @ARGV;

    Which produces:

    $ perl fatherson.pl 1045316419 1045316469 Lineage of 1045316419 is 1045316407, 1045316394, 1045316144 Lineage of 1045316469 is 1045316394, 1045316144
    --- print map { my ($m)=1<<hex($_)&11?' ':''; $m.=substr('AHJPacehklnorstu',hex($_),1) } split //,'2fde0abe76c36c914586c';
Re: Correction to Database problem
by tachyon (Chancellor) on Feb 15, 2003 at 15:41 UTC

    Sounds like what you need to do is use a hash:

    # build a hash with son => father as key => value pairs my %sons; while(<DATA>) { chomp; next unless $_; my ($son, $father) = split /\s+/, $_; $sons{$son} = $father; } # make another hash keyed on the fathers that includes all their sons my %fathers; for my $son ( keys %sons ) { # find the father or $son from out %sons hash my $father = $sons{$son}; push @{$fathers{$father}}, $son; } # show the results: print "Son\tFather\n"; for my $son ( keys %sons ) { print $son, "\t", $sons{$son}, "\n"; } print "\n\nFather\tSon(s)\n"; for my $father ( keys %fathers ) { my @sons = @{$fathers{$father}}; my $sons = join " | ", @sons; print $father, "\t\t", $sons, "\n"; } __DATA__ 1 1 2 2 3 3 4 1 5 3 6 4 7 5 8 1 9 2

    This will give you the following (which I presume is along the lines of what you are trying to do.

    Son Father 1 1 2 2 3 3 4 1 5 3 6 4 7 5 8 1 9 2 Father Son(s) 1 1 | 4 | 8 2 2 | 9 3 3 | 5 4 6 5 7

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Correction to Database problem
by demerphq (Chancellor) on Feb 15, 2003 at 22:50 UTC

    Hi nasa,

    What you are doing is a fairly classical data structure manipulation. You have a tree in list form. Or a list of node id's and their parent id's. The terms you have used are little different, usually we say that we have a parent-child relationship, and we dont talk about numbers, but rather id's or identifiers. I say all this because in future questions you may find the terms useful and easier to explain what you want to do. Anyway, here is yet another soluton of your problem.

    Potentially you dont need to read the whole list first. This would be true if and only if every parent had a lower id of all of its children and and every "son" id in the list was greater in id than all of the preceding son id's.

    Depending on what types of operations you will frequently need to perform you may wish to generate a more complicated data structure than a simple parent list. (which is what your data represents and what the hash does). You may wish to do some thing like this:

    which builds a tree of nodes with previous pointer and children pointers and provides a couple of utility subs for finding the children (in different forms), printing out the data structure and the like. A more robust, but probably too confusing approach for right now would be to use real references instead of id lookups (actually on second thought using pure references might be less confusing, but for now whatever :-). This would have the advantage that if you had operations like delete in mind that you could delete entire subtrees with a single assignment.

    Anyway, hopefully you find this node and its contents a useful introduction into different ways of representing heirarchical relationships, and operations that can be performed on them. I hope however that you will take the time to search on "Tree representation" and "Trees" for tree specific material (Binary Tree's are the canonical data structure in many respects) and obtain a book like "Algorithms and Data Structures in Perl" (O'Reiley). This will walk you through the different forms of representations in much more depth and precision.

    You will need to look under "Graphs" and "Graph theory" if you want to research up on general material related to this. To computer science people and mathematicians all of this stuff comes under the heading "Graph Theory". A tree with only parent pointers or only child pointer is technically called a "DAG", directed acyclic graph, and when it has both parent and child pointers it is a "Directed Graph". There are also other names originating in specific subfields, such as databases (where they talk about keys and constrained indexs), networking (a graph is a network amd vice versa), or various forms of system programming where nodes, pointers and children are the terms used, or graphics where they use lots of different terms (its not my field :-))

    Cheers and good luck. And definately dont put off asking a question about any of this if you feel the slightest confusion. If something is over your head we will do our best not to let you drown. :-)

    ---
    demerphq


Re: Correction to Database problem
by jasonk (Parson) on Feb 15, 2003 at 15:25 UTC

    So, it sounds to me like you have a 'son' number, and you want to find the 'father' of the son, and the father of the father (the sons grandfather you could say).

    The code you have will work in your second case, if you just put the variable instead of the number, but there are several more elegant methods you could use, this is how I would probably do it...

    sub father_of { my $find = shift; my $data_file = "reff.data"; open(DAT, $data_file) || die "Could not open file: $!"; foreach(<DAT>) { chomp; my($son,$father) = split; if($son == $find) { return $father; } } close(DAT); } my $son = 1045316419; my $father = father_of($son); my $grandfather = father_of($father); print "The son is $son\n"; print "The father of $son is $father\n"; print "The father of $father is $grandfather\n";

    This has a couple of advantages, you have reusable code rather than duplicating the search code every time, and it doesn't read the entire file into memory,

    If you did want to read the whole file into memory, storing it as a has makes it easier to use:

    open(DAT,$data_file) || die "Couldn't open file": foreach(<DAT>) { chomp; my($son,$father) = split; $fathers{$son} = $father; } close(DAT); my $son = 1045316419; my $father = $fathers{$son}; my $grandfather = $fathers{$father};

    (See perldata for more info on hashes)

Re: Database problem.
by zengargoyle (Deacon) on Feb 15, 2003 at 15:17 UTC

    how about something like...

    use strict; use warnings; $|++; my $person = shift @ARGV or die "Usage: $0 <personid>\n"; my %parent_of; my %children_of; while (my ($child, $parent) = split /\s/, <DATA>) { $parent_of{$child} = $parent; push @{$children_of{$parent}}, $child; } my @fathers; for (my $lookat = $person; exists $parent_of{$lookat}; $lookat = $pare +nt_of{$lookat}) { push @fathers, $parent_of{$lookat}; } my @sons = @{$children_of{$person}}; print "fathers of $person: @fathers$/"; print "sons of $person: @sons$/"; __DATA__ 1045316394 1045316144 1045316407 1045316394 1045316419 1045316407 1045316438 1045316419 1045316469 1045316394 1045316492 1045316407 1045316505 1045316492
    $ perl test.pl 1045316419
    fathers of 1045316419: 1045316407 1045316394 1045316144
    sons of 1045316419: 1045316438
    

    this had bettern not be homework!

Re: Correction to Database problem
by toma (Vicar) on Feb 15, 2003 at 23:16 UTC
    use Table::ParentChild; use File::Slurp; my @relationships; my $i=0; foreach my $line (read_file("data")) { my ($child, $parent)= split ' ',$line; $relationships[$i++]= [$parent, $child]; } my $table = new Table::ParentChild( \@relationships ); my $child= 1045316419; while (my @parents = $table->parent_lookup( $child )) { print "Parent of $child is ", $parents[0],"\n"; $child= $parents[0]; }

    Table::ParentChild is an XS module that you will need to build yourself. It is not available from ActiveState last I checked.

    It should work perfectly the first time! - toma

Re: Correction to Database problem
by ilcylic (Scribe) on Feb 16, 2003 at 01:13 UTC
    Perhaps I'm misunderstanding, or I haven't got all the details regarding system specifications or other such things that would explain why this approach is warranted, but wouldn't a relational database be an appropriate way to approach this problem? My apologies if this has been covered in an earlier post.

    -Il Cylic

    Section 66: Cruel, unjust, and dedicated to death.

      This would be real easy with a database like Oracle that supports hierarchical queries. In other databases, it would result in a query per generation which wouldn't be too efficient.

      --- print map { my ($m)=1<<hex($_)&11?' ':''; $m.=substr('AHJPacehklnorstu',hex($_),1) } split //,'2fde0abe76c36c914586c';