in reply to storing huge graph

spx2:

Ummm .... the definition of big has changed over the years. The graph size you mention used to be big, but now it's really not all that large. Before adding overhead, the graph is just 600*172800 bytes, just over 100MB. Double that for overhead, and you're still well within the typical virtual memory size (normally 'bout 2GB on an x86).

So if you actually exceed the physical memory size of your computer, then the OS will handle the shuffling of virtual/physical pages for you. Your problem will have to be nearly 20x larger to worry about funky techniques. So don't worry about it until you actually run out of memory!

Now in the event that your edges are large and plentiful and you actually approach the virtual memory size of your machine, then you can use a file of fixed record sizes, and for the node data just store the node number (record number). You can cache 'n' of 'em in memory so you don't spend all your time on Disk I/O. (Generalizing to variable-record length records is a little harder, but basically the same.)

Using 'Tie' to connect your file to the nodes is probably a good idea, but I don't know having never used it before.

...roboticus

Replies are listed 'Best First'.
Re^2: storing huge graph
by spx2 (Deacon) on Aug 10, 2007 at 00:54 UTC

    i would like to also mention that i am thinking of using sqlite to store the graph or any other database and have a column with adiacent_nodes wich would be TEXT and in wich i would store the id's of the nodes that are adiacent in the current node.like "34,2365,5473,2365,5432" and then use a @nodes = split ",",$node->adiacent_nodes; what do you think about this ?

    is this convenient ? has anyone ever used a graph in combination with a database? if so can i find somewhere examples of this kind ? i can handle it but i need to know what others have done.

      spx2:

      In fact, I do have a bit of experience putting graphs in databases. (I'm currently working on such a project, in fact.)

      I don't like the idea of a column having a list of neighbors, though, because it requires you to effectively "jump through hoops" to actually use the data. As you mention, you'll be using split to pull it apart. Similarly, you don't want to have a set of columns to hold your edges, because it limits you to a certain number of edges, and also makes your queries much uglier. In either case, if you actually wanted to write an SQL query to find connect nodes through edges, you'll be fighting the database every step of the way.

      There's a much better way: A common idiom in databases is to use a separate table to represent a many-to-many relationship1. Sure, it may take a bit more space in your database, but it's very flexible, and you can do many more queries using it.

      So if you go to the effort of putting the graph into a database, make sure you take advantage of as many database features as you can to simplify the job--Don't just use it as a "super-big virtual hash".

      ...roboticus

        A virtual roboticus++ for 1 .. 20 as I can't do it for real. For a 'use a database' node that gives (much) more than just that advice. And especially for:

        So if you go to the effort of putting the ... into a database, make sure you take advantage of as many database features as you can to simplify the job--Don't just use it as a "super-big virtual hash".

        Substitute "anything" for the ellipses and this should become a new Monk's Quip.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.