Re: storing huge graph
by Corion (Patriarch) on Aug 08, 2007 at 07:18 UTC
|
I'm not so sure that you necessarily will have a memory problem - the 600 bytes payload for your 172800 nodes amounts to 104MB. If we (generously) double that amount to estimate the memory Perl will need per node, that brings us to roughly 200MB. Depending on how interconnected your graph will be, you'll have to estimate the number of edges, but with 2GB of RAM you will have plenty of memory to go by ...
| [reply] |
|
|
I was also hoping for a way to store them on disk.
I thought of something, but it's quite amateurish:
store nodes 1-1000 in a file called 1-1000.txt,
then store nodes 1000-2000 in a file 1000-2000.txt, etc.,
and then when I need a certain node, I load the
file containing it into memory.
Although I think this may mean I'd have to implement
the whole graph class myself.
Is there an efficient way to store them on disk?
| [reply] |
|
|
It depends on what you mean by 'efficient': small or fast. TMTOWTDI: generally speaking the most efficient unit to be loaded on any (virtual memory) system is a page, but... The size of a page varies between systems; on 32-bit Intel the page size (Windoze or Linux) is 4KB. The paging manager in the Operating System loads code and data pages from the page (or swap) file. You can take advantage of this by using a memory-mapped file (File Mapping, or Section, Objects on Windows) and on-demand page loading, which most OSes support. Using memory mapping (see IPC::Mmap on CPAN) you should then be able to store the whole lot in one huge file, and the underlying system should only load the pages needed, as required.
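To illustrate the demand-loading idea without bringing in IPC::Mmap, here is a minimal sketch using plain seek/read on a file of fixed 600-byte records. The file name and record layout are invented for the example; an mmap-based version would do much the same thing but let the OS handle the paging for you.

```perl
use strict;
use warnings;

# Sketch: every node serialized as a fixed 600-byte record, stored
# consecutively in one big file, so record N lives at offset N * 600.
# 'graph.dat' and the payload layout are made up for illustration.
my $RECSIZE = 600;
my $FILE    = 'graph.dat';

# Build a tiny demo file: 10 records, each beginning with the packed id.
open my $out, '>:raw', $FILE or die "$FILE: $!";
for my $id (0 .. 9) {
    my $rec = pack 'N', $id;
    $rec .= "\0" x ($RECSIZE - length $rec);    # pad to the fixed size
    print {$out} $rec;
}
close $out;

sub load_node {
    my ($fh, $id) = @_;
    seek $fh, $id * $RECSIZE, 0 or die "seek: $!";
    read($fh, my $buf, $RECSIZE) == $RECSIZE or die "short read for $id";
    return $buf;    # decode with unpack() as appropriate for your payload
}

open my $in, '<:raw', $FILE or die "$FILE: $!";
my $rec = load_node($in, 7);    # only this 600-byte slice is read
printf "node %d loaded\n", unpack 'N', $rec;
```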
| [reply] |
Re: storing huge graph
by eyepopslikeamosquito (Archbishop) on Aug 08, 2007 at 08:41 UTC
|
As for how to do graphs in Perl, searching for Graph on the CPAN turns up a number of modules, notably Graph by Jarkko Hietaniemi. Graphs are also discussed at length in the O'Reilly book Mastering Algorithms with Perl by Jon Orwant, Jarkko Hietaniemi and John Macdonald.
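A minimal sketch of what using the CPAN Graph module might look like; the vertex names and the payload attribute here are made up for illustration.

```perl
use strict;
use warnings;
use Graph;    # Jarkko Hietaniemi's Graph module from CPAN

my $g = Graph->new(undirected => 1);
$g->add_edge('a', 'b');
$g->add_edge('b', 'c');

# Arbitrary per-node data can hang off vertex attributes.
$g->set_vertex_attribute('a', payload => '600 bytes of node data here');

print "neighbours of b: ", join(',', sort $g->neighbors('b')), "\n";
```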
| [reply] |
Re: storing huge graph
by roboticus (Chancellor) on Aug 08, 2007 at 12:49 UTC
|
spx2:
Ummm .... the definition of big has changed over the years. The graph size you mention used to be big, but now it's really not all that large. Before adding overhead, the graph is just 600*172800 bytes, just over 100MB. Double that for overhead, and you're still well within the typical virtual memory size (normally 'bout 2GB on an x86).
So if you actually exceed the physical memory size of your computer, then the OS will handle the shuffling of virtual/physical pages for you. Your problem will have to be nearly 20x larger to worry about funky techniques. So don't worry about it until you actually run out of memory!
Now in the event that your edges are large and plentiful and you actually approach the virtual memory size of your machine, then you can use a file of fixed record sizes, and for the node data just store the node number (record number). You can cache 'n' of 'em in memory so you don't spend all your time on Disk I/O. (Generalizing to variable-record length records is a little harder, but basically the same.)
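A rough sketch of that fixed-record file with a small cache in front of it; the record size, file name, and the simple FIFO eviction policy are illustrative assumptions rather than a tuned design.

```perl
use strict;
use warnings;

# Fixed-record node file plus an in-memory cache of 'n' records.
# All names and sizes here are invented for the sketch.
my $RECSIZE   = 600;
my $MAX_CACHE = 3;              # tiny, to show eviction; tune for real use
my $FILE      = 'nodes.dat';

# Demo file: 10 fixed-size records, each beginning with the packed id.
open my $out, '>:raw', $FILE or die "$FILE: $!";
print {$out} pack('N', $_) . ("\0" x ($RECSIZE - 4)) for 0 .. 9;
close $out;

my (%cache, @order);            # id => record, plus FIFO eviction order
open my $fh, '<:raw', $FILE or die "$FILE: $!";

sub fetch_node {
    my ($id) = @_;
    return $cache{$id} if exists $cache{$id};    # cache hit: no disk I/O
    seek $fh, $id * $RECSIZE, 0 or die "seek: $!";
    read($fh, my $buf, $RECSIZE) == $RECSIZE or die "short read for $id";
    $cache{$id} = $buf;
    push @order, $id;
    delete $cache{ shift @order } if @order > $MAX_CACHE;   # evict oldest
    return $buf;
}

fetch_node($_) for 5, 6, 7, 5;  # second fetch of 5 is served from cache
printf "cached: %s\n", join ',', sort { $a <=> $b } keys %cache;
```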
Using 'Tie' to connect your file to the nodes is probably a good idea, but I can't say for sure, having never used it myself.
...roboticus
| [reply] |
|
|
I would also like to mention that I am thinking
of using SQLite (or any other database) to store the graph,
with a column adjacent_nodes of type TEXT
in which I would store the ids of the nodes that are
adjacent to the current node, like
"34,2365,5473,2365,5432"
and then use my @nodes = split /,/, $node->adjacent_nodes;
What do you think about this?
Is it convenient?
Has anyone ever used a graph in combination with a database?
If so, can I find examples of this kind somewhere?
I can handle it, but I need to know what others have done.
| [reply] [d/l] |
|
|
spx2:
In fact, I do have a bit of experience putting graphs in databases. (I'm currently working on such a project, in fact.)
I don't like the idea of a column holding a list of neighbors, though, because it requires you to effectively "jump through hoops" to actually use the data. As you mention, you'll be using split to pull it apart. Similarly, you don't want a set of columns to hold your edges, because that limits you to a fixed number of edges and also makes your queries much uglier. In either case, if you actually wanted to write an SQL query to find connected nodes through edges, you'd be fighting the database every step of the way.
There's a much better way: a common idiom in databases is to use a separate table to represent a many-to-many relationship. Sure, it may take a bit more space in your database, but it's very flexible, and you can do many more queries with it.
So if you go to the effort of putting the graph into a database, make sure you take advantage of as many database features as you can to simplify the job--Don't just use it as a "super-big virtual hash".
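For example, the separate-edge-table idiom might look like the sketch below, using DBI with DBD::SQLite; the table and column names are invented for illustration.

```perl
use strict;
use warnings;
use DBI;    # assumes DBD::SQLite is installed

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

# One row per node, one row per edge: the many-to-many idiom.
$dbh->do('CREATE TABLE node (id INTEGER PRIMARY KEY)');
$dbh->do('CREATE TABLE edge (from_id INTEGER NOT NULL,
                             to_id   INTEGER NOT NULL,
                             PRIMARY KEY (from_id, to_id))');
$dbh->do('CREATE INDEX edge_to ON edge (to_id)');

my $ins = $dbh->prepare('INSERT INTO edge (from_id, to_id) VALUES (?, ?)');
$ins->execute(@$_) for [34, 2365], [34, 5473], [34, 5432];

# Neighbours of node 34 come straight from SQL -- no split() needed.
my $nbrs = $dbh->selectcol_arrayref(
    'SELECT to_id FROM edge WHERE from_id = ? ORDER BY to_id', undef, 34);
print "neighbours: @$nbrs\n";
```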
...roboticus
| [reply] [d/l] [select] |
Re: storing huge graph
by BrowserUk (Patriarch) on Aug 08, 2007 at 12:56 UTC
|
If you just want to persist the graph between runs, Storable may be sufficient for your needs.
If the intention is to load only the bits you need as you need them, demand-loading nodes will get very slow if you ever need to traverse the graph.
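A minimal sketch of persisting with the core Storable module; the hash layout shown is just one possible shape for the graph, not anything prescribed by the module.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);    # core module

# Hypothetical in-memory shape: node id => { payload, adjacency list }.
my %graph = (
    34   => { payload => 'x' x 600, edges => [ 2365, 5473 ] },
    2365 => { payload => 'y' x 600, edges => [ 34 ] },
);

store \%graph, 'graph.stor';           # persist between runs
my $again = retrieve('graph.stor');    # the whole structure comes back
print "node 34 has ", scalar @{ $again->{34}{edges} }, " edges\n";
```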
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] |