Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Working with a huge Graph object

by sgifford (Prior)
on Dec 02, 2003 at 02:03 UTC ( [id://311488]=perlquestion: print w/replies, xml ) Need Help??

sgifford has asked for the wisdom of the Perl Monks concerning the following question:

We're trying to do graph operations on a large collection of Web pages and links. We've experimented on a small scale with the Graph module, but the data set we need to work with looks like it would take up about 25GB, which is (somewhat unsurprisingly) more memory than we have.

Does anybody have any suggestions for how we could make this work? For example, a disk-based implementation of the Graph class? Or could we create a 25GB file, mmap it, then tell Perl to use that as backing store?

Any suggestions are appreciated!

Replies are listed 'Best First'.
Re: Working with a huge Graph object
by Zaxo (Archbishop) on Dec 02, 2003 at 02:19 UTC

    The way you prune a Graph depends on what you want to do with it. If you only need to look at local properties, you can look at clusters around a single node. Frequently you can collapse a completely connected set of nodes into a single composite one. Another strategy is cook up a way to classify nodes by purpose and restrict the graph to a subset by class.

    What are you trying to extract from the graph?

    After Compline,
    Zaxo

      We're not sure yet what we want to extract. It's a bit of exploratory programming for a research group, and we won't really know what will turn out to be interesting until we've got a first draft implementation. We'd like to be able to do things like find the shortest path between two Web pages, find the longest shortest path between Web pages, find how many pages link to a page, how many links a page contains; standard graph theory sort of stuff, which is why we started with the Graph class.
Re: Working with a huge Graph object
by demerphq (Chancellor) on Dec 02, 2003 at 17:59 UTC

    As far as I know the Graph module is essentially a toy implementation intended for teaching Graph theory and programming to a novice programmers. Its fine and dandy for some simple experiements, and is a good venue for teaching complex concepts, but ultimately its just a toy. The ram requirements for a modest graph are prohibitive, and the speed of the implementation leaves much to be desired. Much better to find an alternate C implementation (or prolog) to do this type of stuff. A pure perl implementation just isnt a big enough hammer for the nut you want to crack.

    Incidentally if you do happen to put in the effort to put a perl wrapper on a classy set of graph tools then I for one would welcome it greatly. Or if its already been done, a heads up for what to look at would be nice. :-)

    HTH


    ---
    demerphq

      First they ignore you, then they laugh at you, then they fight you, then you win.
      -- Gandhi


Re: Working with a huge Graph object
by etj (Deacon) on Jun 26, 2022 at 00:43 UTC
    I wouldn't be surprised if Berkeley DB (DB_File, in core Perl) would be a good way to explore this. It's disk-based, so has no memory limits.
Re: Working with a huge Graph object
by tzz (Monk) on Dec 03, 2003 at 16:55 UTC
    I've implemented fairly complex graphs with RDBMS, for instance MySQL. Class::DBI makes for a nice interface.

    The disadvantage is the lack of speed, of course. You could use a file-based RDBMS such as SQLite, or a pure file-based database without RDBMS features such as BerkeleyDB. Those may be adequate to your needs. If you are just storing nodes and edges without any complex attributes, you may be able to get away with BerkeleyDB.

    You can probably hack the Graph module to integrate it with a RDBMS system.

    Ted

      Yeah, we currently have the data in SQL, and are exploring moving some of it out because it's too slow. Although I was going to work on a pure-SQL single-source, shortest-path algorithm, which sounded masochistically fun!
Re: Working with a huge Graph object
by sth (Priest) on Dec 03, 2003 at 00:11 UTC

    Can PDL help?

    sth

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://311488]
Approved by Roger
Front-paged by bart
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (6)
As of 2024-04-18 10:31 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found