Adam has asked for the wisdom of the Perl Monks concerning the following question:

I'm working on a script to read a list of MS DevStudio .dsw files and generate a list of projects to compile. The list has to be ordered by dependencies, which means that I need to read all the data before writing my list, and that I need to get several things for each project: dsw filename (and path), project name, dsp filename (and path), and dependencies. I wrote most of this with some success, using hashes to store my data and a recursive algorithm to resolve dependencies. But I didn't want to rebuild projects, so I had code in there to try to weed out repeats. The repeat weeding is also important when resolving dependencies (otherwise it will take a century to converge). Alas, some of the repeats have different ideas about what they depend on, meaning I need to get the dependency list for every repeat when I'm reading the files. This script bombed for a variety of reasons, most of which are related to a poor data model for storing all that data.

Does anyone have a good way to store lots of data like that? (Short of taking an object-oriented approach... I'm trying to keep this simple and straightforward.) Once I figure out a better way to represent everything, I am confident that I can write a better script than the one I have now.

Re: Data Structs
by chromatic (Archbishop) on Jun 23, 2000 at 02:56 UTC
    Sounds like you need a tree data structure. I'd stick with the hash thing, and make a key called 'dependencies' that points to an anonymous hash. In that second hash, the key would be the dependency name, and the value would be a reference to the dependency hash.

    You'll have to figure out some way to add the references as you add dependencies. Keep another hash around with needed dependencies as keys, and as values, an anonymous array of references to objects that depend on them. (Is this starting to hurt your head yet?) When you add a new file, look to see if it can be added to those hashes.

    Once you've added all the files, iterate through the hashes, sorting them by least number of dependencies.

    You could also, in step two, record *which* dependencies a file fulfills, and build an inverted tree.

    I'd use objects in this case. It'd simplify things like linking.
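The structure described in this reply can be sketched roughly as follows. This is only an illustration: the project names are borrowed from the sample .dsw listing elsewhere in this thread, the `name` key and the `visit` helper are assumptions, and the ordering step uses a depth-first walk rather than the sort by dependency count suggested above.

```perl
use strict;
use warnings;

# One hash per project; 'dependencies' points to an anonymous hash
# (dependency name => reference to that dependency's own hash).
my %projects = (
    Utilities      => { name => 'Utilities',      dependencies => {} },
    Win32Wrapper   => { name => 'Win32Wrapper',   dependencies => {} },
    PersistantData => { name => 'PersistantData', dependencies => {} },
);

# Link the records once every project hash exists.
$projects{Utilities}{dependencies}{$_} = $projects{$_}
    for qw(Win32Wrapper PersistantData);

my (%done, @order);

# Depth-first walk: a project is emitted only after all of its dependencies.
# %done weeds out repeats; $seen catches dependency cycles.
sub visit {
    my ($proj, $seen) = @_;
    my $name = $proj->{name};
    return if $done{$name};
    die "dependency cycle at $name\n" if $seen->{$name};
    $seen->{$name} = 1;
    visit($proj->{dependencies}{$_}, $seen)
        for sort keys %{ $proj->{dependencies} };
    delete $seen->{$name};
    $done{$name} = 1;
    push @order, $name;
}

visit($projects{$_}, {}) for sort keys %projects;
print "$_\n" for @order;
```

Because each project is marked in `%done` the first time it is emitted, shared dependencies are visited once, which is the repeat weeding the original question asks about.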

RE: Data Structs
by Russ (Deacon) on Jun 23, 2000 at 03:33 UTC
    I am trying to work up some suggested code, but I don't know anything about M$ DevStudio.

    Could you (or someone) post an explanation of what a .dsw file represents, and what a .dsp file represents? What files have dependencies, and on what do they depend? (Does one .dsw file depend on another .dsw? Do .dsp files depend on other .dsp's, or on .dsw's?) Just a brief overview of some of the specifics...

    Russ

      To be clear, the same dsw file would then contain at least two other projects named "Win32Wrapper" and "PersistantData", but in no particular order.
      The regexes I used were:
          /^Project: "(.+)"=(.+) - Package/
          /Project_Dep_Name (.+)$/
      Where the first one yields $1 as the project name and $2 as the relative path from the dsw file to the dsp file. The second yields $1 as the name of the project on which this project depends.
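A minimal sketch of how those two patterns could drive the parsing. The first pattern here is written to match the sample `Project:` line, including the colon and the closing quote. The heredoc is a trimmed copy of the workspace listing quoted in this thread; the `%projects` layout and the `dsp`/`deps` keys are assumptions, not the original script.

```perl
use strict;
use warnings;

# Trimmed copy of the sample workspace listing from this thread.
my $dsw = <<'END_DSW';
Project: "Utilities"=.\Util.dsp - Package Owner=<4>

Package=<4>
{{{
    Begin Project Dependency
    Project_Dep_Name Win32Wrapper
    End Project Dependency
    Begin Project Dependency
    Project_Dep_Name PersistantData
    End Project Dependency
}}}
END_DSW

my (%projects, $current);
for my $line (split /\n/, $dsw) {
    if ($line =~ /^Project: "(.+)"=(.+) - Package/) {
        # $1 is the project name, $2 the relative path to its .dsp file.
        $current = $1;
        $projects{$current} = { dsp => $2, deps => [] };
    }
    elsif (defined $current and $line =~ /Project_Dep_Name (.+)$/) {
        # Dependencies belong to the most recently seen project.
        push @{ $projects{$current}{deps} }, $1;
    }
}

for my $name (sort keys %projects) {
    print "$name ($projects{$name}{dsp}): @{ $projects{$name}{deps} }\n";
}
```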

      Some words of caution. My script ran across several dsw files (that was the point) and some dsw files share projects, meaning that a project can be in multiple dsw files. But for some reason I have yet to figure out, different dependencies (if any at all) are listed for the same project in different workspace files. Weird. So one .dsw will claim a project has a set of dependencies, while another won't know about any dependencies. I think this is because some workspaces are out of date (no one has edited the workspace recently) even though the code in them is still relevant.
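One possible way to cope with workspaces that disagree is to take the union of every dependency list seen for a project. That is an assumption about the right policy, not something the workspace files guarantee, and the workspace names and `%by_workspace` layout below are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical parse results: workspace file => project => [dependency names].
my %by_workspace = (
    'a.dsw' => { Utilities => [qw(Win32Wrapper PersistantData)] },
    'b.dsw' => { Utilities => [], Win32Wrapper => [] },
);

my %deps;    # project name => { dependency name => 1 }
for my $dsw (sort keys %by_workspace) {
    for my $project (keys %{ $by_workspace{$dsw} }) {
        $deps{$project} ||= {};    # record the project even with no dependencies
        $deps{$project}{$_} = 1 for @{ $by_workspace{$dsw}{$project} };
    }
}

for my $project (sort keys %deps) {
    my @list = sort keys %{ $deps{$project} };
    print "$project: @list\n";
}
```

Using hash keys for the dependency names means a dependency listed in several workspaces is stored only once.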

      This runs on a win32 system (DevStudio) so I use a shell dir command to do the work of finding my .dsw files like this:

          $root = 'D:\projects\\';
          @workspacefiles = split /\n/, `dir *.dsw /s /B`;
          s/\Q$root\E(.+)/$1/ for (@workspacefiles);
      The root is separated like that because I don't want it in the output that I generate. I figured it would be faster to remove it first, but it could just as easily be removed at the end. Taking it out early meant that I had to keep putting it back later. Oh well. And the path to the dsw file is easily gotten thusly:
          foreach $dswfile ( @workspacefiles ) {
              $pathtodsw = "$root$dswfile";
              $pathtodsw =~ s/(.*)\\.+?\.dsw/$1/;
              # Then I open the file and read the relevant data.
          }
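A small sketch of the same path handling that keeps both forms of each path, so the root never has to be put back. The paths in `@found` are hypothetical stand-ins for the output of the dir command, and `%dir_of` is an assumed name:

```perl
use strict;
use warnings;

my $root = 'D:\projects\\';    # same root as in the post

# Hypothetical output of the dir command, one full path per entry.
my @found = (
    'D:\projects\shared\util\Util.dsw',
    'D:\projects\app\App.dsw',
);

my %dir_of;    # relative .dsw path => full directory containing it
for my $full (@found) {
    (my $relative = $full) =~ s/^\Q$root\E//;     # strip the root for output
    (my $dir = $full) =~ s/\\[^\\]+\.dsw\z//i;    # drop the trailing \name.dsw
    $dir_of{$relative} = $dir;
}

print "$_ => $dir_of{$_}\n" for sort keys %dir_of;
```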
      Thanks.
      A .dsw file contains a list of projects which are "related". Each project, represented as a .dsp file which contains the locations of the source code and the compiler flags, is compiled as a library or executable or whatever. The .dsw file is nice in that it contains a list of the dependencies each project has (but not so nice in that it lists them by project name instead of .dsp file).

      That said, here are the relevant parts of a typical project listing in a .dsw file and the regexes I was using:

      Microsoft Developer Studio Workspace File, Format Version 6.00
      # WARNING: DO NOT EDIT OR DELETE THIS WORKSPACE FILE!

      ###############################################################################

      Project: "Utilities"=.\Util.dsp - Package Owner=<4>

      Package=<5>
      {{{
          begin source code control
          "$/Shared/UTIL", QGFAAAAA
          .
          end source code control
      }}}

      Package=<4>
      {{{
          Begin Project Dependency
          Project_Dep_Name Win32Wrapper
          End Project Dependency
          Begin Project Dependency
          Project_Dep_Name PersistantData
          End Project Dependency
      }}}

      ###############################################################################