A while back, we were starting a project to redesign our websites. Before we could do that, though, we had to map out our current site to know what was there. Our site had been built up over the years into a Big Ball of Mud, so this was going to be a large project in itself.

I thought about this a little while and was intrigued with the possibility that Perl might be able to map the whole thing out for us. I spent my own time on the weekend coming up with a spider that would go across our domains, grabbing links, and dump the final data in YAML (I released this code to the public).

Next step was to write programs that would take that YAML file and do intresting things with it. The first thing was to check for broken links. It took about a half hour to write a program that would go through the YAML and get all requests that returned 404s. I sent the resulting list to our web master (which was about 7 pages long, IIRC), who had our web site 404-free by the end of the week.

One of the objectives for our redesign was to make no page more than 3 links deep from the home page. So my next program was to get the shortest path from page A to page B. This is essentially a problem of applied graph theory. Much of the available information on this topic is written for pathfinding in games, but the algorithms used there are applicable to general graphs (such as Dijkstra's algorithm and A*).

Although I've found many different uses for this, I've yet to use it for its orginal purpose of mapping our web site (we ended up making a simple map manually and going off that). I was hoping to take each page and make it into an SVG image with lines connecting linked pages. Exact layout could be done by an SVG graphics program. Alas, I never had the time to do this.

"There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.


In reply to Re: Your Favorite Heroic Perl Story by hardburn
in thread Your Favorite Heroic Perl Story by friedo

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.