drake50 has asked for the wisdom of the Perl Monks concerning the following question:

Is it possible to follow links on a web page done entirely in Flash? The website I'm interested in is rosnet2000.com. They recently redid their site and it killed all of my scripts. Is it possible to script something in Perl that will download info from this site?

Just for further clarification, what I'm trying to download automagically is the race program for the day.

Replies are listed 'Best First'.
Re: Crawling Flash web pages
by revdiablo (Prior) on Dec 28, 2003 at 03:35 UTC

    Perhaps you can get something useful from it with SWF::Parse. I just found this with a CPAN search -- I have not used it, and cannot vouch for its quality, completeness, or even usefulness. Perhaps it will help you solve the problem.

    That said, this is a crying shame. This is a perfect example of why Flash is only good for unimportant frills. Even then, its value is questionable. I highly suggest sending a strongly worded email to these people and, if possible, looking elsewhere for this information. I know I would not reward this kind of behavior from a physical place with my patronage, and I wouldn't from an intangible place either.

      Try swftools, in particular swfextract. That should carry forward the linking at least.
      The download is at quiss.org/swftools/.
      Otherwise, in a pinch, you might capture the output of the *nix 'strings' utility and look for the links to follow, provided they show up as plain text mixed in with the binary data.
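      The 'strings' idea can also be sketched in Perl itself. This is only a rough sketch, not a tested tool: the sub names and the list of file extensions are made up for illustration, and it assumes the links are stored as plain ASCII. Files whose header starts with "CWS" are zlib-compressed, so nothing readable will turn up until they are inflated.

```perl
use strict;
use warnings;

# Return URL-like printable strings found in a blob of binary data,
# roughly what `strings movie.swf | grep ...` would show.
sub url_like_strings {
    my ($data, $min_len) = @_;
    $min_len = 4 unless defined $min_len;
    my (%seen, @hits);
    # Walk over runs of printable ASCII at least $min_len characters long.
    while ($data =~ /([\x20-\x7e]{$min_len,})/g) {
        my $run = $1;
        # Keep runs that look like absolute URLs or server-side script paths.
        # The extension list is an assumption; adjust it for the site.
        next unless $run =~ m{https?://|\.(?:asp|php|cgi|swf|xml)\b}i;
        push @hits, $run unless $seen{$run}++;
    }
    return @hits;
}

# Read a whole file in binary mode.
sub slurp_binary {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;
    my $data = <$fh>;
    close $fh;
    return $data;
}

# Command-line use: perl swfstrings.pl movie.swf
if (@ARGV) {
    my $data = slurp_binary($ARGV[0]);
    warn "Header is CWS: the file is zlib-compressed, inflate it first\n"
        if substr($data, 0, 3) eq 'CWS';
    print "$_\n" for url_like_strings($data);
}
```

      Anything it prints is only a candidate; each URL still has to be tried by hand or from a script.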
      Flash is a whole generation ahead of HTML. Try embedding a 4 way live voice chat or complicated moving graphics and sound into your web application. I love programming with Perl. Perl is gonna fade into obscurity for many web applications if it doesn't address its deficiencies in state of the art web audio/video.
        Flash is not a replacement for Perl, since it runs on the browser side while Perl is deployed on the server side. What we need are some good libraries for generating Flash pages in Perl. By the way, an alternative to Flash could be the Mozilla XUL engine, where you could use Perl even on the browser side of things.

        A 4 way live voice chat with complicated moving graphics is something that Flash would be good for. I guess you might not consider these things "useless frills," but those are the kind of things I had in mind when I wrote that phrase. Maybe there is a place "in the future" for Flash, but for now, that place is not the same as information-based websites. Commercials, games, and other distractions in Flash are fine, but again we get back to what I consider "useless frills." I'm sure many people will disagree with that label, but it describes accurately how I feel about these things.

        Perhaps my head is stuck in the sand, but I believe text is here to stay. I like it a lot. And putting text in Flash just doesn't work. That's what I was complaining about, and that's something you completely ignored (or missed). I thought the original question was about extracting textual information from an entirely Flash-based website, so that's the context for my reply. I might have misread the question, but even if I did, I still stand by my statements (with regard to text-based information).

        Update: Also, I don't quite understand why you brought up the point that Perl will "fade into obscurity." Perhaps you were just making some random comments, but as part of your reply it makes no sense to me. But if you want me to reply to that too, then I disagree. I think Perl still is in a very good position for "ordinary" websites. Even if there eventually is a place for the revolutionary audio/visual Flash-based site, there will still be a place for the text-based site.

        Flash is not very helpful for:

        1. low-bandwidth connections (e.g. cell phone)
        2. blind or otherwise disabled people
        3. license-free development

        Ted

Re: Crawling Flash web pages
by chanio (Priest) on Dec 28, 2003 at 19:03 UTC
    I haven't studied Flash yet, but I saw some light in a few articles that combine Flash ActionScript with XML, since the two happen to be compatible in some way: "Add XML functionality to your Flash movies" and another, "Create a dynamic image scroll for Flash 6+".

    These articles are about building with Flash and XML, but they might be a starting point for finding a solution that works on both sides. XML is becoming more standard than HTML...

    Ming lets you build Flash pages with Perl.

Re: Crawling Flash web pages
by Arbogast (Monk) on Dec 28, 2003 at 06:15 UTC
    I don't have a good On Topic Perl answer. :(

    I haven't looked at the site thoroughly. The problem you might run into with Flash is that it can be doing things more advanced than ordinary web pages. Many of the features are well beyond regular HTML.

    My guess would be that you would be better off getting a good SWF decompiler and then studying the code. Maybe I am overlooking something, but I think it would be very difficult to extract all the various getURLs, loadVars, streaming multimedia and such with Perl. A SWF decompiler should make it obvious what is going on under the hood.

    Once you have sorted out the ActionScript, Perl could be a good way to download some or all of the content.
      I was really hoping I had missed an easy answer. I think we can expect to see more and more sites push the boundaries in order to make their sites look as much like TV commercials as possible. Of course they'll just assume that only people will "need" to understand them.

Re: Crawling Flash web pages
by ant9000 (Monk) on Dec 29, 2003 at 09:17 UTC
    Well, disassembling Flash files is not exactly comfortable, but it can be done. Anyway, depending on the kind of info you need, you might be able to avoid disassembling completely, at the price of some reverse web engineering.
    A Flash application usually communicates with the server using GET or POST (newer versions also handle XML data in the request, but communicate over HTTP anyway): thus, if the application receives the info you need as external data (which is very likely), you should be able to do the same.
    I've had a look at the site and I'm not sure where the "race program for the day" is, but using a sniffer revealed that "Bet X top dogs" under the "What's up" menu link is not inside the app. It comes as text data in answer to a POST request: "POST /rx_asp/rx_getdogdata.asp", with data payload "req=topten".
    So, no need to use Flash links for topten now:
    % curl -d "req=topten" http://www.rosnet2000.com/rx_asp/rx_getdogdata.asp
    yields this result:
    topten=cruising,39773.55;face,38085.5;db,36317.38;Vikingsroc ,36116.74;annie91,34131.46;gb,33068.55;rr,29465.4;kka,29121.1;i1,24721.9;DRAMBUIE,20749.4
    which is really easy to parse...
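    For the Perl side, here is a minimal sketch of the same POST plus the parsing, using HTTP::Tiny from the core distribution. The sub names are made up for illustration, and the response format is assumed to be exactly "topten=name,score;name,score;..." as in the output above; the server may well send other fields for other requests.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Turn "topten=name,score;name,score;..." into a list of [name, score] pairs.
sub parse_topten {
    my ($body) = @_;
    $body =~ s/^\s+|\s+$//g;
    my ($list) = $body =~ /^topten=(.*)$/s
        or die "Unexpected response: $body\n";
    my @rows;
    for my $entry (split /;/, $list) {
        my ($name, $score) = split /,/, $entry, 2;
        next unless defined $score;          # skip malformed entries
        s/^\s+|\s+$//g for $name, $score;    # names may carry stray spaces
        push @rows, [$name, $score + 0];
    }
    return @rows;
}

# Send the same form data that curl -d "req=topten" sends.
sub fetch_topten {
    my ($url) = @_;
    my $res = HTTP::Tiny->new(timeout => 15)
                        ->post_form($url, { req => 'topten' });
    die "POST failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return parse_topten($res->{content});
}

# Command-line use: perl topten.pl http://www.rosnet2000.com/rx_asp/rx_getdogdata.asp
if (@ARGV) {
    printf "%-12s %10.2f\n", @$_ for fetch_topten($ARGV[0]);
}
```

    Keeping the parsing apart from the request means it can be checked against a saved response without hitting the server each time.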
    Good luck with your reverse engineering!
    Ant9000
      Thanks! This will give me somewhere to start my hacking :) I'll fiddle around with it until I get what I'm trying...