woompy has asked for the wisdom of the Perl Monks concerning the following question:

I’m not that good with regexes, so I really need the help of someone who is. I’m trying to write a parser for the embedded Shockwave Flash video code from sites like Google Video, Yahoo, YouTube, Bolt and others. The code needs to be turned into a query string, which is passed to a script that opens the players in a separate window. An example of how it will work, with some pre-parsed query strings, is at http://gigginonline.com/playvids.html and the receiving script also uses swfobject.js (docs at http://blog.deconcept.com/swfobject/), which gets around the ActiveX warnings. The idea is that the parser can be added to the text parser of forums and blogs so users can add playlists. This is what I have so far.
#!/usr/bin/perl
use CGI qw(:standard);
use STRICT;
print header();

# src='http://www.bolt.com/video/flv_player_branded.swf?contentId=1952754&contentType=2' # src from bolt code
my $text = qq([video](title)- BOLT CODE - (/title)<embed loop='false' quality='high' bgcolor='white' width='365' height='340' name='video_play_500' allowScriptAccess='sameDomain' type='application/x-shockwave-flash' pluginspage='http://www.macromedia.com/go/getflashplayer' /><br/><a href='http://www.bolt.com/'>Get video codes</a> at <a style='font-family:arial,sans-serif;font-size:12px;color:#0066CC' href='http://www.bolt.com'>Bolt</a>[/video]
[video](title)- YOUTUBE CODE - (/title)<object width="425" height="350"><param name="movie" value="http://www.youtube.com/v/topeBoB-ApQ"></param><param name="wmode" value="transparent"></param><embed src="http://www.youtube.com/v/topeBoB-ApQ" type="application/x-shockwave-flash" wmode="transparent" height="350" width="425"></embed></object>[/video]
[video](title)- YAHOO CODE - (/title)<embed src='http://us.i1.yimg.com/cosmos.bcst.yahoo.com/player/media/swf/FLVVideoSolo.swf' flashvars='id=970784&emailUrl=http%3A%2F%2Fvideo.yahoo.com%2Futil%2Fmail%3Fei%3DUTF-8%26vid%3De2e02ad6d9d1646cfa12ec8f270ae1ad.970784%26cache%3D1&imUrl=http%25253A%25252F%25252Fvideo.yahoo.com%25252Fvideo%25252Fplay%25253F%252526ei%25253DUTF-8%252526vid%25253De2e02ad6d9d1646cfa12ec8f270ae1ad.970784%252526cache%25253D1&imTitle=Nobody%252527s%252BWatching%252BOK%252BGo&searchUrl=http://video.yahoo.com/video/search?p=&profileUrl=http://video.yahoo.com/video/profile?yid=&creatorValue=bm9ib2R5c3dhdGNoaW5ndHY%3D&vid=e2e02ad6d9d1646cfa12ec8f270ae1ad.970784' type='application/x-shockwave-flash' width='425' height='350'></embed>[/video]);

while ($text =~ s{\[video\](.+?)\[\/video\]}
{
    ($src,$fvars,$type,$width,$height) = ();
    $vdata = $1;
    $vdata =~ m!\(title\)(.+?)\(\/title\)!is;
    $title = $1;
    $vdata =~ s!("|'|#)!!isg;
    $vdata =~ m!\<embed (.+?)(\>|\/\>)!is;
    $video_data = $1;
    $video_data =~ s!("|'|#)!!isg; #clean it up
    if ($video_data =~ m|type=application\/x\-shockwave\-flash|i) {
        $type = "ok";
        $video_data =~ s|type=application\/x\-shockwave\-flash||ig; # remove the Type
    }
    $video_data =~ m!width\s*=\s*([^ ]+)!i;
    $width = "$1";
    $video_data =~ m|height\s*=\s*([^ ]+)|i;
    $height = "$1";
    $video_data =~ m|src\s*=\s*([^ ]+)|ig;
    $src = "$1";
    print "this is width of $title - $width\n";
    print "this is height of $title - $height\n";
    print "this is src of $title - $src\n";
    if ($type&&$src&&$width&&$height) {
        $video_data =~ s/^\s+//;  #remove leading spaces
        $video_data =~ s/\s+$//;  #remove trailing spaces
        $video_data =~ s/\s+/ /g; #remove excess white spaces
        if ($video_data =~ m|flashvars\s*=\s*([^ ]+)|ig) { #match and save
            $fvars = "$1";
            $video_data =~ s|flashvars\s*=\s*([^ ]+)||ig; #delete flashvars from string
        }
        @values = split(' ',$video_data);
        $querystring = join('&', @values);
        if ($fvars) {
            $querystring .= "&" . $fvars;
        }
        qq|<!--videocode-->QUERTSTRING,TITLE,WIDTH and HEIGHT - ($title) $querystring($title)(width - $width)(height - $height)(type - $type)(src - $src)<--videocode-->|;
    } else {
        qq|<!--videocode-->PROBLEM WITH VIDEO CODE TITLED - $title<--videocode-->|;
    }
}eisg) {}
print "This is the text output - $text";
I’m having a few problems with this. The first is that my error checking doesn't work: I’ve left the src out of the Bolt code to show you what it's doing, or not doing. The second is that I need to clean up and combine some of the back-referencing, if possible. I’m also having a problem with the Google code, but I need to get what I have working before I can work on that. Can someone please help?

Thanks, Bob
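A minimal sketch of one way to tackle all three points at once, assuming attribute values are quoted with ' or " (or unquoted with no spaces) and contain no >: read every attribute of the <embed> tag into a hash in a single pass, check that hash for the attributes you need, and only then build the query string, percent-encoding each value so the & and = inside the Yahoo flashvars don't split into bogus parameters. parse_embed_attrs, missing_attrs, uri_escape_value and build_query_string are hypothetical names (URI::Escape's uri_escape from CPAN does the same encoding job if it is installed), and a regex like this is still fragile against arbitrary HTML.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every name=value pair of the first <embed> tag
# into a hash ref in one pass. Handles "double", 'single' and unquoted values.
# Assumes no '>' inside attribute values. Returns undef if there is no <embed>.
sub parse_embed_attrs {
    my ($html) = @_;
    my ($tag_body) = $html =~ m{<embed\b([^>]*)>}i or return;

    my %attr;
    while ($tag_body =~ m{([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))}g) {
        my ($name, $dq, $sq, $bare) = ($1, $2, $3, $4);
        $attr{ lc $name } = defined $dq ? $dq : defined $sq ? $sq : $bare;
    }
    return \%attr;
}

# Hypothetical check: which required attributes are absent or empty?
sub missing_attrs {
    my ($attr) = @_;
    return grep { !defined $attr->{$_} || $attr->{$_} eq '' } qw(src type width height);
}

# Percent-encode everything except RFC 3986 unreserved characters
# (assumes byte strings, not wide characters).
sub uri_escape_value {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build "name=value&name=value" from the hash; keys are sorted only so the
# output is predictable.
sub build_query_string {
    my ($attr) = @_;
    return join '&',
        map { uri_escape_value($_) . '=' . uri_escape_value($attr->{$_}) }
        sort keys %$attr;
}

for my $html (
    q{<embed loop='false' width='365' height='340' type='application/x-shockwave-flash' />},
    q{<embed src="http://www.youtube.com/v/topeBoB-ApQ" type="application/x-shockwave-flash" height="350" width="425"></embed>},
) {
    my $attr = parse_embed_attrs($html);
    if (!$attr) {
        print "PROBLEM: no <embed> tag\n";
        next;
    }
    if (my @missing = missing_attrs($attr)) {
        print "PROBLEM: missing @missing\n";
        next;
    }
    print build_query_string($attr), "\n";
}
```

Note that this encodes flashvars as a single value instead of appending its contents raw as the original does, so whatever reads the query string has to decode it before handing it to the player (in Perl, CGI's param does that decoding for you).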

Re: Help with shockwave-flash parser
by liverpole (Monsignor) on Nov 06, 2006 at 01:44 UTC
    Hi woompy,

    Am I correct that you're actually trying to execute this entire program as a regex evaluation?!?!  That's gotta set some kind of record for an obfuscated regex.

    Sorry that I know next to nothing about Shockwave Flash (for some very tiny value of "next to").  But I can see that your script has one closing brace too many; take out the first right brace in:

    } }eisg) {}

    and then, at least, your program will compile and run.

    I would *strongly* urge you to consider rewriting the regex evaluation as a subroutine; otherwise you're not going to get a lot of people wanting to jump in and tackle it.
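    A rough illustration of that advice: the replacement side of s///e shrinks to a single call, and the logic moves into an ordinary subroutine that can be read, tested and debugged on its own. format_video_block is a hypothetical name, and its body is a deliberately cut-down stand-in for the original width/height/src handling, not a full port of it.

```perl
use strict;
use warnings;

# Hypothetical subroutine holding what used to be inline in the s///e
# replacement. It receives the text between [video] and [/video] and
# returns the replacement string.
sub format_video_block {
    my ($block) = @_;

    my ($title) = $block =~ m{\(title\)(.+?)\(/title\)}is;
    $title = 'untitled' unless defined $title;

    # List-context capture: $src stays undef if the <embed> has no src.
    my ($src) = $block =~ m{<embed\b[^>]*?\bsrc\s*=\s*["']?([^"'\s>]+)}is;
    return "<!--videocode-->PROBLEM WITH VIDEO CODE TITLED - $title<--videocode-->"
        unless defined $src;

    return "<!--videocode-->($title) src=$src<--videocode-->";
}

my $text = q{[video](title)A(/title)<embed src="http://example.com/a.swf">[/video]}
         . q{[video](title)B(/title)<embed width="1">[/video]};

# One substitution; /g already replaces every [video] block in a single
# pass, so the surrounding while loop is not needed.
$text =~ s{\[video\](.+?)\[/video\]}{format_video_block($1)}gise;
print "$text\n";
```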


    s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
      OK, I've updated the code. The funny thing is that it still ran with the extra right curly bracket; all it did was print it out. I can't use HTML::Parser, as this will be added to the text parser of a forum. This is the output of the above code:
      this is width of - BOLT CODE - - 365
      this is height of - BOLT CODE - - 340
      this is src of - BOLT CODE - - 340
      this is width of - YOUTUBE CODE - - 425
      this is height of - YOUTUBE CODE - - 350
      this is src of - YOUTUBE CODE - - http://www.youtube.com/v/topeBoB-ApQ
      this is width of - YAHOO CODE - - 425
      this is height of - YAHOO CODE - - 350
      this is src of - YAHOO CODE - - http://us.i1.yimg.com/cosmos.bcst.yahoo.com/player/media/swf/FLVVideoSolo.swf
      This is the text output - <!--videocode-->QUERTSTRING,TITLE,WIDTH and HEIGHT - (- BOLT CODE - ) loop=false&quality=high&bgcolor=white&width=365&height=340&name=video_play_500&allowScriptAccess=sameDomain&pluginspage=http://www.macromedia.com/go/getflashplayer(- BOLT CODE - )(width - 365)(height - 340)(type - ok)(src - 340)<--videocode-->
      <!--videocode-->QUERTSTRING,TITLE,WIDTH and HEIGHT - (- YOUTUBE CODE - ) src=http://www.youtube.com/v/topeBoB-ApQ&wmode=transparent&height=350&width=425(- YOUTUBE CODE - )(width - 425)(height - 350)(type - ok)(src - http://www.youtube.com/v/topeBoB-ApQ)<--videocode-->
      <!--videocode-->QUERTSTRING,TITLE,WIDTH and HEIGHT - (- YAHOO CODE - ) src=http://us.i1.yimg.com/cosmos.bcst.yahoo.com/player/media/swf/FLVVideoSolo.swf&width=425&height=350&id=970784&emailUrl=http%3A%2F%2Fvideo.yahoo.com%2Futil%2Fmail%3Fei%3DUTF-8%26vid%3De2e02ad6d9d1646cfa12ec8f270ae1ad.970784%26cache%3D1&imUrl=http%25253A%25252F%25252Fvideo.yahoo.com%25252Fvideo%25252Fplay%25253F%252526ei%25253DUTF-8%252526vid%25253De2e02ad6d9d1646cfa12ec8f270ae1ad.970784%252526cache%25253D1&imTitle=Nobody%252527s%252BWatching%252BOK%252BGo&searchUrl=http://video.yahoo.com/video/search?p=&profileUrl=http://video.yahoo.com/video/profile?yid=&creatorValue=bm9ib2R5c3dhdGNoaW5ndHY%3D&vid=e2e02ad6d9d1646cfa12ec8f270ae1ad.970784(- YAHOO CODE - )(width - 425)(height - 350)(type - ok)(src - http://us.i1.yimg.com/cosmos.bcst.yahoo.com/player/media/swf/FLVVideoSolo.swf)<--videocode-->
      There is no src in the Bolt code, but it doesn't produce an error, because the src value gets filled with the height value instead. Sorry about the messy code; it's a copy and paste from my Open Perl IDE.
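      The Bolt symptom is standard Perl behaviour: a failed match does not reset $1, so it keeps the capture from the last successful match in the same block (here, the height). That is why $src = "$1" after the failing src match quietly copies 340, and the if ($type&&$src&&$width&&$height) check passes. Assigning the match in list context avoids this, because a failed match returns an empty list and the variable is left undef. A minimal demonstration, using a cut-down, already de-quoted Bolt attribute string as sample data:

```perl
use strict;
use warnings;

my $video_data = 'loop=false width=365 height=340 name=video_play_500';

# Original style: run the match, then read $1 afterwards.
$video_data =~ m/height\s*=\s*([^ ]+)/i;
my $height = $1;
$video_data =~ m/src\s*=\s*([^ ]+)/i;   # fails: there is no src
my $stale_src = $1;                       # still holds the height capture

# List-context style: a failed match assigns nothing, so $src stays undef.
my ($src) = $video_data =~ m/\bsrc\s*=\s*([^ ]+)/i;

print "stale: $stale_src\n";
print defined $src ? "src: $src\n" : "src missing, report an error\n";
```

The same list-context pattern works for width, height and the title, so each check in the error test then reflects whether that particular match really succeeded.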
Re: Help with shockwave-flash parser
by talexb (Chancellor) on Nov 06, 2006 at 01:37 UTC

    First of all, it seems that you're trying to parse HTML with regexes. While that looks simple, it's not a good thing to do -- there are too many different ways some random page will break your code.

    Instead, use something like HTML::Parser -- it's guaranteed to work better than pretty well anything you can come up with.
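    For comparison, a minimal sketch of what the HTML::Parser route might look like, assuming the module is installed (it comes from CPAN, not core Perl). It uses the version 3 API with a start handler that receives the tag name and a hash of its attributes; collect_embeds is a hypothetical name. HTML::Parser lowercases tag and attribute names by default, so allowScriptAccess arrives as allowscriptaccess, and empty_element_tags is switched on so the trailing / in Bolt's <embed ... /> is not treated as an attribute.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: return one attribute hash ref per <embed> tag.
sub collect_embeds {
    my ($html) = @_;
    my @embeds;
    my $p = HTML::Parser->new(
        api_version        => 3,
        empty_element_tags => 1,   # recognise <embed ... /> as an empty tag
        # Called for every start tag with the tag name and an attribute hash ref.
        start_h => [
            sub {
                my ($tag, $attr) = @_;
                push @embeds, {%$attr} if $tag eq 'embed';
            },
            'tagname, attr',
        ],
    );
    $p->parse($html);
    $p->eof;    # flush anything still buffered
    return @embeds;
}

my $html = q{<object width="425" height="350"><embed src="http://www.youtube.com/v/topeBoB-ApQ" type="application/x-shockwave-flash" height="350" width="425"></embed></object>};
for my $e (collect_embeds($html)) {
    print 'src=', (defined $e->{src} ? $e->{src} : 'MISSING'), " width=$e->{width}\n";
}
```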

    Second, saying that "my error checking doesn't work" doesn't help us much. We like to have a clear description of what you were doing, what went wrong, and what you thought the right behaviour was supposed to be.

    Third, it seems you are in need of some good coding guidelines. Here are a few:

    • Indent your code.
    • Comment your code.
    • Use meaningful variable names.
    The recent book Perl Best Practices has some good ideas -- I don't agree with all of them, but it's a good place to start.

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

    PS: Welcome to PerlMonks.