I'm trying to fix a Perl plugin for the Squeezebox server.

I need to split the following text into an id (from tag=) and the url (from url=).

These tag={id}&url={url} pairs are delimited by commas, but commas can also appear inside the url, and so can duplicate 'tag=' elements, which must be kept, so I can't split on that character alone.

An example of the text (it arrives as one line):

itag=44&url=http://o-o---preferred---sn-u5a3u5a3-h5oe---v13---lscache3.c.youtube.com/videoplayback?upn=8kbZJLkF5PA&sparams=cp%2Cid%2Cip%2Cipbits%2Citag%2Cratebypass%2Csource%2Cupn%2Cexpire&fexp=927101%2C923006%2C922401%2C920704%2C912806%2C913419%2C913546%2C913556%2C919349%2C919351%2C925109%2C919003%2C920201%2C912706&key=yt1&expire=1348823962&itag=44&ipbits=8&sver=3&ratebypass=yes&mt=1348800611&ip=92.22.37.231&mv=m&source=youtube&ms=au&cp=U0hTTVhNUV9LTENOM19QR1VKOkFyQWNVSVFNbmNL&id=1100a4b92b939cd6&type=video/webm;+codecs="vp8.0,+vorbis"&fallback_host=tc.v13.cache3.c.youtube.com&sig=8353F6329CDA8168C4F7F29E20F2AE3F6509D85F.C582D63C02534232CE8E28D5ADC5B119AAEF2963&quality=large,itag=35&url=http://o-o---preferred---sn-u5a3u5a3-h5oe---v11---lscache4.c.youtube.com/videoplayback?upn=8kbZJLkF5PA&sparams=algorithm%2Cburst%2Ccp%2Cfactor%2Cid%2Cip%2Cipbits%2Citag%2Csource%2Cupn%2Cexpire&fexp=927101%2C923006%2C922401%2C920704%2C912806%2C913419%2C913546%2C913556%2C919349%2C919351%2C925109%2C919003%2C920201%2C912706&expire=1348823962&algorithm=throttle-factor&burst=40&ip=92.22.37.231&itag=35&sver=3&key=yt1&mt=1348800611&mv=m&source=youtube&ms=au&ipbits=8&factor=1.25&cp=U0hTTVhNUV9LTENOM19QR1VKOkFyQWNVSVFNbmNL&id=1100a4b92b939cd6&type=video/x-flv&fallback_host=tc.v11.cache4.c.youtube.com&sig=885C9C098DF9D80E780177E01CF944BC4F9564FE.9A374618A2BE8C2E562C8622DCB449A7071E37BD&quality=large,itag= ...AND SO ON

I'd like the data to end up in a hash keyed by id, with the url as the value.

I first tried this, but it only splits off the first pair it finds, and not correctly:

for my $stream (split(/itag=(.*)&url=/, $streams)) { print $stream; }

I expected it to print out:

44

http://o-o---preferred---sn-u5a3u5a3-h5oe---v13---lscache3.c.youtube.com/videoplayback?upn=8kbZJLkF5PA&sparams=cp%2Cid%2Cip%2Cipbits%2Citag%2Cratebypass%2Csource%2Cupn%2Cexpire&fexp=927101%2C923006%2C922401%2C920704%2C912806%2C913419%2C913546%2C913556%2C919349%2C919351%2C925109%2C919003%2C920201%2C912706&key=yt1&expire=1348823962&itag=44&ipbits=8&sver=3&ratebypass=yes&mt=1348800611&ip=92.22.37.231&mv=m&source=youtube&ms=au&cp=U0hTTVhNUV9LTENOM19QR1VKOkFyQWNVSVFNbmNL&id=1100a4b92b939cd6&type=video/webm;+codecs="vp8.0,+vorbis"&fallback_host=tc.v13.cache3.c.youtube.com&sig=8353F6329CDA8168C4F7F29E20F2AE3F6509D85F.C582D63C02534232CE8E28D5ADC5B119AAEF2963&quality=large
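
One possible approach, offered as a sketch rather than a tested fix for the plugin: the attempt above probably misbehaves because `.*` is greedy, so the single match runs from the first `itag=` to the last `&url=` in the line. Splitting instead on only those commas that are directly followed by `itag=<digits>&url=` (a lookahead, so nothing is consumed) leaves the commas and repeated `itag=` fields inside each url untouched. The sample string and the name `%url_for` below are made up for illustration, and the sketch assumes an id is always numeric and that `,itag=NN&url=` never occurs inside a url.

```perl
use strict;
use warnings;

# Shortened, made-up stand-in for the real $streams line.
my $streams = join ',',
    'itag=44&url=http://example.invalid/a?sparams=cp%2Cid&itag=44&type=video/webm;+codecs="vp8.0,+vorbis"&quality=large',
    'itag=35&url=http://example.invalid/b?itag=35&type=video/x-flv&quality=large';

my %url_for;

# Split only on a comma that is immediately followed by "itag=<digits>&url=".
# The lookahead (?=...) is zero-width, so the "itag=..." text stays in the piece.
for my $pair ( split /,(?=itag=\d+&url=)/, $streams ) {
    # The leading itag is the id; everything after "&url=" is the url,
    # including any later "&itag=" fields and literal commas.
    if ( $pair =~ /\Aitag=(\d+)&url=(.*)\z/s ) {
        $url_for{$1} = $2;
    }
}

print "$_ => $url_for{$_}\n" for sort { $a <=> $b } keys %url_for;
```

If the same id can appear more than once in the real data, a plain hash keeps only the last url for it; a hash of arrays would be needed to keep them all.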

In reply to Split on regex, don't match partial regex by aldo
