I have added a ; at the end of said regex and now have this: ^.+ytplayer\.config\s*=\s*(\{.*?};)

For this particular use-case the above regex extracts the JSON, although JSON's decode_prefix() will ignore any trailing non-JSON content anyway (e.g. the Javascript I mentioned). Now, regarding the problem of unquoted keys and values: there is an allow_barekey() option to the JSON parser which allows keys to be unquoted.
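A minimal sketch of how the two options combine, assuming the JSON::PP backend (allow_barekey() is a JSON::PP extension). The sample page fragment and the helper name extract_config() are made up for illustration; they are not part of the scraper:

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: pull the ytplayer.config object out of a page.
# Returns the decoded data structure, or undef if nothing matched.
sub extract_config {
    my ($html) = @_;

    # Capture from the opening brace to the end of the input;
    # decode_prefix() decides where the object actually ends.
    return unless $html =~ /ytplayer\.config\s*=\s*(\{.*)/s;

    my $parser = JSON::PP->new->allow_barekey;    # accept {key:"value"}

    # In list context decode_prefix() returns the data and the number of
    # characters consumed; whatever follows the object is left alone.
    my ($data, $consumed) = $parser->decode_prefix($1);
    return $data;
}

# Made-up sample: bare keys, quoted values, trailing Javascript.
my $sample = 'ytplayer.config = {args:{title:"demo"}};ytplayer.load();';
my $config = extract_config($sample);
print "title: $config->{args}{title}\n";
```

Note that decode_prefix() still dies on input that is malformed before the object closes, so real code would want an eval around it.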

You still need to deal with the remaining problem of unquoted values. Unquoted values may be indicative of a much bigger problem: the values in the "JSON" (which is actually a Javascript object literal, i.e. a hash) may be function calls, other hashes, variables etc.! For example, this is the line that _get_args() looks for:

if(createPlayer){ if(window.ytplayer.bootstrapPlayerResponse){ window.ytplayer.config={args:{raw_player_response:window.ytplayer.bootstrapPlayerResponse}}; ...

There is a reason why it is unquoted I think ...
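To see why allow_barekey() alone does not help here, a small sketch (again assuming JSON::PP; try_decode() is a hypothetical helper, not part of the scraper) that feeds the parser a bare key with a quoted value and then a bare key with an unquoted value like the one in the line above:

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: returns (data, undef) on success,
# (undef, error message) when the parser dies.
sub try_decode {
    my ($text) = @_;
    my $parser = JSON::PP->new->allow_barekey;

    # List context matters: decode_prefix() returns (data, chars consumed).
    my ($data) = eval { $parser->decode_prefix($text) };
    return (undef, $@) unless defined $data;
    return ($data, undef);
}

for my $text (
    '{args:{title:"demo"}}',                                               # bare keys only
    '{args:{raw_player_response:window.ytplayer.bootstrapPlayerResponse}}' # bare value
) {
    my ($data, $err) = try_decode($text);
    print defined $data ? "decoded OK\n" : "failed: $err";
}
```

The second input fails because the value is a Javascript expression, not a JSON string, number, or literal; no parser option turns that into data.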

So yes, the scraper looks outdated (though it was updated very recently), and you are better off using something else.

bw, bliako


In reply to Re^2: youtube parser/scrabber by bliako
in thread youtube parser/scrabber by igoryonya
