in reply to Parse HTML into multidimensional array

First off, parsing HTML with regexes is pretty much universally regarded as a bad idea. It can work, but it's usually easier to use one of the HTML modules, especially if the HTML is likely to change in the future.

Having said that, get the full HTML that get returns, and look at your regexes. Your expected output suggests you only wanted the first set of data, but you have the /g modifier on them, which means the entire input will be searched for matches. That might be why you are getting much more than you expected.
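
To make the effect of /g concrete, here is a minimal sketch. The $html string and the <td> pattern are made up for illustration and are not taken from your page, so treat it as a demonstration of the modifier only:

```perl
use strict;
use warnings;

# Hypothetical sample input; the real page from the question will differ.
my $html = '<td>alpha</td><td>beta</td><td>gamma</td>';

# List context without /g: matching stops at the first hit
# and returns only that hit's capture group.
my ($first) = $html =~ m{<td>(.*?)</td>};

# List context with /g: the whole string is scanned and the
# capture from every match is returned.
my @all = $html =~ m{<td>(.*?)</td>}g;

print "first: $first\n";
print "all:   @all\n";
```

If you only want the first block of data, dropping /g (or narrowing the input before matching) is probably the smaller change.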

Dum Spiro Spero

Re^2: Parse HTML into multidimensional array
by mazdajai (Novice) on Jul 14, 2015 at 21:38 UTC
    Looking at the source HTML, do you think it is impossible to pull the content into the correct index? The purpose is to achieve this without HTML::TreeBuilder so I can learn Perl without relying on modules. I know it sounds silly, but a lot of the time I don't understand what is happening in the background when I use a module.
      so I can learn Perl without relying [on] modules
      I would suggest that learning how to use HTML::TreeBuilder and in particular HTML::TreeBuilder::XPath would be a far more fruitful experience than entangling yourself in regular expressions. Learning powerful and robust modules will allow you to do more with Perl, not less.

      Impossible? Absolutely not, and easier with Perl than with any other language. But... that's not saying much. I can sympathize with the desire to really learn Perl, but it will do you good to start to recognize that learning to use CPAN is part of learning to use Perl.

      If you're insistent on using regexes, start with the full HTML returned by your get call, and build the regex incrementally. A site like this can help you with that. And good luck! You can post questions here if you get stuck, but be prepared to hear "why aren't you using a module for this?" every time!
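
      Building a regex incrementally could look something like the sketch below. The HTML fragment, the class names, and the variable names are all hypothetical stand-ins, since I don't know what your page actually contains:

      ```perl
      use strict;
      use warnings;

      # Hypothetical fragment standing in for the HTML returned by get.
      my $html = '<tr><td class="name">Widget</td><td class="price">9.99</td></tr>';

      # Step 1: match only the opening tag and confirm the pattern hits at all.
      my $step1 = $html =~ m{<td\s+class="name">} ? 1 : 0;

      # Step 2: extend the pattern to capture that cell's text.
      my ($name) = $html =~ m{<td\s+class="name">([^<]*)</td>};

      # Step 3: add the next cell; /x lets the pattern span lines with comments.
      my ($name2, $price) = $html =~ m{
          <td\s+class="name">  ([^<]*) </td>   # first cell: name
          \s*
          <td\s+class="price"> ([^<]*) </td>   # second cell: price
      }x;

      print "step 1 matched: $step1\n";
      print "name: $name\n";
      print "$name2 costs $price\n";
      ```

      Checking each step before adding the next piece makes it much easier to see which part of the pattern stops matching.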

      Dum Spiro Spero