I am looking to scrape pages from the web and process them so that I can put the information into databases. I know that Perl does a great job with that; in fact, I have used Perl for this purpose before. I want to stick with Perl's regular expression capabilities, which I believe are superior to Python's. However, I am wondering whether I should consider using Perl and Python together for a wider set of tasks. I don't need to do complex machine learning while scraping the web; I just want to find comparable products that are for sale. Those could be on Twitter, the websites of small businesses, eBay, trade websites, and Facebook. Because of this range of data sources, I might need to use Perl and Python together. Does anyone here have experience with this sort of thing, and do you agree that using both languages might be a good idea?

In reply to Web Scraping by betmatt