Jaharmy has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perl Monks, please share some of your great wisdom!:)

I use Selenium::Remote::Driver and need to set up a "normal" user-agent. I've tried Selenium::UserAgent, but it only has User-Agents for mobiles and tablets, and none for ordinary desktop PCs. Is there a way to expand the list of devices in Selenium::UserAgent, or to set a correct User-Agent (ideally a Firefox one) manually with Selenium::Remote::Driver + Firefox?

I'm trying to parse a website which is protected from bots (solve the puzzle if you are a human). When I try to parse it with the default Selenium+Firefox UserAgent - the protection appears.

I've tried to use Selenium::UserAgent and it worked - the protection has disappeared, but I wasn't able to scrape the needed data, because the target site promotes its mobile application instead of showing the needed data this way.

So, after that I've checked the UserAgent of my home computer's browser and set it up using LWP::UserAgent:

my $ua = LWP::UserAgent->new( "Mozilla/5.0 (X11; Linux x86_64; rv:105.0) Gecko/20100101 Firefox/105.0" );
my $driver = Selenium::Remote::Driver->new( browser_name => 'firefox', ua => $ua );

But this way the protection appeared again.

After that, I've connected to my server through the VNC viewer, opened the same Firefox I've been using with Selenium, and there was no anti-bot protection this way. So, that's why I'm sure that I need to use a correct UserAgent and/or some other settings.

Please, help:)

Update:

Looks like I need to set up the Accept header, like:

$req->header('Accept' => '*/*');

But how to do it with Selenium::Remote::Driver ?

Update 2:

I've set up the Accept header with the code below, but nothing's changed.

$ua->default_header('Accept' => "*/*");

Replies are listed 'Best First'.
Re: How to set up a correct custom User-Agent when using Perl's Selenium::Remote::Driver and Firefox
by davies (Monsignor) on Dec 17, 2022 at 17:28 UTC

    I'm afraid it sounds as though you are trying to break the T's & C's of a web site to do something they don't want you to. As many monks manage such sites or have friends who do, you are unlikely to get much help until you convince us that everything is above board.

    Regards,

    John Davies

      I'm trying to scrape the hashtags from TikTok's trending videos. I've already got this working with Selenium::UserAgent, but it shows only the first 24 videos. Besides, I've got it working on my other machine with CentOS 7, but I installed all the software a couple of years ago and don't remember how I did it (installing the same old versions on the new machine doesn't work; there's something in the settings, I think).

      So I think there is not much sense in stopping me from doing what I need to. Also, I can solve it in a more hardcore way: run a real browser, start a plugin which allows running JS in it, and parse everything with JS. No protection can hold against that at all. Here is some wisdom from me for all who need it lol:) But I hate JS and would like to learn Perl more deeply. Isn't this the place where I can get some help with Perl?

Re: How to set up a correct custom User-Agent when using Perl's Selenium::Remote::Driver and Firefox
by marto (Cardinal) on Dec 18, 2022 at 11:23 UTC

    I don't use Selenium, I use WWW::Mechanize::Chrome and Mojo::UserAgent for web scraping stuff, but I appreciate you may already have a body of tests using Selenium.

    "Maybe there's a way to expand the list of devices in Selenium::UserAgent"

    See devices.json.

    "After that, I've connected to my server through the VNC viewer, opened the same Firefox I've been using with Selenium, and there was no anti-bot protection this way. So, that's why I'm sure that I need to use a correct UserAgent and/or some other settings."

    Some sites use advanced techniques to determine if you're running headless so your mileage may vary, and sometimes it's not consistent between requests. It can be quite the game of Whac-A-Mole.

    "I've set up the Accept header with the code below, but nothing's changed."

    At a cursory glance I can't offer any further help, I'm afraid; a deeper dive into the code should prove fruitful.
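
    One possible lead for that deeper dive, offered as a sketch rather than a tested fix: as far as I can tell from the Selenium::Remote::Driver documentation, the ua constructor argument is the LWP::UserAgent the Perl client uses to talk to the Selenium server, so an agent string or Accept header set on it would never reach the target site. The browser's own User-Agent is usually overridden through the Firefox preference general.useragent.override, which Selenium::Firefox::Profile (shipped with Selenium::Remote::Driver) can set. The UA string is the one from the question; whether firefox_profile is honoured depends on your Selenium, geckodriver and module versions, which is an assumption here.

    ```perl
    use strict;
    use warnings;
    use Selenium::Firefox::Profile;

    # Desktop Firefox User-Agent string taken from the question.
    my $desktop_ua =
        'Mozilla/5.0 (X11; Linux x86_64; rv:105.0) Gecko/20100101 Firefox/105.0';

    # Build a Firefox profile that overrides the User-Agent inside the browser.
    my $profile = Selenium::Firefox::Profile->new;
    $profile->set_preference( 'general.useragent.override' => $desktop_ua );

    # Connecting needs a running Selenium server, so this part is commented out.
    # The firefox_profile argument passes the profile to the browser session.
    # require Selenium::Remote::Driver;
    # my $driver = Selenium::Remote::Driver->new(
    #     browser_name    => 'firefox',
    #     firefox_profile => $profile,
    # );
    # print $driver->execute_script('return navigator.userAgent;'), "\n";

    # Show what was stored in the profile (the module may add quotes).
    print $profile->get_preference('general.useragent.override'), "\n";
    ```

    Checking navigator.userAgent from inside the session, as in the commented-out lines, would confirm whether the override took effect. It only changes the User-Agent, so headless detection may still trigger the protection.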

    Update: fix link.