How to scrape search-engine results?

Anonymous · March 10, 2025, 4:43pm

I tried making a web scraper that enters Google searches and collects the results. Of course, it wasn’t long before Google banned my IP. I am wondering whether other search engines, such as DuckDuckGo, which doesn’t log IP info, behave similarly.

Is there a search engine out there explicitly made for scraping? I saw something about one built with PHP, but unfortunately I don’t know PHP. (I am using Python.)

Anonymous · March 10, 2025, 4:43pm

Use their API (free for 100 queries/day, $/1000 queries up to 10,000 per day).
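
Assuming "their API" here means Google’s Custom Search JSON API, a minimal sketch of a query might look like the following. The API key and search engine ID are values you would have to create yourself in your Google account, and the helper names (`build_search_url`, `extract_results`, `search`) are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Endpoint of Google's Custom Search JSON API.
ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, engine_id, start=1):
    """Return the request URL for one page of results, beginning at `start`."""
    params = {"key": api_key, "cx": engine_id, "q": query, "start": start}
    return ENDPOINT + "?" + urllib.parse.urlencode(params)


def extract_results(payload):
    """Pull (title, link) pairs out of a decoded JSON response."""
    return [(item["title"], item["link"]) for item in payload.get("items", [])]


def search(query, api_key, engine_id):
    """Run one query over the network and return (title, link) pairs."""
    url = build_search_url(query, api_key, engine_id)
    with urllib.request.urlopen(url, timeout=10) as response:
        return extract_results(json.load(response))
```

Calling `search("web scraping", YOUR_KEY, YOUR_ENGINE_ID)` counts against the daily quota, so it is worth caching results while you test.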

Mrpixelmc · March 10, 2025, 4:43pm

Try following this guide; it might help you. Python is perfectly fine for web scraping, but you need to be careful while doing it.

Anonymous · March 10, 2025, 4:43pm

This all feels kind of roundabout and unreliable. Isn’t there just a sanctioned way to scrape browser search results? Like a browser made for that sort of thing?

Have you personally had success scraping Google or DuckDuckGo search results? Which of the techniques mentioned in the article did you use?

I think I could pretty easily add random sleeps and mouse movements/clicks to my script, but I don’t really want to pay for a VPN to cycle my IP, and I’m wondering whether the free proxies offered by sites like the one described here are worth it.
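
For the random-sleep part, a rough sketch could be as small as this. The 2 to 8 second range is an arbitrary guess, not a value known to avoid bans, and `polite_pause` is just an illustrative name.

```python
import random
import time


def polite_pause(min_seconds=2.0, max_seconds=8.0, sleep=time.sleep):
    """Sleep for a random duration in [min_seconds, max_seconds] and return it.

    `sleep` can be swapped out (for example in tests) so nothing really waits.
    """
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

The idea would be to call `polite_pause()` between consecutive searches so the requests are not evenly spaced.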

Mrpixelmc · March 10, 2025, 4:43pm

Points 2, 3 and 7 are the most important imo. You should definitely cycle IPs and use different proxies; it’s probably the best method, since even if one IP gets banned you can try the others. Your aim should be to impersonate a normal user.
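
The "if one IP gets banned, try the others" idea can be sketched roughly as below. This is only an outline: `fetch` is a function you supply yourself (it should raise an exception when a request is blocked or times out), and the proxy addresses are whatever you have access to.

```python
def fetch_with_failover(url, proxies, fetch):
    """Try each proxy in turn; return (proxy, result) for the first that works.

    `fetch(url, proxy)` is a caller-supplied function that raises on failure
    (ban page, timeout, connection error).
    """
    errors = {}
    for proxy in proxies:
        try:
            return proxy, fetch(url, proxy)
        except Exception as exc:  # any failure means: move on to the next proxy
            errors[proxy] = exc
    raise RuntimeError(f"all {len(errors)} proxies failed for {url}")
```

Keeping the request logic in `fetch` means the same loop works whether you use `requests`, `urllib` or a browser driver underneath.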

Anonymous · March 10, 2025, 4:43pm

Do you know where I can figure out how to cycle IPs and use different proxies with Selenium?

Mrpixelmc · March 10, 2025, 4:43pm

Quick Google search.
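
For what it’s worth, one common approach with Selenium and Chrome is to pass the proxy as a `--proxy-server` command-line switch and start a fresh browser for each proxy. A minimal sketch, assuming unauthenticated HTTP proxies; the addresses below are placeholders and the function names are only illustrative.

```python
import itertools

# Hypothetical proxy addresses; replace with ones you actually have access to.
PROXIES = ["203.0.113.10:8080", "203.0.113.11:8080", "203.0.113.12:8080"]
proxy_pool = itertools.cycle(PROXIES)  # endless round-robin over the list


def proxy_argument(proxy):
    """Build the Chrome command-line switch that routes traffic via `proxy`."""
    return f"--proxy-server=http://{proxy}"


def new_driver(proxy):
    """Start a Chrome session that sends its traffic through `proxy`."""
    from selenium import webdriver  # imported lazily; needs `pip install selenium`

    options = webdriver.ChromeOptions()
    options.add_argument(proxy_argument(proxy))
    return webdriver.Chrome(options=options)
```

Usage would be `driver = new_driver(next(proxy_pool))`. The switch applies for the lifetime of that browser, so to change proxy you call `driver.quit()` and create a new driver with the next address.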

Anonymous · March 10, 2025, 4:43pm

I don’t understand why buying access to a single proxy server with its own individual IP would help. It seems like if that server’s IP gets blocked, then I am back to square one and out however much I paid for the service. And I definitely don’t have the money to buy access to dozens or more different proxy servers to cycle through. I feel at a loss as to what to do next.

It looks like some services, such as Octoparse, offer multiple servers and cycle through them automatically… do I need to use something like this to scrape simple search-engine results?