If you’re into web scraping, you’ve probably heard of Selenium, an open-source tool that automates web browser interactions for website testing and more.
This testing framework is especially useful when you need to interact with a browser to perform tasks such as clicking buttons or scrolling. And although Selenium is primarily used for website testing, it also works well for web scraping because it helps you locate the required public data on a website.
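To give a quick idea of what that looks like in practice, here’s a minimal sketch of basic browser interactions with Selenium. The URL and the button selector are placeholders chosen for illustration, not part of this integration:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")  # Placeholder URL.

# Click a button located by a (hypothetical) CSS selector.
driver.find_element(By.CSS_SELECTOR, "button.load-more").click()

# Scroll to the bottom of the page.
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

driver.quit()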
So, in this article, let’s walk through integrating Selenium with Oxylabs’ Residential Proxies for smooth web scraping.
How to integrate Oxylabs’ proxies with Selenium?
Here, we explain how to integrate Oxylabs’ Residential Proxies with Selenium in Python. Note that Python 3.5 or higher is required.
Setting up Selenium
First, you’ll need to install Selenium Wire to extend Selenium’s Python bindings, since handling proxies that require authentication with the default Selenium module is unnecessarily complicated. You can install it with pip:
pip install selenium-wire
Another recommended package for this integration is webdriver-manager, which simplifies the management of binary drivers for different browsers. With it, there’s no need to manually download a new version of the web driver after each browser update.
You can install webdriver-manager with pip as well:
pip install webdriver-manager
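Once both packages are installed, they are typically wired together like this. This is a minimal sketch, assuming Chrome is the target browser; the full, proxy-enabled version is shown further below:

from seleniumwire import webdriver
from webdriver_manager.chrome import ChromeDriverManager

# webdriver-manager downloads a matching chromedriver and returns its path.
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.quit()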
Proxy authentication
Once everything is set up, you can move on to the next part: proxy authentication. For the proxies to work, you’ll need to specify your account credentials.
USERNAME = "your_username"
PASSWORD = "your_password"
ENDPOINT = "pr.oxylabs.io:7777"
Replace your_username and your_password with the username and password of your proxy user.
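These constants are then turned into Selenium Wire proxy options and passed to the driver. Here’s the shape of that options dictionary, the same structure the full code below builds in its chrome_proxy() helper:

proxies = {
    "proxy": {
        "http": f"http://{USERNAME}:{PASSWORD}@{ENDPOINT}",
        "https": f"http://{USERNAME}:{PASSWORD}@{ENDPOINT}",
    }
}

# The dictionary is passed to the driver via the seleniumwire_options argument.
driver = webdriver.Chrome(ChromeDriverManager().install(), seleniumwire_options=proxies)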
Testing proxy server connection
To check whether the proxy is working, visit ip.oxylabs.io through the driver. If everything is set up correctly, it will return the IP address of the proxy you’re using:
try:
    driver.get("https://ip.oxylabs.io/")
    return f'\nYour IP is: {re.search(r"[0-9].{2,}", driver.page_source).group()}'
finally:
    driver.quit()
Full code for Oxylabs’ Residential Proxies integration with Selenium
import re
from typing import Optional

from seleniumwire import webdriver
# A package to have a chromedriver always up-to-date.
from webdriver_manager.chrome import ChromeDriverManager

USERNAME = "your_username"
PASSWORD = "your_password"
ENDPOINT = "pr.oxylabs.io:7777"


def chrome_proxy(user: str, password: str, endpoint: str) -> dict:
    wire_options = {
        "proxy": {
            "http": f"http://{user}:{password}@{endpoint}",
            "https": f"http://{user}:{password}@{endpoint}",
        }
    }
    return wire_options


def execute_driver():
    options = webdriver.ChromeOptions()
    options.headless = True
    proxies = chrome_proxy(USERNAME, PASSWORD, ENDPOINT)
    driver = webdriver.Chrome(
        ChromeDriverManager().install(), options=options, seleniumwire_options=proxies
    )
    try:
        driver.get("https://ip.oxylabs.io/")
        return f'\nYour IP is: {re.search(r"[0-9].{2,}", driver.page_source).group()}'
    finally:
        driver.quit()


if __name__ == "__main__":
    print(execute_driver())
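Save the script (for example, as proxy_test.py; the file name here is just for illustration) and run it:

python proxy_test.py

If the integration works, the script prints the string built by execute_driver(), i.e. Your IP is: followed by the IP address of the residential proxy in use.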
Wrapping it up
All in all, Selenium is a great tool for public web scraping, especially when learning the basics. Plus, if you’re using high-quality and reliable proxies, public web scraping becomes even more efficient.
Got any questions about this post or about integrations with our proxies in general? You can reach out to us any time!