Ever wanted to scrape a website for valuable data, only to realize that it is protected by Cloudflare? There is no need to be discouraged, because there is a practical solution to this problem, and it is called FlareSolverr.
With FlareSolverr you can easily bypass Cloudflare limits and get scraping access to websites that were once out of reach.
In this guide, you will learn how to install and use FlareSolverr for limitless data scraping. On top of that, you will find out how to make POST requests and manage sessions for a more effective workflow. Read on to learn more.
What Is FlareSolverr?
FlareSolverr is a proxy server that allows you to scrape Cloudflare-protected websites and overcome their limitations. Here is how it works.
As soon as it receives a request, FlareSolverr launches a Chrome-based browser and opens your target URL with the parameters you provide. Then, it waits until the Cloudflare challenge is solved. Finally, it returns the content and cookies so that you can use them to bypass Cloudflare with an HTTP client such as Python Requests.
Tip: To take your scraping process even further, consider pairing FlareSolverr with a mobile proxy.
How to Install FlareSolverr
FlareSolverr supports Windows, Linux, and macOS, and you can install it from its Docker image, precompiled binaries, or source code. In this tutorial, you will learn how to install it as a Docker container. The main benefit of this approach is that the Docker image ships with the browser that FlareSolverr needs already preinstalled.
1. Install Software Dependencies
Before you install FlareSolverr, you should first take care of its software dependencies:
- Install Docker using its official guide for your operating system.
- Update libseccomp2 to version 2.5 or higher (Debian users only).
If you are running Ubuntu with root privileges, the fastest and easiest way to install Docker is with the following three commands.
apt-get install docker.io
systemctl enable docker
systemctl start docker
2. Install FlareSolverr Using Its Docker Image
Use the following command to pull the FlareSolverr Docker image on Linux, Windows, or macOS; it is the same image that the run command below uses. Remember to add "sudo" if you are using Linux without root privileges.
docker pull ghcr.io/flaresolverr/flaresolverr:latest
For more information, you can find the FlareSolverr Docker image at the links below.
GitHub: https://github.com/orgs/FlareSolverr/packages/container/package/flaresolverr
DockerHub: https://hub.docker.com/r/flaresolverr/flaresolverr
3. Run FlareSolverr
You can now run FlareSolverr with the following command on Linux, Windows, or macOS. Remember to add "sudo" if you are using Linux without root privileges.
docker run -d \
--name=flaresolverr \
-p 8191:8191 \
-e LOG_LEVEL=info \
--restart unless-stopped \
ghcr.io/flaresolverr/flaresolverr:latest
4. Verify That FlareSolverr Is Working
Finally, test whether FlareSolverr is working by opening http://localhost:8191 in your browser.
If you see a response such as “FlareSolverr is ready!”, you can rest assured that you installed it successfully.
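If you would rather check from a script, a quick request to the same address does the job. The snippet below is a minimal sketch that assumes FlareSolverr is running locally on the default port 8191.
import requests

# Quick sanity check against a local FlareSolverr instance (default port 8191)
response = requests.get("http://localhost:8191")
print(response.status_code)  # expect 200
print(response.text)         # expect a message such as "FlareSolverr is ready!"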
That is all. Now it is time to learn how to use this wonderful tool to get all the data you need from Cloudflare-protected websites.
How to Use FlareSolverr
As you will see later in this section, there are two primary ways of using FlareSolverr to scrape website data. But before diving into these, you should first take a look at two simple example requests. The first one relies on bash and curl, while the second one uses Python Requests.
Running a Curl Request
If you want to use FlareSolverr directly from bash, you can do this with the curl command. Take a look at the code snippet below for reference.
curl -L -X POST 'http://localhost:8191/v1' \
-H 'Content-Type: application/json' \
--data-raw '{
"cmd": "request.get",
"url":"http://www.website.com",
"maxTimeout": 60000
}'
Running a Python Request
On the other hand, you have the option of using Python Requests to do the same. Take a look at the code snippet below for reference.
import requests

url = "http://localhost:8191/v1"
headers = {"Content-Type": "application/json"}
data = {
    "cmd": "request.get",
    "url": "http://www.website.com",
    "maxTimeout": 60000
}
response = requests.post(url, headers=headers, json=data)
print(response.text)
Output Example
If everything works as expected, your output should look similar to the one below.
{
  "status": "ok",
  "message": "Challenge solved!",
  "solution": {
    "url": "https://website.com",
    "status": 200,
    "cookies": [
      {
        "domain": "website.com",
        "httpOnly": false,
        "name": "2F_TT",
        "path": "/",
        "secure": true,
        "value": "0"
      }
    ],
    "userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
    "headers": {},
    "response": "<html><head>...</head><body>...</body></html>"
  }
}
Scraping Using Python Requests
You can configure FlareSolverr to retrieve valid Cloudflare cookies and use them with Python Requests. This is the most resource-efficient way to scrape Cloudflare websites. Take a look at the code snippet below for reference.
import requests

# Ask FlareSolverr to solve the Cloudflare challenge for the target site
post_body = {
    "cmd": "request.get",
    "url": "https://website.com",
    "maxTimeout": 60000
}
response = requests.post('http://localhost:8191/v1', headers={'Content-Type': 'application/json'}, json=post_body)
if response.status_code == 200:
    json_response = response.json()
    if json_response.get('status') == 'ok':
        # Extract the Cloudflare cookies returned by FlareSolverr
        cookies = json_response['solution']['cookies']
        clean_cookies_dict = {cookie['name']: cookie['value'] for cookie in cookies}
        # Extract the user agent that solved the challenge
        user_agent = json_response['solution']['userAgent']
        headers = {"User-Agent": user_agent}
        # Reuse the cookies and user agent in a plain HTTP request
        response = requests.get("https://website.com", headers=headers, cookies=clean_cookies_dict)
        if response.status_code == 200:
            print('Success')
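If you plan to request several pages from the same site, it can be cleaner to load the cookies and user agent into a requests.Session once and reuse it. The snippet below is a minimal sketch; the cookie and user agent values are placeholders standing in for whatever FlareSolverr returned in the example above, and the paths are hypothetical.
import requests

# Minimal sketch: reuse solved Cloudflare cookies across several plain HTTP requests.
# Replace the placeholders with the values FlareSolverr returned.
clean_cookies_dict = {"2F_TT": "0"}             # placeholder cookies
user_agent = "Mozilla/5.0 (X11; Linux x86_64)"  # placeholder user agent

session = requests.Session()
session.headers.update({"User-Agent": user_agent})
session.cookies.update(clean_cookies_dict)

for path in ["/page1", "/page2"]:  # hypothetical paths on the same site
    page = session.get("https://website.com" + path)
    print(path, page.status_code)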
Scraping Using a List of URLs
Alternatively, you can use FlareSolverr with a list of page URLs to keep the workflow simpler. In this case, you will have to rely on its integrated HTTP client, which requires more resources from your server or computer. Take a look at the code snippet below for reference.
import requests

url_list = [
    'https://website1.com',
    'https://website2.com',
    'https://website3.com',
]

for url in url_list:
    post_body = {
        "cmd": "request.get",
        "url": url,
        "maxTimeout": 60000
    }
    response = requests.post('http://localhost:8191/v1', headers={'Content-Type': 'application/json'}, json=post_body)
    if response.status_code == 200:
        json_response = response.json()
        if json_response.get('status') == 'ok':
            # The rendered HTML is returned in solution.response
            html = json_response['solution']['response']
            print('Success')
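If you want to keep the pages for later processing, you can write each rendered response to disk inside the same loop. The sketch below assumes the same local FlareSolverr endpoint as above; the file names are purely illustrative.
import requests

# Minimal sketch: save each rendered page to a local HTML file.
url_list = [
    'https://website1.com',
    'https://website2.com',
]

for index, url in enumerate(url_list):
    post_body = {"cmd": "request.get", "url": url, "maxTimeout": 60000}
    response = requests.post('http://localhost:8191/v1', json=post_body)
    if response.status_code == 200:
        json_response = response.json()
        if json_response.get('status') == 'ok':
            html = json_response['solution']['response']
            with open(f"page_{index}.html", "w", encoding="utf-8") as f:  # illustrative file name
                f.write(html)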
How to Manage Sessions
If you need to use Cloudflare cookies for a while, you can set up FlareSolverr sessions. By doing this, you will no longer need to repeatedly solve Cloudflare challenges or send cookies every time you make a request.
You can use FlareSolverr to create, list, and remove sessions with the commands listed below.
- sessions.create
- sessions.list
- sessions.destroy
Read on to learn how to use each one of them.
Creating a Session
To create a session, set the "cmd" setting to "sessions.create". Take a look at the code snippet below for reference.
import requests

url = "http://localhost:8191/v1"
headers = {"Content-Type": "application/json"}
data = {
    "cmd": "sessions.create",
    "url": "https://website.com",
    "maxTimeout": 60000
}
response = requests.post(url, headers=headers, json=data)
print(response.content)
If everything goes well, you should see an output containing “Session created successfully.”
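Once a session exists, you can tell FlareSolverr to reuse it by passing its ID in the "session" field of a regular request.get command, so the solved cookies are kept between calls. The snippet below is a minimal sketch; "my_session" is a hypothetical ID, so replace it with the one returned when you created the session.
import requests

# Minimal sketch: reuse an existing FlareSolverr session in a request.
# "my_session" is a hypothetical ID; use the one returned by sessions.create.
data = {
    "cmd": "request.get",
    "url": "https://website.com",
    "session": "my_session",
    "maxTimeout": 60000
}
response = requests.post("http://localhost:8191/v1", headers={"Content-Type": "application/json"}, json=data)
print(response.json().get("message"))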
Listing Sessions
If you want to check your active sessions, you can list their IDs. To do that, set the "cmd" setting to "sessions.list". Take a look at the code snippet below for reference.
curl -L -X POST 'http://localhost:8191/v1' \
-H 'Content-Type: application/json' \
--data-raw '{
"cmd": "sessions.list",
"url":"http://website.com",
"maxTimeout": 60000
}'
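The same command works from Python if you prefer to manage sessions in a script. The snippet below is a minimal sketch; the IDs are usually returned in a "sessions" field, so print the full payload if your version formats the response differently.
import requests

# Minimal sketch: list active FlareSolverr sessions from Python.
data = {"cmd": "sessions.list"}
response = requests.post("http://localhost:8191/v1", headers={"Content-Type": "application/json"}, json=data)
payload = response.json()
print(payload)                  # full response for inspection
print(payload.get("sessions"))  # session IDs, if present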
Removing a Session
Now that you know your session ID, you can use it to remove the session. To do so, set the "cmd" setting to "sessions.destroy" and set the "session" setting to the proper ID. Take a look at the code snippet below for reference.
curl -L -X POST 'http://localhost:8191/v1' \
-H 'Content-Type: application/json' \
--data-raw '{
"cmd": "sessions.destroy",
"session": "session_ID",
"url":"http://website.com",
"maxTimeout": 60000
}'
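The Python equivalent looks like this; again, "session_ID" is a placeholder for a real ID returned by sessions.list.
import requests

# Minimal sketch: remove a FlareSolverr session from Python.
data = {"cmd": "sessions.destroy", "session": "session_ID"}  # placeholder ID
response = requests.post("http://localhost:8191/v1", headers={"Content-Type": "application/json"}, json=data)
print(response.json().get("message"))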
Now you know everything there is to know about managing FlareSolverr sessions.
How to Make POST Requests
If you need to retrieve Cloudflare cookies from POST endpoints, you can use FlareSolverr to make POST requests. To do this, replace "request.get" with "request.post" in the "cmd" setting and pass the form body in the "postData" field. Take a look at the code snippet below for reference.
import requests

# Example application/x-www-form-urlencoded payload (see the note below)
POST_DATA = "a=b&c=d"

post_body = {
    "cmd": "request.post",
    "url": "https://www.website.com/POST",
    "postData": POST_DATA,
    "maxTimeout": 60000
}
response = requests.post('http://localhost:8191/v1', headers={'Content-Type': 'application/json'}, json=post_body)
print(response.json())
Remember to use an application/x-www-form-urlencoded string (such as a=b&c=d) when setting "POST_DATA".
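If your form data starts out as a Python dictionary, the standard library can build that string for you. The snippet below is a minimal sketch using urllib.parse.urlencode; the field names are placeholders.
from urllib.parse import urlencode

# Minimal sketch: turn a dict into an application/x-www-form-urlencoded string.
form_fields = {"a": "b", "c": "d"}  # placeholder field names
POST_DATA = urlencode(form_fields)
print(POST_DATA)  # a=b&c=d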
Final Words
FlareSolverr is a highly effective tool that you can use to bypass Cloudflare limits and scrape data that was previously out of reach. Paired with a mobile proxy, FlareSolverr can take your scraping practice to a whole new level.
With this step-by-step guide, you have learned how to install and run the software. You also know how to use it in several ways and how to manage your workflow more efficiently with sessions.
By following these instructions, scraping Cloudflare-protected websites should be a breeze. Now it is time to test it out for yourself and enjoy the benefits of this amazing software.