How we scrape BlackHatWorld to automate our free trials

Like many sellers on BlackHatWorld (BHW), we have a free trial process so that potential customers can test our mobile proxy services before committing to a purchase. Free trials come in many different shapes and forms, but the idea behind them is the same from seller to seller. Before we built our Scraping API, we used to fulfil each trial manually by logging into BHW every day and reviewing any new trial requests on our marketplace threads. This was an arduous, error-prone process that meant we could not give potential customers the best experience or get them a free trial as quickly as possible, which risked losing us new business.

In this post we explain how, using our new Scraping API, we fully automated the process of giving free trials, ensuring that every potential client who meets the right criteria is given a trial as quickly as possible.

What is the Scraping API?

In today's world, scraping the internet is not as simple as using cURL like it used to be. Many websites attempt to prevent you from scraping their content by utilising a range of different techniques such as rate limiting, IP checks, CAPTCHAs, and anti-bot tools such as Cloudflare, among many others.

BHW, for example, uses Cloudflare to check your browser before you can access the page. This involves challenging whether the device is real, controlled by a human, and has a good reputation. Using a tool like cURL in this instance would not work at all, so enter the GridPanel Scraping API.

Our API uses real devices, with real IP addresses and real browsers, so that you can make one simple API request to scrape any webpage on the internet. You can read more about our Scraping API over here. Our goal is to make scraping as simple as possible by managing all the infrastructure you need, including proxies, rotation, devices, browser instances, fingerprints, and more. This ultimately saves you a lot of time and effort and means you can get back to scraping the easy way, like you would with cURL.

Setting up our thread for scraping

Typically, sellers ask users to comment on their BHW marketplace threads to request a trial. The seller then judges whether the user should be given a trial based on their own internal criteria, and sets up a communication channel with the BHW user to give them access to the trial they need. This is how we used to work at GridPanel, but we changed a number of things to set ourselves up for automation.

Firstly, instead of asking a BHW user to comment something simple like "FREE TRIAL", we give them a token when they sign up for an account on our platform, which they can post on the thread instead. You can request your own token by going to gridpanel.net/bhw. This means they will eventually ask for a free trial by posting the following:

[Screenshot: BlackHatWorld free trial request to aid scraping]

As you can see in the screenshot above, the BHW user has commented with their unique token, wrapped between "GPBHW" and "BHWGP". We will come back to why this is important later in the post.

This gives us a unique way to link BHW users back to users in our internal systems, while still requiring users to comment on our thread in order to receive a free trial, which means our thread still gets bumped to the top of the marketplace forum on BHW.
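
To make the format concrete, here is a minimal sketch of how such a comment is put together; the token value shown is purely illustrative and not a real GridPanel token:

# Hypothetical example: the token value below is illustrative only
user_token = "1a2b3c4d"  # issued when the user signs up at gridpanel.net/bhw
trial_comment = f"GPBHW-{user_token}-BHWGP"
print(trial_comment)  # GPBHW-1a2b3c4d-BHWGP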

Scraping the BlackHatWorld thread

Now that users are commenting on our thread in a format that lets us uniquely identify them both on BHW and in our internal systems, we can start scraping the thread.

To do so we utilise our Scraping API to fetch the HTML of the latest pages of our BHW marketplace thread. You will need to grab an API key from our dashboard here to follow along. The code to grab the HTML of a single page is as follows:

import requests

api_key = "YOUR-KEY"
page_number = 25
base_url = "https://www.blackhatworld.com/seo/uk-4g-5g-proxies-from-35-month-up-to-100mbps-unlimited-bandwidth-dedicated.1444703"

# Ask the Scraping API to fetch the given page of the thread using a real device and browser
params = {
    'api_key': api_key,
    'url': f"{base_url}/page-{page_number}",
}
response = requests.get('https://gridpanel.net/api/scrape', params=params)

# The rendered HTML of the page is returned in the response body
html_text = response.json()['result']['body']

The above code, whilst looking fairly simple, is very powerful. We are making a request to our marketplace thread, at page 25, using a real browser with a real device IP address. It then returns the HTML of page 25 of our marketplace thread, which we can then process.

In practice, we need to ensure that we scrape the latest page of the thread and walk backward page by page until we hit a point to stop. This point is up to you, but for us, it is based on the last trial we responded to.
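
As an illustration, a loop along these lines could walk backwards through the thread; the latest_page and stop_page values are placeholders you would replace with your own logic for finding the newest page and the last trial you handled:

import requests

api_key = "YOUR-KEY"
base_url = "https://www.blackhatworld.com/seo/uk-4g-5g-proxies-from-35-month-up-to-100mbps-unlimited-bandwidth-dedicated.1444703"

latest_page = 25  # placeholder: the newest page of your thread
stop_page = 20    # placeholder: e.g. the page containing the last trial you responded to

pages_html = []
for page_number in range(latest_page, stop_page - 1, -1):
    # Same Scraping API call as above, once per page, newest page first
    params = {
        'api_key': api_key,
        'url': f"{base_url}/page-{page_number}",
    }
    response = requests.get('https://gridpanel.net/api/scrape', params=params)
    pages_html.append(response.json()['result']['body'])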

Giving trials

Once we have scraped the HTML of each relevant BHW marketplace page, we can start processing it to give trials. This is where our unique token, along with the "GPBHW" prefix and "BHWGP" suffix, comes into play. We use these, along with a simple regular expression, to find every trial request within the HTML. The regular expression is as follows:

# Non-greedy match so each token wrapped in GPBHW-...-BHWGP is captured separately
all_trial_tokens = re.findall(r'GPBHW-(.*?)-BHWGP', self.soup.get_text())

This regular expression produces a list of all the trial request tokens, which we can then correlate with users in our systems and judge whether to give each user a free trial, based on our criteria. Note that we are using Beautiful Soup (a Python HTML parser) to extract all the text from the webpage.
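
To put the pieces together, a self-contained version of this extraction step might look like the following, using the html_text returned by the Scraping API earlier:

import re
from bs4 import BeautifulSoup

# Parse the HTML returned by the Scraping API and pull out the visible text
soup = BeautifulSoup(html_text, 'html.parser')
page_text = soup.get_text(separator=' ')

# Non-greedy match so each GPBHW-...-BHWGP token is captured separately
all_trial_tokens = re.findall(r'GPBHW-(.*?)-BHWGP', page_text)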

As a marketplace seller, you can easily replicate this process and use your own internal logic to link up BHW users with users in your systems, ensuring that you give trials to the right people at the right time.
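
As a rough sketch, that linking step could look something like the code below; find_user_by_token, meets_trial_criteria, and grant_trial are hypothetical stand-ins for your own internal systems and criteria:

# Hypothetical helpers standing in for your own internal systems
def find_user_by_token(token):
    ...  # look the token up in your user database; return None if unknown

def meets_trial_criteria(user):
    ...  # your own rules, e.g. no previous trial on this account

def grant_trial(user):
    ...  # provision the trial and notify the user

for token in all_trial_tokens:
    user = find_user_by_token(token)
    if user and meets_trial_criteria(user):
        grant_trial(user)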

Conclusion

In this post we have shown how, using our Scraping API, we easily automated our BlackHatWorld free trial process. As a marketplace seller, you can use the API to achieve the same results.

Do you want to give this a go yourself? You can get 1000 free credits for our Scraping API by signing up for a free account, with no credit card information required. If you are interested, sign up and head over to our dashboard.