TL;DR Just give me the code
Here is all you need to scrape Pinterest search with GridPanel:
import urllib.parse
import requests
import json
# Set up our JavaScript code for our js_scenario.
# This is the piece which extracts the URLs from the page.
js_code = """
function flattenObject(ob) {
    var toReturn = {};
    for (var i in ob) {
        if (!ob.hasOwnProperty(i)) continue;
        if ((typeof ob[i]) == 'object' && ob[i] !== null) {
            var flatObject = flattenObject(ob[i]);
            for (var x in flatObject) {
                if (!flatObject.hasOwnProperty(x)) continue;
                toReturn[i + '.' + x] = flatObject[x];
            }
        } else {
            toReturn[i] = ob[i];
        }
    }
    return toReturn;
}
var urls = [];
// Iterate by index so only the link elements themselves are visited
for (var i = 0; i < window.document.links.length; i++) {
    var link = window.document.links[i];
    try {
        var reactProp = null;
        var url = null;
        // React stores each element's props under a key starting with "__reactProps"
        for (var prop of Object.getOwnPropertyNames(link)) {
            if (prop.indexOf("__reactProps") === 0) {
                reactProp = link[prop];
            }
        }
        if (reactProp) {
            // Flatten the props and keep the value of any key containing ".link"
            var allProperties = flattenObject(reactProp);
            for (var prop of Object.getOwnPropertyNames(allProperties)) {
                if (prop.indexOf(".link") >= 0) {
                    url = allProperties[prop];
                }
            }
        }
        if (url) {
            urls.push(url);
        }
    } catch (e) {}
}
urls;
"""
# Set up our API key (replace the placeholder with your own key)
api_key = "<YOUR_KEY>"
# Encode the search term we want to search for on Pinterest.
# quote_plus turns spaces into "+", matching the URLs Pinterest itself uses.
search_term = "chocolate cake"
search_term = urllib.parse.quote_plus(search_term)
# This is the URL we want to scrape
url = f"https://www.pinterest.co.uk/search/pins/?q={search_term}&rs=typed"
# Set up the params for the GridPanel API
params = {
    'api_key': api_key,
    'url': url,
    'wait': 15 * 1000,
    'js_scenario': json.dumps({"instructions": [
        {'evaluate': js_code},
    ]}),
}
# Scrape! Raise an error early if the API returns a non-2xx status.
response = requests.get("https://gridpanel.net/api/scrape", params=params)
response.raise_for_status()
How to scrape Pinterest search with the GridPanel Scraping API
In this post, we are going to show you how to scrape search results from Pinterest using the easy-to-use GridPanel Scraping API. We will cover everything you need to know about scraping, the scraping API itself, how to set it up, how to use it correctly, and finally how to process the results. We are going to be using Python in our examples today, but nothing about this tutorial is Python-specific; it can easily be translated into your language of choice.
What is scraping and why do we need an API?
Before we get into the details, let's cover what web scraping actually is, and why we would want to use the GridPanel Scraping API to help us scrape. Web scraping is a technique for extracting data from websites and web pages: useful information is retrieved from a page's HTML and then used to power other products and/or services. Scraping is typically automated and used to collect large amounts of data from the target websites. This means you need to stay undetected whilst scraping, because if the target blocks your scrapers, they are rendered useless.
There are a number of ways to stay undetected when scraping the web, and the GridPanel Scraping API packages up the best of them behind an API that you can use to scrape your target websites. We use real devices with real device IP addresses, so your requests are the real deal and no spoofing is required. You can read more about our scraping API to see how it helps us scrape Pinterest.
What are we scraping on Pinterest?
This post focuses on scraping search results from Pinterest, in particular extracting the domains that the search results link to. However, with the code we provide and our Scraping API, you can scrape almost anything you like from Pinterest with a few small changes. We will cover a few of those below.
To give an example, when a user searches Pinterest for "chocolate cake" they are taken to a page with the following URL:
https://www.pinterest.co.uk/search/pins/?q=chocolate+cake&rs=typed
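If you would rather not assemble the query string by hand, Python's standard library can build this exact URL. The sketch below uses `urllib.parse.urlencode`, which encodes spaces as "+" by default; the helper name `pinterest_search_url` is hypothetical, and the `rs=typed` parameter is simply copied from the URL above.

```python
from urllib.parse import urlencode


def pinterest_search_url(term: str) -> str:
    """Build a Pinterest (UK) search URL for a search term (hypothetical helper)."""
    # urlencode quotes values with quote_plus, so spaces become "+"
    query = urlencode({"q": term, "rs": "typed"})
    return f"https://www.pinterest.co.uk/search/pins/?{query}"


print(pinterest_search_url("chocolate cake"))
# https://www.pinterest.co.uk/search/pins/?q=chocolate+cake&rs=typed
```

Using `urlencode` also takes care of characters such as "&" in a search term, which would otherwise break the query string.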
This will result in the following page:
As you can see, there are a number of search results. Each result is a "pin" that links out to an external domain. If we want to scrape all of the linked domains for "chocolate cake", we need to extract the domain associated with each pin. For the first pin, the domain looks like the following:
How to scrape Pinterest search results
Now that you know about scraping and what we are aiming to scrape, we can move on to how to use the GridPanel Scraping API to achieve this.
Get your API key
First, make sure you are set up with the GridPanel Scraping API; you can do that by signing up for a free account. Once you have an account, you will be given 1000 credits and can access your API credentials here: gridpanel.net/dashboard/scraping. There you will also find a handy request builder and information about your API usage.
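Rather than pasting the key straight into your script, one option is to read it from an environment variable so it never ends up in version control. This is a minimal sketch; the variable name `GRIDPANEL_API_KEY` and the helper `load_api_key` are our own choices, not something the API requires.

```python
import os


def load_api_key(env_var: str = "GRIDPANEL_API_KEY") -> str:
    """Read the scraping API key from an environment variable (name is our own choice)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key


# Usage, after running e.g. `export GRIDPANEL_API_KEY=...` in your shell:
# api_key = load_api_key()
```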
The API request
Scraping the search results from Pinterest really is as simple as making a single request to the GridPanel API. The following request will load the correct Pinterest search results page, wait for the page and search results to load properly, and extract the external links that point to the domains we are attempting to scrape.
import urllib.parse
import requests
import json
# Set up our JavaScript code for our js_scenario.
# This is the piece which extracts the URLs from the page.
js_code = """
function flattenObject(ob) {
    var toReturn = {};
    for (var i in ob) {
        if (!ob.hasOwnProperty(i)) continue;
        if ((typeof ob[i]) == 'object' && ob[i] !== null) {
            var flatObject = flattenObject(ob[i]);
            for (var x in flatObject) {
                if (!flatObject.hasOwnProperty(x)) continue;
                toReturn[i + '.' + x] = flatObject[x];
            }
        } else {
            toReturn[i] = ob[i];
        }
    }
    return toReturn;
}
var urls = [];
// Iterate by index so only the link elements themselves are visited
for (var i = 0; i < window.document.links.length; i++) {
    var link = window.document.links[i];
    try {
        var reactProp = null;
        var url = null;
        // React stores each element's props under a key starting with "__reactProps"
        for (var prop of Object.getOwnPropertyNames(link)) {
            if (prop.indexOf("__reactProps") === 0) {
                reactProp = link[prop];
            }
        }
        if (reactProp) {
            // Flatten the props and keep the value of any key containing ".link"
            var allProperties = flattenObject(reactProp);
            for (var prop of Object.getOwnPropertyNames(allProperties)) {
                if (prop.indexOf(".link") >= 0) {
                    url = allProperties[prop];
                }
            }
        }
        if (url) {
            urls.push(url);
        }
    } catch (e) {}
}
urls;
"""
# Set up our API key (replace the placeholder with your own key)
api_key = "<YOUR_KEY>"
# Encode the search term we want to search for on Pinterest.
# quote_plus turns spaces into "+", matching the URLs Pinterest itself uses.
search_term = "chocolate cake"
search_term = urllib.parse.quote_plus(search_term)
# This is the URL we want to scrape
url = f"https://www.pinterest.co.uk/search/pins/?q={search_term}&rs=typed"
# Set up the params for the GridPanel API
params = {
    'api_key': api_key,
    'url': url,
    'wait': 15 * 1000,
    'js_scenario': json.dumps({"instructions": [
        {'evaluate': js_code},
    ]}),
}
# Scrape! Raise an error early if the API returns a non-2xx status.
response = requests.get("https://gridpanel.net/api/scrape", params=params)
response.raise_for_status()
The real magic of this API call is the JavaScript code in the js_scenario parameter. Because Pinterest does not populate the page's HTML with the external link for each pin, we need to read it from the __reactProps that React attaches to each link element on the page. To do that, we use the js_scenario parameter to run some custom JavaScript on the page that extracts all the URLs from the __reactProps. Super simple.
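To see why searching the flattened keys for ".link" works, here is the same flattening logic mirrored in Python and applied to a made-up props object. The nested shape of `example_props` is purely hypothetical; Pinterest's real `__reactProps` structure may look quite different, and the helper names `flatten_object` and `find_link` are our own.

```python
def flatten_object(ob: dict, prefix: str = "") -> dict:
    """Flatten nested dicts/lists into dotted keys, mirroring flattenObject() in js_code."""
    flat = {}
    for key, value in ob.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        # JavaScript arrays are objects too, so flattenObject() uses their indices as keys
        if isinstance(value, list):
            value = {str(i): item for i, item in enumerate(value)}
        if isinstance(value, dict):
            flat.update(flatten_object(value, path))
        else:
            flat[path] = value
    return flat


def find_link(props: dict):
    """Return the last value whose flattened key contains ".link", like the JS loop does."""
    url = None
    for key, value in flatten_object(props).items():
        if ".link" in key:
            url = value
    return url


# Hypothetical props shape, for illustration only
example_props = {
    "children": {"props": {"pin": {"link": "https://example.com/cake/"}}},
    "className": "pinLink",
}
print(flatten_object(example_props))
# {'children.props.pin.link': 'https://example.com/cake/', 'className': 'pinLink'}
print(find_link(example_props))
# https://example.com/cake/
```

Note that, just like the JavaScript version, a top-level key named plain `link` would not match, because its flattened name has no leading dot.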
As you can see, the js_scenario parameter is powerful: you can use it in your scrapers to achieve all manner of complex data extraction. For more information on the available parameters, check out our scraping API documentation.
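Because js_scenario is passed as a JSON-encoded string, a small helper can keep that encoding in one place. This sketch only uses the `evaluate` instruction that appears in this post; the idea that several evaluate steps can be listed, each producing its own entry in evaluate_results, is our assumption, so check the API documentation before relying on it. `build_js_scenario` is a hypothetical name.

```python
import json


def build_js_scenario(*scripts: str) -> str:
    """Encode one 'evaluate' instruction per script as a js_scenario JSON string."""
    return json.dumps({"instructions": [{"evaluate": script} for script in scripts]})


scenario = build_js_scenario("document.title;")
print(scenario)
# {"instructions": [{"evaluate": "document.title;"}]}
```

In the request above, `'js_scenario': build_js_scenario(js_code)` would produce the same value as the inline `json.dumps(...)` call.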
Processing the results
If you run the above code you will receive an output that looks something like the following:
{
    "headers": "...",
    "body": "...",
    "...": "...",
    "evaluate_results": [
        {
            "result": {
                "type": "object",
                "value": [
                    "https://www.creationsbykara.com/homemade-chocolate-cake-recipe/",
                    "https://partypinching.com/best-chocolate-cake/",
                    "https://theviewfromgreatisland.com/ina-gartens-chocolate-cake-recipe/",
                    "https://butternutbakeryblog.com/moist-chocolate-cake/",
                    "https://thestayathomechef.com/the-most-amazing-chocolate-cake/",
                    "https://sallysbakingaddiction.com/triple-chocolate-layer-cake/",
                    "https://www.lifeloveandsugar.com/best-chocolate-cake/",
                    "https://www.southyourmouth.com/2019/03/the-best-chocolate-sheet-cake.html",
                    "https://www.ihearteating.com/chocolate-mousse-cake/",
                    "https://olivesnthyme.com/buttermilk-chocolate-cake/",
                    "https://thecookinchicks.com/death-by-chocolate-dump-cake/",
                    "https://easyweeknightrecipes.com/easy-chocolate-cake/",
                    "https://www.lovebakesgoodcakes.com/death-by-chocolate-poke-cake/",
                    "https://www.persnicketyplates.com/better-than-sex-cake/",
                    "https://saltandbaker.com/chocolate-buttermilk-bundt-cake/",
                    "https://bromabakery.com/best-blackout-chocolate-cake-ever/",
                    "https://richanddelish.com/matildas-chocolate-cake/",
                    "https://plowingthroughlife.com/the-best-rich-and-moist-chocolate-cake/",
                    "https://honestandtruly.com/moist-chocolate-cake/",
                    "https://www.persnicketyplates.com/williams-sonoma-sour-cream-chocolate-bundt-cake/?utm_term=best%20chocolate%20recipes&utm_campaign=8476587616",
                    "https://newsronian.com/hersheys-chocolate-cake-with-cream-cheese-filling-chocolate-cream-cheese-buttercream/",
                    "https://lovefoodies.com/nannys-chocolate-fudge-brownie-cake/",
                    "https://theviewfromgreatisland.com/ina-gartens-chocolate-cake-recipe/",
                    "https://www.southyourmouth.com/2019/03/the-best-chocolate-sheet-cake.html"
                ]
            }
        }
    ]
}
This response contains all the context from the browser in which we scraped Pinterest, as well as all the scraped URLs, which are embedded in the evaluate_results key. You can extract that data as follows:
evaluate_results = response.json()["evaluate_results"]
urls = evaluate_results[0]["result"]["value"]
This will give you just the raw URLs as a Python list, ready for processing!
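If a step fails or the page does not load, evaluate_results may not have the shape shown above, and the two-line extraction would raise a KeyError or IndexError. A slightly more defensive version might look like this; it assumes the response layout from the sample output, and `extract_urls` is a hypothetical helper name.

```python
def extract_urls(payload: dict) -> list:
    """Pull the URL list out of a response shaped like the sample output above."""
    results = payload.get("evaluate_results") or []
    if not results:
        return []
    value = (results[0].get("result") or {}).get("value")
    if not isinstance(value, list):
        return []
    # Keep only string entries, in case the script returned something unexpected
    return [v for v in value if isinstance(v, str)]


sample = {
    "evaluate_results": [
        {"result": {"type": "object", "value": ["https://a.example/", "https://b.example/"]}}
    ]
}
print(extract_urls(sample))  # ['https://a.example/', 'https://b.example/']
print(extract_urls({}))      # []
```

With a live response, this would be called as `urls = extract_urls(response.json())`.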
From here, you can do whatever you like with the results; for example, you could crawl each domain in the list, again using the GridPanel Scraping API.
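Since the goal was the linked domains rather than full URLs, one way to reduce the list is with `urllib.parse.urlparse`, dropping duplicates while keeping the original order (the sample output above contains a couple of repeats). Stripping a leading "www." is our own design choice so that both forms count as one domain; `unique_domains` is a hypothetical helper.

```python
from urllib.parse import urlparse


def unique_domains(urls):
    """Return each URL's host once, in first-seen order, without a leading 'www.'."""
    seen = {}  # dicts preserve insertion order, so this doubles as an ordered set
    for url in urls:
        host = urlparse(url).hostname or ""  # hostname is already lower-cased
        if host.startswith("www."):
            host = host[4:]
        if host:
            seen.setdefault(host, None)
    return list(seen)


urls = [
    "https://www.creationsbykara.com/homemade-chocolate-cake-recipe/",
    "https://theviewfromgreatisland.com/ina-gartens-chocolate-cake-recipe/",
    "https://www.southyourmouth.com/2019/03/the-best-chocolate-sheet-cake.html",
    "https://theviewfromgreatisland.com/ina-gartens-chocolate-cake-recipe/",
]
print(unique_domains(urls))
# ['creationsbykara.com', 'theviewfromgreatisland.com', 'southyourmouth.com']
```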
Extensions
You can now see how simple it is to scrape Pinterest search with the GridPanel Scraping API. It would be straightforward to extend this scraper to extract much more information about the pins themselves, for example:
- Imagery
- Names
- Text
- Comments
- Tags
- Social media info
- Etc!
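One way to approach these extensions is to keep the same flattening trick and look for several key fragments at once instead of just ".link". The sketch below works on an already-flattened dictionary like the one flattenObject produces, and the same loop could be written in JavaScript inside js_code. The field names (`.title`, `.imageUrl`) are hypothetical placeholders; the real key names would have to be found by inspecting Pinterest's props in your browser's developer tools.

```python
def pick_fields(flat_props: dict, fragments: dict) -> dict:
    """For each output name, keep the last value whose flattened key contains the fragment."""
    picked = {}
    for name, fragment in fragments.items():
        for key, value in flat_props.items():
            if fragment in key:
                picked[name] = value
    return picked


# Hypothetical flattened props and key fragments, for illustration only
flat = {
    "children.props.pin.link": "https://example.com/cake/",
    "children.props.pin.title": "Best chocolate cake",
    "children.props.pin.imageUrl": "https://example.com/cake.jpg",
}
wanted = {"url": ".link", "title": ".title", "image": ".imageUrl"}
print(pick_fields(flat, wanted))
# {'url': 'https://example.com/cake/', 'title': 'Best chocolate cake', 'image': 'https://example.com/cake.jpg'}
```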
Conclusion
In this blog post, we've shown you how to utilise the GridPanel Scraping API to scrape Pinterest search and extract all the external domains for a search term. If you are looking for more information on our API or on scraping other sites, stay tuned: we have plenty more examples and scraping posts to come!
Sign up for an account and join our grid today to stay ahead of your competitors.