Raluca Penciuc | Apr 7, 2023 | 8 min read

Unlock the Power of Data: How to Scrape Booking.com for Valuable Information


Prerequisites

If you don’t already have Node.js set up, head over to the official website and download the latest version for your operating system. Then create a new directory and run the following command inside it to initialize your project:

npm init -y

We will write the code in TypeScript, a superset of JavaScript that adds optional static typing and other features. It is useful for larger projects because it makes it easier to catch mistakes early on. Add it to the project’s dev dependencies and initialize its configuration file:

npm install typescript --save-dev
npx tsc --init

Make sure that in the newly generated tsconfig.json file the “outDir” property is set to “dist” (uncomment it if necessary), since we want to keep the TypeScript sources separate from the compiled JavaScript.

Lastly, the following command will add Puppeteer to our project dependencies:

npm install puppeteer

Puppeteer is a Node.js library that provides a high-level API for controlling Chrome or Chromium (headless by default), which makes it well suited for web scraping and automation tasks.

Data location

For this tutorial, we chose to scrape the properties available in Madeira Islands, Portugal: https://www.booking.com/searchresults.en-us.html?ss=Madeira+Islands&checkin=2023-01-13&checkout=2023-01-15. It’s important to include the check-in and check-out dates in the URL so that all of the property information is available. Since these dates are now in the past, you will likely need to replace them with upcoming dates when you run the script.
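If you plan to query other destinations or dates, it can help to build this URL programmatically instead of editing it by hand. The sketch below uses only the standard URL and URLSearchParams APIs; the function name buildSearchUrl is hypothetical, and it sets only the three query parameters that appear in the URL above (ss, checkin, checkout), assuming dates in the YYYY-MM-DD format.

```typescript
// Hypothetical helper: builds a Booking.com search URL from the same
// three query parameters used in this tutorial (ss, checkin, checkout).
function buildSearchUrl(destination: string, checkin: string, checkout: string): string {
    // Dates are expected as YYYY-MM-DD; in that format, plain string
    // comparison matches chronological order, so we can sanity-check the range.
    const isoDate = /^\d{4}-\d{2}-\d{2}$/;
    if (!isoDate.test(checkin) || !isoDate.test(checkout)) {
        throw new Error('Dates must use the YYYY-MM-DD format');
    }
    if (checkout <= checkin) {
        throw new Error('checkout must be after checkin');
    }

    const url = new URL('https://www.booking.com/searchresults.en-us.html');
    // URLSearchParams encodes spaces as "+", e.g. "Madeira Islands" -> "Madeira+Islands"
    url.searchParams.set('ss', destination);
    url.searchParams.set('checkin', checkin);
    url.searchParams.set('checkout', checkout);
    return url.toString();
}

// prints https://www.booking.com/searchresults.en-us.html?ss=Madeira+Islands&checkin=2023-01-13&checkout=2023-01-15
console.log(buildSearchUrl('Madeira Islands', '2023-01-13', '2023-01-15'));
```

Validating the dates up front means a typo fails loudly in your script instead of producing a search page without the property information you expect.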

This guide covers the extraction of the following property data:

  • the name
  • the URL
  • the physical address
  • the price
  • the rating and the review count
  • the thumbnail

You can see them highlighted in the screenshot below:

Booking.com hotel search result card with highlighted property name, review score, and nightly price

If you inspect each of these elements with your browser’s Developer Tools, you will find the CSS selectors we use to locate the HTML elements. If you’re fairly new to how CSS selectors work, feel free to check out this beginner guide.

Parsing the data

Since every listing shares the same structure, we can extract each piece of information for the entire list of properties at once. After running the script, we can loop through the results and combine them into a single list.

A first look at the HTML document shows that the Booking.com website is fairly complex and that most class names are randomly generated.

Booking.com hotel listing page with browser devtools highlighting HTML for the property title link and image

Luckily for us, the website does not rely solely on class names: many elements carry a “data-testid” attribute whose value we can use as an extraction criterion. The screenshot above highlights how easy it is to reach the thumbnail, the name, and the URL of a property.

import puppeteer from 'puppeteer';

async function scrapeBookingData(booking_url: string): Promise<void> {
    // Launch Puppeteer with a visible, maximized browser window
    const browser = await puppeteer.launch({
        headless: false,
        args: ['--start-maximized'],
        defaultViewport: null
    })

    const page = await browser.newPage()

    // Navigate to the search results URL
    await page.goto(booking_url)

    // Extract listings name
    const listings_name = await page.evaluate(() => {
        const names = document.querySelectorAll('div[data-testid="title"]')
        return Array.from(names).map(n => n.textContent)
    })
    console.log(listings_name)

    // Extract listings URL (the href of each title link)
    const listings_location = await page.evaluate(() => {
        const locations = document.querySelectorAll('a[data-testid="title-link"]')
        return Array.from(locations).map(l => l.getAttribute('href'))
    })
    console.log(listings_location)

    // Extract listings thumbnail
    const listings_thumbnail = await page.evaluate(() => {
        const thumbnails = document.querySelectorAll('[data-testid="image"]')
        return Array.from(thumbnails).map(t => t.getAttribute('src'))
    })
    console.log(listings_thumbnail)

    await browser.close()
}

scrapeBookingData("https://www.booking.com/searchresults.en-us.html?ss=Madeira+Islands&checkin=2023-01-13&checkout=2023-01-15")

We use Puppeteer to launch a browser instance, open a new page, navigate to the target URL, extract the data mentioned above, and then close the browser. For visual debugging, the browser runs in non-headless mode (headless: false).

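As explained above, the data is easy to reach thanks to the “data-testid” attribute, which gives each kind of element a stable, descriptive value (for example, every property title uses data-testid="title"). Assuming the code is saved as index.ts in the project root, run the following command to compile and execute the script: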

npx tsc && node dist/index.js

Your terminal should display three lists of the same length, containing the names, the URLs, and the thumbnails of all the properties on the current page.

Booking.com hotel listing page with browser devtools highlighting HTML for the address, rating, and review count

In the next section of the HTML document, we highlighted the address, the rating, and the review count of a property. Add the following code inside scrapeBookingData, before the call to browser.close():

// Extract listings address
const listings_address = await page.evaluate(() => {
    const addresses = document.querySelectorAll('[data-testid="address"]')
    return Array.from(addresses).map(a => a.textContent)
})
console.log(listings_address)

// Extract listings rating and review count
const listings_rating = await page.evaluate(() => {
    const ratings = document.querySelectorAll('[data-testid="review-score"]')
    return Array.from(ratings).map(r => r.textContent)
})
console.log(listings_rating)

As before, we rely on the “data-testid” attribute. Running the script again should print two more lists, just like the previous ones.
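The rating element’s text packs the score, the verbal label, and the review count into one string (in this tutorial’s final output it reads, for example, '9.0Wonderful 727 reviews'). If you need them as separate fields, a minimal sketch like the one below can split them. The parseRating helper and its regular expression are assumptions based only on that sample format; Booking.com may render this text differently in other locales or future layouts, so the function returns null when the pattern does not match.

```typescript
// Hypothetical result shape for a parsed rating string.
interface ParsedRating {
    score: number;
    label: string;
    reviewCount: number;
}

// Splits text such as "9.0Wonderful 727 reviews" into its three parts.
// Assumes the en-us format seen in this tutorial's sample output.
function parseRating(text: string | null): ParsedRating | null {
    if (!text) return null;
    // score (e.g. 9.0), label (e.g. "Very Good"), count (e.g. 1,234), then "review(s)"
    const match = text.trim().match(/^(\d+(?:\.\d+)?)\s*([A-Za-z][A-Za-z ]*?)\s*([\d,]+)\s+reviews?$/);
    if (!match) return null;
    return {
        score: parseFloat(match[1]),
        label: match[2],
        // Strip thousands separators such as "1,234"
        reviewCount: parseInt(match[3].replace(/,/g, ''), 10),
    };
}

console.log(parseRating('8.3Very Good 647 reviews'));
```

You could apply it to the scraped values with listings_rating.map(parseRating).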

Booking.com hotel listing page with browser devtools highlighting HTML for the price element

Finally, we extract the price of the property. The code follows the same pattern as before:

// Extract listings price
const listings_price = await page.evaluate(() => {
    const prices = document.querySelectorAll('[data-testid="price-and-discounted-price"]')
    return Array.from(prices).map(p => p.textContent)
})
console.log(listings_price)
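The price comes back as display text (this tutorial’s final output shows values such as '911 lei' and '1,045 lei'), which is awkward to sort or compare. A minimal sketch for turning it into a number is shown below. The parsePrice helper is hypothetical and assumes en-us number formatting (comma as thousands separator, dot as decimal separator); it reads only the first number in the string, so text containing both an original and a discounted price would need extra handling.

```typescript
// Hypothetical result shape for a parsed price string.
interface ParsedPrice {
    amount: number;
    currency: string;
}

// Splits text such as "1,045 lei" into a numeric amount and the remaining
// currency text. Only the first number in the string is used.
function parsePrice(text: string | null): ParsedPrice | null {
    if (!text) return null;
    const match = text.match(/\d[\d,]*(?:\.\d+)?/);
    if (!match || match.index === undefined) return null;
    // Remove thousands separators before converting to a number
    const amount = Number(match[0].replace(/,/g, ''));
    // Whatever surrounds the number (before or after it) is treated as the currency
    const currency = (text.slice(0, match.index) + text.slice(match.index + match[0].length)).trim();
    return { amount, currency };
}

console.log(parsePrice('1,045 lei'));
```

Keeping the currency text alongside the amount matters here, because Booking.com displays prices in the visitor’s currency (lei in the sample output), so amounts from different sessions are not directly comparable.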

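To make the extracted data easier to process further, we combine the resulting lists into a single list of objects. This pairs values by their index, so it assumes that every property card contains every field.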

// Group the lists
const listings = []
for (let i = 0; i < listings_name.length; i++) {
    listings.push({
        name: listings_name[i],
        url: listings_location[i],
        address: listings_address[i],
        price: listings_price[i],
        ratings: listings_rating[i],
        thumbnails: listings_thumbnail[i]
    })
}
console.log(listings)

The final result should now look like this:

[
  {
    name: 'Pestana Churchill Bay',
    url: 'https://www.booking.com/hotel/pt/pestana-churchill-bay.html?aid=304142&label=gen173nr-1FCAQoggJCFnNlYXJjaF9tYWRlaXJhIGlzbGFuZHNIMVgEaMABiAEBmAExuAEXyAEM2AEB6AEB-AEDiAIBqAIDuAK9luydBsACAdICJGViMWY2MmRjLWJhZmEtNGZhZC04MDAyLWQ4MmU3YjU5MTMwZtgCBeACAQ&ucfs=1&arphpl=1&checkin=2023-01-13&checkout=2023-01-15&group_adults=2&req_adults=2&no_rooms=1&group_children=0&req_children=0&hpos=1&hapos=1&sr_order=popularity&srpvid=42cc81de452009eb&srepoch=1673202494&all_sr_blocks=477957801_262227867_0_1_0&highlighted_blocks=477957801_262227867_0_1_0&matching_block_id=477957801_262227867_0_1_0&sr_pri_blocks=477957801_262227867_0_1_0__18480&tpi_r=2&from_sustainable_property_sr=1&from=searchresults#hotelTmpl',
    address: 'Câmara de Lobos',
    price: '911 lei',
    ratings: '9.0Wonderful 727 reviews',
    thumbnails: 'https://cf.bstatic.com/xdata/images/hotel/square200/202313893.webp?k=824dc3908c4bd3e80790ce011f763f10fd4064dcb5708607f020f2e7c92d130e&o=&s=1'
  },
  {
    name: 'Hotel Madeira',
    url: 'https://www.booking.com/hotel/pt/madeira-funchal.html?aid=304142&label=gen173nr-1FCAQoggJCFnNlYXJjaF9tYWRlaXJhIGlzbGFuZHNIMVgEaMABiAEBmAExuAEXyAEM2AEB6AEB-AEDiAIBqAIDuAK9luydBsACAdICJGViMWY2MmRjLWJhZmEtNGZhZC04MDAyLWQ4MmU3YjU5MTMwZtgCBeACAQ&ucfs=1&arphpl=1&checkin=2023-01-13&checkout=2023-01-15&group_adults=2&req_adults=2&no_rooms=1&group_children=0&req_children=0&hpos=2&hapos=2&sr_order=popularity&srpvid=42cc81de452009eb&srepoch=1673202494&all_sr_blocks=57095605_262941681_2_1_0&highlighted_blocks=57095605_262941681_2_1_0&matching_block_id=57095605_262941681_2_1_0&sr_pri_blocks=57095605_262941681_2_1_0__21200&tpi_r=2&from_sustainable_property_sr=1&from=searchresults#hotelTmpl',
    address: 'Se, Funchal',
    price: '1,045 lei',
    ratings: '8.3Very Good 647 reviews',
    thumbnails: 'https://cf.bstatic.com/xdata/images/hotel/square200/364430623.webp?k=8c1e510da2aad0fc9ff5731c3874e05b1c4cceec01a07ef7e9db944799771724&o=&s=1'
  },
  {
    name: 'Les Suites at The Cliff Bay - PortoBay',
    url: 'https://www.booking.com/hotel/pt/les-suites-at-the-cliff-bay.html?aid=304142&label=gen173nr-1FCAQoggJCFnNlYXJjaF9tYWRlaXJhIGlzbGFuZHNIMVgEaMABiAEBmAExuAEXyAEM2AEB6AEB-AEDiAIBqAIDuAK9luydBsACAdICJGViMWY2MmRjLWJhZmEtNGZhZC04MDAyLWQ4MmU3YjU5MTMwZtgCBeACAQ&ucfs=1&arphpl=1&checkin=2023-01-13&checkout=2023-01-15&group_adults=2&req_adults=2&no_rooms=1&group_children=0&req_children=0&hpos=3&hapos=3&sr_order=popularity&srpvid=42cc81de452009eb&srepoch=1673202494&all_sr_blocks=395012401_247460894_2_1_0&highlighted_blocks=395012401_247460894_2_1_0&matching_block_id=395012401_247460894_2_1_0&sr_pri_blocks=395012401_247460894_2_1_0__100000&tpi_r=2&from_sustainable_property_sr=1&from=searchresults#hotelTmpl',
    address: 'Sao Martinho, Funchal',
    price: '4,928 lei',
    ratings: '9.5Exceptional 119 reviews',
    thumbnails: 'https://cf.bstatic.com/xdata/images/hotel/square200/270120962.webp?k=68ded1031f5082597c48eb25c833ea7fcedc2ec2bc5d555adfcac98b232f9745&o=&s=1'
  }
]
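Because the grouping loop pairs values purely by position, a selector that matches fewer elements than the others (for example, if a property has no review score) would shift values onto the wrong properties without raising any error. The hypothetical groupListings helper below is a minimal sketch of a stricter grouping step: it builds the same objects as the loop, but refuses to combine lists of different lengths. It only detects the mismatch; it does not repair it.

```typescript
// Hypothetical record shape mirroring the fields built in the grouping loop.
interface Listing {
    name: string | null;
    url: string | null;
    address: string | null;
    price: string | null;
    ratings: string | null;
    thumbnails: string | null;
}

// One list per field, as returned by the page.evaluate() calls.
type FieldLists = { [K in keyof Listing]: Array<string | null> };

// Combines the per-field lists into one object per property, but throws
// if the lists differ in length, because index-based pairing would then
// attach values to the wrong property.
function groupListings(fields: FieldLists): Listing[] {
    const keys = Object.keys(fields) as Array<keyof Listing>;
    const lengths = keys.map(key => fields[key].length);
    if (new Set(lengths).size > 1) {
        const summary = keys.map((key, i) => `${key}=${lengths[i]}`).join(', ');
        throw new Error(`Field lists have different lengths: ${summary}`);
    }
    return fields.name.map((name, i) => ({
        name,
        url: fields.url[i],
        address: fields.address[i],
        price: fields.price[i],
        ratings: fields.ratings[i],
        thumbnails: fields.thumbnails[i],
    }));
}

// Example usage inside scrapeBookingData, in place of the for loop:
// const listings = groupListings({
//     name: listings_name,
//     url: listings_location,
//     address: listings_address,
//     price: listings_price,
//     ratings: listings_rating,
//     thumbnails: listings_thumbnail,
// })
```

Failing loudly is a deliberate choice here: a crash with a length summary is easier to debug than a dataset in which ratings quietly belong to the neighboring hotel.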

Alternatives

Although the tutorial has been straightforward so far, we must mention the caveats commonly encountered in web scraping, especially if you want to scale up your project.

Many websites now implement bot detection techniques and collect browser data to prevent or block automated traffic, and Booking.com is no exception. Using PerimeterX protection, the website checks your IP address and collects information such as:

  • properties from the Navigator object (deviceMemory, languages, platform, userAgent, webdriver, etc.)
  • font and plugin enumeration
  • screen dimensions checks
  • and many more.

One solution to these challenges is to use a scraping API, which offers a simple and reliable way to access data from websites like Booking.com without the need to build and maintain your own scraper.

WebScrapingAPI is one such product: it uses proxy rotation to bypass CAPTCHAs and randomizes browser data to mimic a real user. To get started, register for an account and obtain your API key from the dashboard. This key authenticates your requests.

Dashboard quickstart guide showing three steps: API access key, API Playground, and integration into your application

To quickly test the API in the existing Node.js project, we can use its corresponding SDK. Run the following command:

npm install webscrapingapi

Now all you need to do is pass the previous CSS selectors to the API. The extraction rules feature lets you parse data with minimal changes to the selectors you already have, making it a powerful tool in your web scraping toolkit.

import webScrapingApiClient from 'webscrapingapi';

const client = new webScrapingApiClient("YOUR_API_KEY");

async function exampleUsage() {
    const api_params = {
        'render_js': 1,
        'proxy_type': 'datacenter',
        'timeout': 60000,
        // One extraction rule per field, reusing the CSS selectors from the Puppeteer version:
        // 'text' returns the element's text, '@href' / '@src' return attribute values
        'extract_rules': JSON.stringify({
            names: {
                selector: 'div[data-testid="title"]',
                output: 'text',
                all: '1'
            },
            locations: {
                selector: 'a[data-testid="title-link"]',
                output: '@href',
                all: '1'
            },
            addresses: {
                selector: '[data-testid="address"]',
                output: 'text',
                all: '1'
            },
            prices: {
                selector: '[data-testid="price-and-discounted-price"]',
                output: 'text',
                all: '1'
            },
            ratings: {
                selector: '[data-testid="review-score"]',
                output: 'text',
                all: '1'
            },
            thumbnails: {
                selector: '[data-testid="image"]',
                output: '@src',
                all: '1'
            }
        })
    }

    const URL = "https://www.booking.com/searchresults.en-us.html?ss=Madeira+Islands&checkin=2023-01-13&checkout=2023-01-15"

    const response = await client.get(URL, api_params)

    if (response.success) {
        // Group the lists, exactly as in the Puppeteer version
        const listings = []
        for (let i = 0; i < response.response.data.names.length; i++) {
            listings.push({
                name: response.response.data.names[i],
                url: response.response.data.locations[i],
                address: response.response.data.addresses[i],
                price: response.response.data.prices[i],
                ratings: response.response.data.ratings[i],
                thumbnails: response.response.data.thumbnails[i]
            })
        }
        console.log(listings)
    } else {
        console.log(response.error.response.data)
    }
}

exampleUsage();
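Every entry in extract_rules repeats the same selector / output / all shape. If you add more fields, a small helper can generate that JSON string from a compact map. The buildExtractRules function below is a hypothetical convenience wrapper, not part of the webscrapingapi SDK; it only reproduces the rule structure already used in this example and does not call the API.

```typescript
// Shape of a single extraction rule, as used in the example above.
interface ExtractRule {
    selector: string;
    output: string; // 'text' for the element's text, '@attr' for an attribute value
    all: '1';       // same value as in the example above for every field
}

// Hypothetical helper (not part of the webscrapingapi SDK): turns a map of
// field name -> [CSS selector, output] into the extract_rules JSON string.
function buildExtractRules(fields: Record<string, [string, string]>): string {
    const rules: Record<string, ExtractRule> = {};
    for (const name of Object.keys(fields)) {
        const [selector, output] = fields[name];
        rules[name] = { selector, output, all: '1' };
    }
    return JSON.stringify(rules);
}

// Same six fields and selectors as in the example above
const extractRules = buildExtractRules({
    names: ['div[data-testid="title"]', 'text'],
    locations: ['a[data-testid="title-link"]', '@href'],
    addresses: ['[data-testid="address"]', 'text'],
    prices: ['[data-testid="price-and-discounted-price"]', 'text'],
    ratings: ['[data-testid="review-score"]', 'text'],
    thumbnails: ['[data-testid="image"]', '@src'],
});

console.log(extractRules);
```

You could then pass extractRules as the 'extract_rules' value in api_params; the resulting string contains the same rules as the hand-written object.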

Conclusion

In this tutorial, we covered the basics of how to scrape Booking.com using Node.js and Puppeteer. We showed you how to set up your environment and extract listing details for Madeira, Portugal. However, these techniques and concepts can be applied to other websites and data points as well.

Web scraping can be an incredibly useful tool for businesses and data scientists alike. By gathering data from Booking.com, you can gain valuable insights into the hospitality industry, assess the competition, and more. However, it's important to keep in mind that web scraping may be against the terms of use for some websites, and it's always a good idea to check the specific policies before proceeding.

While it's possible to build your own web scraper, using a professional service can often be a safer and more efficient option, especially for larger projects. A professional service already has the expertise and resources to handle challenges such as the bot detection described above and to deliver high-quality results.

We hope you enjoyed this tutorial and that you now feel equipped to gather valuable data from Booking.com using a Node.js environment. Thanks for reading!

About the Author
Raluca Penciuc, Full-Stack Developer @ WebScrapingAPI

Raluca Penciuc is a Full Stack Developer at WebScrapingAPI, building scrapers, improving evasions, and finding reliable ways to reduce detection across target websites.
