Scrapy + Selenium - How to crawl a Dynamic web page?


Sometimes when we crawl a web page, we hit errors like "xx element not found". This usually happens because the page is not fully loaded yet: part of its content is built by JavaScript after the initial HTML arrives, but the crawler has already started looking for the elements. One solution is to give the crawler a Selenium web driver, so that a real browser renders the dynamic page before the content is read.

The steps below show how to set up the Selenium WebDriver and use it in a Scrapy Spider.

  • Import Selenium Web Driver
# Firefox is used here; for Chrome, import Options from selenium.webdriver.chrome.options instead
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
  • Initialize driver for the Spider class
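    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)  # keep Scrapy's own Spider setup
        options = Options()
        options.add_argument('-private')    # Firefox private mode ('--incognito' is the Chrome flag)
        options.add_argument('--headless')  # run without opening a browser window
        # '--disable-gpu' is a Chrome-only flag, so it is dropped for Firefox

        # Selenium 4 takes options=...; the older firefox_options= keyword was removed
        self.driver = webdriver.Firefox(options=options)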
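
Once the driver exists, the spider still has to wait until the page has rendered before reading it, otherwise the same "element not found" error comes back. Below is a minimal sketch of that wait. The helper name `fetch_rendered_html` and its parameters are my own assumptions, not part of Scrapy or Selenium. It only relies on the driver's `get`, `find_elements` and `page_source`. Selenium also ships `WebDriverWait` for this purpose; the hand-rolled polling loop here stands in for it so the sketch has no extra imports.

```python
import time

# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR
CSS_SELECTOR = "css selector"


def fetch_rendered_html(driver, url, css_selector, timeout=10.0, poll=0.5):
    """Hypothetical helper: load `url` in a Selenium driver and return the
    page HTML once at least one element matches `css_selector`."""
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        # find_elements returns an empty list (no exception) when nothing matches
        if driver.find_elements(CSS_SELECTOR, css_selector):
            return driver.page_source
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{css_selector!r} not found on {url} within {timeout}s"
            )
        time.sleep(poll)  # give the page time to finish rendering
```

Inside the spider's `parse` method this could be called as `html = fetch_rendered_html(self.driver, response.url, 'div.item')` (the selector is only a placeholder), and the result passed to `scrapy.Selector(text=html)` to extract data as usual. Calling `self.driver.quit()` in the spider's `closed` method shuts the browser down when the crawl ends.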

That's all for this simple implementation of Selenium + Scrapy! :)
