🧪 Scraping Dynamic Websites with Selenium in Python

Selenium is a powerful tool for automating web browsers. It's especially useful for scraping websites that rely heavily on JavaScript to load content dynamically. Beautiful Soup and Scrapy can't access that content directly because they don't execute JavaScript.


🚀 Why Use Selenium?

  • Automates real browser interactions
  • Handles JavaScript-rendered content
  • Can fill forms, click buttons, scroll pages, etc.
  • Works with headless browsers for speed

🛠️ Installation

Install Selenium and a WebDriver (e.g., ChromeDriver):

pip install selenium
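
To confirm the install worked, a quick check from Python can help. This is a minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the Selenium version, or a hint that the install did not succeed
    print("selenium:", installed_version("selenium") or "not installed")
```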

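Install ChromeDriver: Selenium 4.6 and later include Selenium Manager, which downloads a matching driver automatically, so a separate install is usually only needed on older versions.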


⚙️ Basic Setup Example

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # Or use Firefox, Edge, etc.
try:
    driver.get("http://quotes.toscrape.com/js")

    # Each quote on the page sits in an element with the class "quote"
    quotes = driver.find_elements(By.CLASS_NAME, "quote")

    for quote in quotes:
        text = quote.find_element(By.CLASS_NAME, "text").text
        author = quote.find_element(By.CLASS_NAME, "author").text
        print(f"{text} - {author}")
finally:
    # Close the browser even if something above raises
    driver.quit()

🧑‍💻 Interacting with the Page

Selenium supports common browser actions:

| Action | Method / Example |
| --- | --- |
| Click a button | element.click() |
| Fill a form | element.send_keys("text") |
| Submit a form | element.submit() |
| Scroll down | driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") |
| Wait for load | Use WebDriverWait and expected_conditions |
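
The scroll action is often repeated on infinite-scroll pages until nothing new loads. Here is a rough sketch of that loop. The function name `scroll_until_stable` and its defaults are assumptions for illustration, and the fixed pause is a simplification: a production script would more likely wait on a specific element.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=20):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for round_number in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # crude pause so lazily loaded content can arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return round_number  # the last scroll loaded nothing new
        last_height = new_height
    return max_rounds  # gave up; the page may still have more content
```

With a real browser you would call `scroll_until_stable(driver)` after `driver.get(...)` and then collect the elements as usual.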

🕶️ Headless Mode (No GUI)

To run Chrome invisibly (headless):

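from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# options.headless is deprecated and gone from recent Selenium 4 releases;
# pass the Chrome flag instead
options.add_argument("--headless=new")

driver = webdriver.Chrome(options=options)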

This is useful for running scripts on servers or in CI/CD pipelines.
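
On a server you often want to switch headless mode on and off without editing the script. One way to sketch that is to build the Chrome flags in a small helper. The names `chrome_arguments` and `make_driver` are hypothetical, the 1920x1080 window size is an arbitrary choice, and `--headless=new` assumes a reasonably recent Chrome.

```python
def chrome_arguments(headless=True, width=1920, height=1080):
    """Return the command-line flags to pass to Chrome."""
    # Headless Chrome starts with a small default window, so set a size explicitly
    args = [f"--window-size={width},{height}"]
    if headless:
        args.append("--headless=new")
    return args


def make_driver(headless=True):
    """Create a Chrome driver using the flags from chrome_arguments()."""
    # Imported here so chrome_arguments() can be used without a browser installed
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in chrome_arguments(headless=headless):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Calling `make_driver(headless=False)` while debugging lets you watch the browser, and the default gives you the invisible mode for CI.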


⏳ Handling Dynamic Loads with Waits

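Use explicit waits to block until the elements you need appear:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the first quote; raises TimeoutException otherwise
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "quote"))
)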
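
`WebDriverWait.until()` also accepts any callable that takes the driver, which helps when one element appearing is not enough. The sketch below waits for a minimum number of matches. The helper name `at_least_n_elements` is made up, and the count of 10 in the usage comment assumes the demo page shows ten quotes per page.

```python
def at_least_n_elements(locator, minimum):
    """Build a wait condition that passes once `minimum` elements match.

    `locator` is a (strategy, value) tuple such as (By.CLASS_NAME, "quote").
    The returned callable gives back the matched elements, or False to keep waiting.
    """
    def condition(driver):
        elements = driver.find_elements(*locator)
        return elements if len(elements) >= minimum else False

    return condition


# Usage with a real browser (requires selenium and an open `driver`):
#   from selenium.webdriver.common.by import By
#   from selenium.webdriver.support.ui import WebDriverWait
#   quotes = WebDriverWait(driver, 10).until(
#       at_least_n_elements((By.CLASS_NAME, "quote"), 10)
#   )
```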

📤 Exporting Data

Use Python's standard library (e.g., csv, json) to export data:

import csv

# Run this while the browser is still open: reading .text from a WebElement
# fails after driver.quit()
with open('quotes.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(["Text", "Author"])

    for quote in quotes:
        text = quote.find_element(By.CLASS_NAME, "text").text
        author = quote.find_element(By.CLASS_NAME, "author").text
        writer.writerow([text, author])
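
For JSON, it can be simpler to copy the text out of the page into plain dictionaries first and serialize those. This is a sketch under that assumption; `quotes_to_json` and `save_quotes_json` are hypothetical names, and the sample record is placeholder data rather than real scraped output.

```python
import json


def quotes_to_json(records):
    """Serialize a list of {"text": ..., "author": ...} dicts to a JSON string."""
    # ensure_ascii=False keeps curly quotes and accented names readable
    return json.dumps(records, ensure_ascii=False, indent=2)


def save_quotes_json(records, path):
    """Write the records to `path` as UTF-8 encoded JSON."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(quotes_to_json(records))


if __name__ == "__main__":
    # Placeholder data standing in for text pulled out of WebElements
    sample = [{"text": "An example quote.", "author": "Example Author"}]
    save_quotes_json(sample, "quotes.json")
```

Because the records are ordinary dictionaries, they stay usable after the browser has been closed.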

⚠️ Selenium vs. Beautiful Soup vs. Scrapy

| Feature | Beautiful Soup | Scrapy | Selenium |
| --- | --- | --- | --- |
| JavaScript support | ❌ No | ❌ No | ✅ Yes |
| Speed & performance | ✅ Fast | ✅ Very fast | ❌ Slower (real browser) |
| Browser simulation | ❌ No | ❌ No | ✅ Yes |
| Interaction support | ❌ No | ❌ No | ✅ Yes |
| Best use case | Static pages | Multi-page crawling | Dynamic JS pages |

🧾 Best Practices

  • Run in headless mode for performance
  • Use explicit waits instead of time.sleep()
  • Always close the browser with driver.quit()
  • Don't scrape too fast; websites can detect automation
  • Use random delays and rotate user agents to avoid bans
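
The random-delay advice can be sketched in a few lines. Note that this pause is for spacing out requests, not for waiting on page content, where explicit waits are still the better tool. The name `polite_pause` and the 1 to 3 second defaults are assumptions, not recommended values for any particular site.

```python
import random
import time


def polite_pause(min_seconds=1.0, max_seconds=3.0):
    """Sleep for a random duration between the two bounds and return it."""
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay


# Typical use between page loads (needs an open `driver` and a list of URLs):
#   for url in urls:
#       driver.get(url)
#       polite_pause()
```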

🔐 Ethical Considerations

  • Respect robots.txt and website terms of service
  • Don't log in or extract private user data without permission
  • Avoid overwhelming servers with rapid scraping
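
Checking robots.txt can be automated with the standard library before the browser even starts. The sketch below parses rules that are already in hand as text; the helper name `is_allowed` is made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# To check a live site instead, let the parser fetch the file itself:
#   parser = RobotFileParser()
#   parser.set_url("http://quotes.toscrape.com/robots.txt")
#   parser.read()
#   parser.can_fetch("*", "http://quotes.toscrape.com/js")
```

A robots.txt check does not replace reading the site's terms of service, which may restrict scraping further.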


Selenium is ideal when you need to see and interact with the browser as a real user would, making it the best choice for scraping modern, JavaScript-heavy websites.

Happy browser automation! 🧪