Selenium Image Scraper (sis.py)
The Selenium Image Scraper (sis.py) is a Python script that downloads images from web pages. It uses Selenium to drive a Firefox browser, automating what you would otherwise have to do by hand. For each page, sis.py finds the largest image and checks whether it's part of a lightbox. If it is, the script grabs the image from the lightbox so it can download the largest version available; otherwise, it downloads the largest image on the page.
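The selection step described above can be sketched as a small pure function. This is a hypothetical illustration, not the code in sis.py: it assumes each image has already been measured (in Selenium 4, for example, via `driver.find_elements(By.TAG_NAME, "img")`, reading each element's `.size` and `get_attribute("src")`), and it assumes a lightbox can be recognised by an enclosing link that points straight at an image file. The names `ImageCandidate`, `parent_href`, and `pick_download_url` are made up for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

# Extensions treated as "this link is an image file" (assumption).
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


@dataclass
class ImageCandidate:
    src: str
    width: int
    height: int
    # href of an enclosing <a> tag, if any (hypothetical lightbox signal).
    parent_href: Optional[str] = None


def pick_download_url(candidates: List[ImageCandidate]) -> Optional[str]:
    """Return the URL to download, or None if the page has no images."""
    if not candidates:
        return None
    # Largest image by rendered area (width x height).
    largest = max(candidates, key=lambda c: c.width * c.height)
    href = largest.parent_href
    # Assumption: a link pointing directly at an image file is the
    # lightbox's full-size version, so prefer it over the thumbnail.
    if href and href.lower().split("?")[0].endswith(IMAGE_EXTENSIONS):
        return href
    # No lightbox detected: fall back to the largest image itself.
    return largest.src
```

Real lightboxes vary a lot between sites, so a check like this would likely need per-site tweaks.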
The script is designed to run in a terminal on macOS, but it could be modified for other operating systems. On macOS, images are downloaded to a folder in the current user's Downloads directory. Subreddits get special handling: images from a subreddit are saved in a directory named after that subreddit.
The script works pretty well on single-image posts on Tumblr, but it isn't always accurate given how chaotic Tumblr themes can be.
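The folder logic described above might look roughly like this. It's a sketch under assumptions: the README doesn't name the Downloads subfolder, so `sis_downloads` is a hypothetical placeholder, and it assumes subreddit URLs follow the usual `reddit.com/r/<name>/...` shape. `subreddit_name` and `target_directory` are illustrative names, not functions from sis.py.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse

# Hypothetical folder name: the README only says "a folder in Downloads".
BASE_FOLDER = "sis_downloads"


def subreddit_name(url: str) -> Optional[str]:
    """Return the subreddit from a reddit.com URL like /r/<name>/..., else None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host != "reddit.com" and not host.endswith(".reddit.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) >= 2 and parts[0].lower() == "r":
        return parts[1]
    return None


def target_directory(url: str, home: Optional[Path] = None) -> Path:
    """Pick (and create) the folder an image scraped from `url` is saved into."""
    base = (home or Path.home()) / "Downloads" / BASE_FOLDER
    name = subreddit_name(url)
    # Subreddit pages get their own subfolder; everything else shares the base.
    folder = base / name if name else base
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

Passing `home` explicitly is only there to make the function easy to try out without touching your real Downloads folder.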
Setting Up
This information is for setting up on macOS. You may need to make some changes based on your operating system.
- Clone the repo into whatever directory you like.
- This script relies on Selenium controlling a Firefox browser, so you'll need Firefox installed on your system.
- Selenium talks to Firefox through geckodriver, which you can install with Homebrew:
brew install geckodriver
- Within your working directory, create a virtual environment:
python3 -m venv .venv
- Activate the virtual environment:
source .venv/bin/activate
- Install any necessary dependencies through pip. This script uses os, sys, time, selenium, requests, and tqdm. The first three are part of Python's standard library, so you'll most likely only need to pip install selenium, requests, and tqdm, but your setup may vary:
pip install selenium requests tqdm
- sis.py reads its list of URLs from a creatively named text file called urls.txt. Add the URLs of the web pages you want to scrape to this file. I've included a basic set of subreddit URLs in the repo if you want to have a quick play and see what happens.
- Run the script with:
python3 sis.py
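If you want to reuse the urls.txt idea in your own scripts, a loader could look like the sketch below. It assumes one URL per line; skipping blank lines and `#` comment lines is a hypothetical convenience, not necessarily something sis.py does, and `load_urls` is an illustrative name.

```python
from typing import List


def load_urls(path: str = "urls.txt") -> List[str]:
    """Read page URLs from a text file, assuming one URL per line."""
    urls = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            # Assumption: blank lines and '#' comment lines are ignored.
            if line and not line.startswith("#"):
                urls.append(line)
    return urls
```

A caller would then loop over the result, e.g. `for url in load_urls(): ...`, and hand each URL to the browser.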
AI Disclosure
I used AI to help me create this script, specifically Claude using the Sonnet 4 model. I tested the script for a few weeks before adding it to this repo, but be aware there could be dumbass bugs, since Claude is just as fallible as any other AI.