Car Recognition Project

【Tech Insight】Scraping Car Model Images from Used Car Sites with Python【Development Log #3】

A futuristic scene of three Japanese cars—Prius, Fit, and Note—driving through a smart city, with glowing circuit diagrams and the word “AI” in the background. This symbolizes the launch of our AI-powered car model recognition project.


🔍 Why Collect Images from Used Car Sites?

To train an AI to accurately identify car models, a large and diverse set of real-world images is essential. That’s why I focused on used car listing websites.

Reasons:

  • Numerous real-world photos are available per car model
  • Variety in angle, background, and lighting conditions
  • Easier to treat as labeled data than generic image searches

Thanks to this approach, I was able to gather practical, realistic data for training purposes.


⚙️ Technologies and Setup

  • Language: Python 3.x
  • Scraping Tools: Selenium + webdriver-manager
  • Browser Automation: Chrome (headless mode)
  • Image Downloading: urllib.request
  • Preprocessing: File size-based filtering of junk images
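
For reference, here is a minimal sketch of how the headless Chrome driver can be set up with Selenium 4 and webdriver-manager; the exact options (e.g. "--headless=new") may need adjusting for your Chrome and Selenium versions.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
options.add_argument("--headless=new")           # run Chrome without a visible window
options.add_argument("--window-size=1920,1080")

# webdriver-manager downloads a matching chromedriver automatically
driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    options=options,
)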

🧠 Centralized Car List Management

I managed car names and search keywords in both Japanese and English like this:

car_list = [
    {"jp_name": "トヨタ プリウス", "en_name": "Toyota Prius", "keyword": "トヨタ プリウス site:example.com"},
    {"jp_name": "ホンダ フィット", "en_name": "Honda Fit", "keyword": "ホンダ フィット site:example.com"},
    ...
]

※ Replace “example.com” with the actual domain of the used car site you’re scraping.
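
For reference, here is a small sketch of how this list can drive the rest of the pipeline; the images/ directory layout is just an illustrative assumption.

import os

for car in car_list:
    # Use the English name for folder names to avoid encoding issues
    save_dir = os.path.join("images", car["en_name"].replace(" ", "_"))
    os.makedirs(save_dir, exist_ok=True)
    print(f'Searching "{car["keyword"]}" -> saving to {save_dir}')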


📸 Image Collection Workflow

  1. Access the target used car site
  2. Search for each car model and scroll through results
  3. Extract image URLs
  4. Download images with Python
  5. Filter out small (e.g. banner) images
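
Here is a rough sketch of steps 2-4 for a single model; the scroll count and the generic "img" selector are placeholders that depend on the markup of the target site.

import os
import time
import urllib.request

from selenium.webdriver.common.by import By

def collect_image_urls(driver, search_url, scrolls=5):
    # Open the search results page and scroll so lazily loaded thumbnails appear
    driver.get(search_url)
    time.sleep(2)
    for _ in range(scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(1)
    # "img" is deliberately generic; a real site needs a more specific selector
    elements = driver.find_elements(By.CSS_SELECTOR, "img")
    return [e.get_attribute("src") for e in elements if e.get_attribute("src")]

def download_images(urls, save_dir, max_images=100):
    # Step 4: save up to max_images files into the model's folder
    os.makedirs(save_dir, exist_ok=True)
    for i, url in enumerate(urls[:max_images]):
        urllib.request.urlretrieve(url, os.path.join(save_dir, f"{i:04d}.jpg"))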

Example of the size-based filter from step 5:

# Files under about 5 KB are almost always banners or icons rather than car photos
if os.path.getsize(filepath) < 5000:
    os.remove(filepath)


🧼 Filtering Out Unusable Images

In addition to filtering by file size, I also used a separate script to remove:

  • Images without cars
  • Blurry or irrelevant images

This ensured the dataset remained clean and useful for training.
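
The cleanup script itself isn't shown here, but as one possible approach, blur can be estimated with OpenCV's variance-of-Laplacian measure (the threshold and folder name below are assumptions that need tuning per dataset); catching images with no car at all would additionally require an object detector such as the YOLO model planned below.

import os
import cv2  # opencv-python

def is_too_blurry(filepath, threshold=100.0):
    # Low Laplacian variance usually indicates a blurry (or unreadable) image
    image = cv2.imread(filepath)
    if image is None:
        return True  # unreadable or corrupt file
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var() < threshold

folder = "images/Toyota_Prius"  # example folder from the earlier sketch
for name in os.listdir(folder):
    path = os.path.join(folder, name)
    if is_too_blurry(path):
        os.remove(path)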


💡 Optimization Highlights

  • Adjustable max_images setting
  • Looped pagination to collect more samples
  • Random wait times to reduce bot detection risk
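
For illustration, a rough fragment of how the random wait and pagination loop can fit together, assuming the driver and max_images variables from earlier; the "next page" selector and the collect_image_urls_on_page() helper are hypothetical placeholders.

import random
import time

from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

collected = []
while len(collected) < max_images:
    collected.extend(collect_image_urls_on_page(driver))  # hypothetical helper

    # Random wait between pages to look less like a bot
    time.sleep(random.uniform(2.0, 5.0))

    try:
        driver.find_element(By.CSS_SELECTOR, "a.next-page").click()  # placeholder selector
    except NoSuchElementException:
        break  # no more result pages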

📂 What’s Next?

  • Integrate YOLO for automatic car body cropping
  • Automate dataset split into train/val/test for PyTorch
  • Expand into an app that shows the identified car model along with its new and used prices
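
As a rough idea of the planned split step, something along these lines could shuffle each class folder and copy files into train/val/test subdirectories (the 80/10/10 ratio is only an example).

import os
import random
import shutil

def split_dataset(src_dir, dst_dir, ratios=(0.8, 0.1, 0.1), seed=42):
    random.seed(seed)
    for class_name in os.listdir(src_dir):
        files = sorted(os.listdir(os.path.join(src_dir, class_name)))
        random.shuffle(files)
        n_train = int(len(files) * ratios[0])
        n_val = int(len(files) * ratios[1])
        splits = {
            "train": files[:n_train],
            "val": files[n_train:n_train + n_val],
            "test": files[n_train + n_val:],
        }
        for split, names in splits.items():
            out_dir = os.path.join(dst_dir, split, class_name)
            os.makedirs(out_dir, exist_ok=True)
            for name in names:
                shutil.copy2(os.path.join(src_dir, class_name, name), os.path.join(out_dir, name))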

📝 Final Thoughts

Used car websites are a powerful source of high-quality training data. With this scraping script, I was able to collect relevant images efficiently and reliably. In the next post, I’ll explain how I trained a ResNet model to classify these car models with high accuracy.