About Us
Actowiz Solutions is a leading provider of data extraction, web scraping, and automation solutions. We empower businesses with actionable insights by delivering clean, structured, and scalable data through cutting-edge technology. Join our fast-growing team and lead projects that shape the future of data intelligence.
Role Overview
We are looking for a skilled Python Web Scraping Engineer with at least two years of hands-on experience to join our technical team. The ideal candidate is not merely familiar with Python scripting but an expert in the Scrapy framework.
You will be responsible for designing, deploying, and maintaining scalable crawling spiders that harvest data from public sources. You will work within our existing Scrapy infrastructure to ensure high-volume data extraction is clean, reliable, and resistant to anti-bot measures.
Key Responsibilities
- Design, build, and deploy high-performance spiders using the Scrapy Framework.
- Handle JavaScript-heavy websites using headless browsers or middleware.
- Implement anti-bot bypass strategies:
- Proxy rotation
- User-agent spoofing
- CAPTCHA solving (manual or automated services)
- Cookie/session management
- Implement Item Pipelines to clean, validate, and structure scraped data before storage.
- Bypass advanced bot-detection services (Cloudflare, Akamai) using Scrapy plugins, integrating headless browsers (Selenium/Playwright) only when necessary.
- Manage efficient crawling rates to respect server loads while maximizing data throughput.
- Maintain and optimize existing spiders to adapt to changes in target website DOM structures.
- Parse and clean raw HTML/JSON/XML data and convert it into structured output formats (CSV, JSON, database entries).
- Validate data for accuracy, completeness, and consistency.
- Integrate with public/private APIs when available to reduce scraping complexity.
- Handle REST APIs, pagination, rate limits, and authentication.
- Monitor spider performance (error rates, latency) and debug issues using logging and debugging tools.
- Ensure spiders are written for concurrency and scalability using Scrapy's asynchronous architecture.
- Refactor and optimize scrapers for performance, speed, and reliability.
- Monitor scraping jobs for bans, blocks, redirects, or unexpected errors, and maintain stability when target websites change.
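As a flavor of the pipeline work above, here is a minimal sketch of a Scrapy-style item pipeline that cleans and validates scraped fields before storage. The field names (`title`, `price`) and rules are illustrative, not an actual schema; in a real project the class would be registered under `ITEM_PIPELINES` in `settings.py`, and invalid items would raise `scrapy.exceptions.DropItem` rather than `ValueError`.

```python
# Sketch of a Scrapy-style item pipeline: clean, validate, and normalize
# scraped fields before storage. Scrapy pipelines are plain classes that
# implement process_item(), so no framework import is needed here.
# Field names and validation rules are illustrative assumptions.

class CleaningPipeline:
    REQUIRED_FIELDS = ("title", "price")

    def process_item(self, item, spider=None):
        # Strip surrounding whitespace from every string field.
        for key, value in item.items():
            if isinstance(value, str):
                item[key] = value.strip()

        # Validate: every required field must be present and non-empty.
        # (In real Scrapy code, raise scrapy.exceptions.DropItem here.)
        for field in self.REQUIRED_FIELDS:
            if not item.get(field):
                raise ValueError(f"Missing required field: {field}")

        # Normalize a price string like "1,299.00" into a float.
        item["price"] = float(str(item["price"]).replace(",", ""))
        return item
```

Usage: `CleaningPipeline().process_item({"title": "  Widget ", "price": "1,299.00"})` returns a cleaned item with `price` as a float, ready for a storage pipeline downstream.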
Must-Have Skills
- 2+ years of professional experience in web scraping, crawling, or data extraction.
- Strong hands-on experience with Scrapy (MANDATORY):
- Creating spiders, middlewares, pipelines
- Handling large-scale crawls
- Using Scrapy settings, throttling, concurrency tuning
- Strong Python skills (functions, OOP, debugging, modular coding).
- Experience with anti-bot techniques, proxies, CAPTCHA solving, headers & cookies management.
- Strong understanding of HTML, DOM, CSS selectors, XPath, and JavaScript-rendered content.
- Familiarity with databases: MySQL, PostgreSQL, MongoDB, or similar.
- Experience with JSON, REST APIs, and parsing structured/unstructured data.
- Ability to write clean, well-documented, scalable code.
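The throttling and concurrency tuning mentioned above typically happens in a project's `settings.py`. The keys below are real Scrapy setting names; the values are example numbers to be tuned per target site, not recommendations.

```python
# Illustrative Scrapy settings for polite, concurrent crawling.
# Keys are real Scrapy setting names; values are example figures only.
CRAWL_SETTINGS = {
    "CONCURRENT_REQUESTS": 32,             # global in-flight request cap
    "CONCURRENT_REQUESTS_PER_DOMAIN": 8,   # per-site politeness limit
    "DOWNLOAD_DELAY": 0.5,                 # base delay between requests (seconds)
    "AUTOTHROTTLE_ENABLED": True,          # adapt delay to observed server latency
    "AUTOTHROTTLE_TARGET_CONCURRENCY": 4.0,
    "RETRY_TIMES": 3,                      # retry transient failures
}
```

With AutoThrottle enabled, Scrapy adjusts the delay dynamically from response latency, so the per-domain cap acts as a ceiling rather than a fixed rate.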
Good-to-Have Skills
- Experience with Playwright, Selenium, or headless browsers.
- Knowledge of asynchronous scraping (aiohttp, asyncio).
- Basic DevOps knowledge (Linux, Docker, cloud deployments).
- Exposure to social media or e-commerce scraping.
- Understanding of big data tools or data warehousing.
- Understanding of APIs and ability to reverse-engineer network calls to fetch data directly (bypassing HTML parsing).
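Consuming a reverse-engineered JSON endpoint, as in the last point, can be sketched with the standard library alone. The payload shape (`items`, `next_cursor`) is a made-up example; a real endpoint's structure must be discovered in the browser's network tab.

```python
import json

# Sketch of consuming a reverse-engineered JSON API page instead of
# parsing HTML. The payload fields ("items", "next_cursor") are assumed
# for illustration; inspect the real endpoint's responses to adapt this.

def parse_api_page(payload: str):
    """Return (records, next_cursor) from one page of a paginated API."""
    data = json.loads(payload)
    records = [
        {"id": item["id"], "name": item["name"].strip()}
        for item in data.get("items", [])
    ]
    # next_cursor is None on the last page, ending the pagination loop.
    return records, data.get("next_cursor")
```

A crawler would loop, appending the returned cursor to the next request's query string until the cursor comes back empty.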
You can also send your resume to:
[Confidential Information]