About Us
Actowiz Solutions is a leading provider of data extraction, web scraping, and automation solutions. We empower businesses with actionable insights by delivering clean, structured, and scalable data through cutting-edge technology. Join our fast-growing team and lead projects that shape the future of data intelligence.
Role Overview
We are looking for a skilled Python Web Scraping Engineer with at least two years of hands-on experience to join our technical team. The ideal candidate is not merely familiar with Python scripting but an expert in the Scrapy framework.
You will be responsible for designing, deploying, and maintaining scalable crawling spiders that harvest data from public sources. You will work within our existing Scrapy infrastructure to ensure high-volume data extraction is clean, reliable, and resistant to anti-bot measures.
Key Responsibilities
- Design, build, and deploy high-performance spiders using the Scrapy Framework.
- Handle JavaScript-heavy websites using headless browsers or middleware.
- Implement anti-bot bypass strategies:
- Proxy rotation
- User-agent spoofing
- CAPTCHA solving (manual or automated services)
- Cookie/session management
- Implement Item Pipelines to clean, validate, and structure scraped data before storage.
- Bypass advanced bot-detection services (Cloudflare, Akamai) using Scrapy plugins, integrating headless browsers (Selenium/Playwright) only when necessary.
- Manage efficient crawling rates to respect server loads while maximizing data throughput.
- Maintain and optimize existing spiders to adapt to changes in target website DOM structures.
- Parse and clean raw HTML/JSON/XML data and convert it into structured output formats (CSV, JSON, database entries).
- Validate data for accuracy, completeness, and consistency.
- Integrate with public/private APIs when available to reduce scraping complexity.
- Handle REST APIs, pagination, rate limits, and authentication.
- Monitor spider performance (error rates, latency) and debug issues using logging and debugging tools.
- Ensure spiders are written for concurrency and scalability using Scrapy's asynchronous architecture.
- Refactor and optimize scrapers for performance, speed, and reliability.
- Monitor scraping jobs for bans, blocks, redirects, or unexpected errors, and maintain stability when target websites change.
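As a flavor of the pipeline work above, here is a minimal sketch of a Scrapy-style item pipeline that cleans and validates scraped fields before storage. The field names (`title`, `price`) and rules are illustrative, not an actual schema; in a real project the class would be registered under `ITEM_PIPELINES` in `settings.py`, and invalid items would raise `scrapy.exceptions.DropItem` rather than `ValueError`.

```python
# Sketch of a Scrapy-style item pipeline: clean, validate, and normalize
# scraped fields before storage. Scrapy pipelines are plain classes that
# implement process_item(), so no framework import is needed here.
# Field names and validation rules are illustrative assumptions.

class CleaningPipeline:
    REQUIRED_FIELDS = ("title", "price")

    def process_item(self, item, spider=None):
        # Strip surrounding whitespace from every string field.
        for key, value in item.items():
            if isinstance(value, str):
                item[key] = value.strip()

        # Validate: every required field must be present and non-empty.
        # (In real Scrapy code, raise scrapy.exceptions.DropItem here.)
        for field in self.REQUIRED_FIELDS:
            if not item.get(field):
                raise ValueError(f"Missing required field: {field}")

        # Normalize a price string like "1,299.00" into a float.
        item["price"] = float(str(item["price"]).replace(",", ""))
        return item
```

Usage: `CleaningPipeline().process_item({"title": "  Widget ", "price": "1,299.00"})` returns a cleaned item with `price` as a float, ready for a storage pipeline downstream.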
Must-Have Skills
- 2+ years of professional experience in web scraping, crawling, or data extraction.
- Strong hands-on experience with Scrapy (MANDATORY):
- Creating spiders, middlewares, pipelines
- Handling large-scale crawls
- Using Scrapy settings, throttling, concurrency tuning
- Strong Python skills (functions, OOP, debugging, modular coding).
- Experience with anti-bot techniques, proxies, CAPTCHA solving, headers & cookies management.
- Strong understanding of HTML, DOM, CSS selectors, XPath, and JavaScript-rendered content.
- Familiarity with databases: MySQL, PostgreSQL, MongoDB, or similar.
- Experience with JSON, REST APIs, and parsing structured/unstructured data.
- Ability to write clean, well-documented, scalable code.
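The throttling and concurrency tuning mentioned above typically happens in a project's `settings.py`. The keys below are real Scrapy setting names; the values are example numbers to be tuned per target site, not recommendations.

```python
# Illustrative Scrapy settings for polite, concurrent crawling.
# Keys are real Scrapy setting names; values are example figures only.
CRAWL_SETTINGS = {
    "CONCURRENT_REQUESTS": 32,             # global in-flight request cap
    "CONCURRENT_REQUESTS_PER_DOMAIN": 8,   # per-site politeness limit
    "DOWNLOAD_DELAY": 0.5,                 # base delay between requests (seconds)
    "AUTOTHROTTLE_ENABLED": True,          # adapt delay to observed server latency
    "AUTOTHROTTLE_TARGET_CONCURRENCY": 4.0,
    "RETRY_TIMES": 3,                      # retry transient failures
}
```

With AutoThrottle enabled, Scrapy adjusts the delay dynamically from response latency, so the per-domain cap acts as a ceiling rather than a fixed rate.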
Good-to-Have Skills
- Experience with Playwright, Selenium, or headless browsers.
- Knowledge of asynchronous scraping (aiohttp, asyncio).
- Basic DevOps knowledge (Linux, Docker, cloud deployments).
- Exposure to social media or e-commerce scraping.
- Understanding of big data tools or data warehousing.
- Understanding of APIs and ability to reverse-engineer network calls to fetch data directly (bypassing HTML parsing).
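Consuming a reverse-engineered JSON endpoint, as in the last point, can be sketched with the standard library alone. The payload shape (`items`, `next_cursor`) is a made-up example; a real endpoint's structure must be discovered in the browser's network tab.

```python
import json

# Sketch of consuming a reverse-engineered JSON API page instead of
# parsing HTML. The payload fields ("items", "next_cursor") are assumed
# for illustration; inspect the real endpoint's responses to adapt this.

def parse_api_page(payload: str):
    """Return (records, next_cursor) from one page of a paginated API."""
    data = json.loads(payload)
    records = [
        {"id": item["id"], "name": item["name"].strip()}
        for item in data.get("items", [])
    ]
    # next_cursor is None on the last page, ending the pagination loop.
    return records, data.get("next_cursor")
```

A crawler would loop, appending the returned cursor to the next request's query string until the cursor comes back empty.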
You can also send your resume to:
[Confidential Information]