Web Crawling Analyst
Experience: 3-10 Years
Location: Remote (Work from Home)
Mode of Engagement: Full-time
No. of Positions: 8
Educational Qualification: Bachelor's degree in Computer Science, IT, Data Engineering, or related field
Industry: IT / Software Services / Data & AI
Notice Period: Immediate to 30 Days (Preferred)
What We Are Looking For
- 3-10 years of hands-on experience in web crawling and browser-based scraping, especially on JavaScript-heavy and protected websites.
- Strong expertise with Playwright, Selenium, or Puppeteer for dynamic rendering and complex user flows.
- Practical experience handling cookies, sessions, headers, local storage, and authentication workflows.
- Proven ability to manage CAPTCHA challenges using third-party services or AI-based solvers.
- Solid understanding of proxy rotation, IP management, user-agent rotation, and browser-fingerprinting techniques to avoid detection and rate limiting.
- Ability to design scalable, resilient crawling pipelines with retries, logging, and monitoring (a brief illustrative sketch follows this list).
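For illustration only, here is a minimal sketch of the kind of work the role involves: rendering a JavaScript-heavy page with Playwright behind a rotating proxy, with simple retry handling. The proxy endpoints, user-agent string, and target URL are placeholders, not production values.

```python
# Minimal sketch: dynamic rendering with Playwright, proxy rotation, and retries.
# Proxy addresses, user agent, and target URL are placeholders.

import itertools
from playwright.sync_api import sync_playwright

PROXIES = itertools.cycle([
    "http://proxy-1.example.com:8000",   # placeholder proxy endpoints
    "http://proxy-2.example.com:8000",
])

def fetch_rendered_html(url: str, retries: int = 3) -> str:
    """Render a JavaScript-heavy page and return its HTML, retrying on failure."""
    last_error = None
    for _ in range(retries):
        proxy = next(PROXIES)              # rotate to a new proxy on each attempt
        try:
            with sync_playwright() as p:
                browser = p.chromium.launch(proxy={"server": proxy})
                page = browser.new_page(user_agent="Mozilla/5.0 (crawler-demo)")
                page.goto(url, wait_until="networkidle", timeout=30_000)
                html = page.content()
                browser.close()
                return html
        except Exception as exc:           # broad catch keeps the sketch short
            last_error = exc
    raise RuntimeError(f"All {retries} attempts failed for {url}") from last_error

if __name__ == "__main__":
    print(len(fetch_rendered_html("https://example.com")))
```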
Responsibilities
- Design, develop, and maintain high-scale web crawling workflows for dynamic and protected websites.
- Implement advanced browser automation solutions using Playwright / Selenium / Puppeteer.
- Integrate CAPTCHA-solving services, proxy rotation mechanisms, and anti-detection strategies.
- Build ETL-style data pipelines for extraction, validation, transformation, and storage (see the storage sketch after this list).
- Ensure data quality through error handling, retries, monitoring, and alerting.
- Store structured data efficiently using SQL/NoSQL databases.
- Collaborate with AI, data engineering, and product teams to deliver reliable crawling datasets.
- Continuously improve crawling success rates, performance, and scalability.
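As a rough illustration of the extraction-to-storage step, the sketch below validates scraped records and writes them to a local SQLite table. The field names, schema, and database path are assumed for the example, not a prescribed design.

```python
# Minimal ETL-style sketch: validate extracted records, then upsert into SQL.
# Field names and the database path are illustrative only.

import sqlite3

def validate(record: dict) -> bool:
    """Keep only records carrying the fields downstream consumers rely on."""
    return bool(record.get("url")) and bool(record.get("title"))

def load(records: list[dict], db_path: str = "crawl.db") -> int:
    """Insert validated records, replacing duplicates on the url key."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT, price REAL)"
    )
    stored = 0
    for rec in filter(validate, records):
        conn.execute(
            "INSERT OR REPLACE INTO pages (url, title, price) VALUES (?, ?, ?)",
            (rec["url"], rec["title"], rec.get("price")),
        )
        stored += 1
    conn.commit()
    conn.close()
    return stored

if __name__ == "__main__":
    sample = [{"url": "https://example.com/item/1", "title": "Demo item", "price": 9.99}]
    print(load(sample))
```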
Qualifications
- Minimum 3 years of hands-on experience in Python-based web crawling and automation.
- Strong working experience with Playwright, Selenium, Puppeteer, and browser automation.
- Proficient in Python, including libraries such as Requests, BeautifulSoup, Scrapy, and async frameworks.
- Hands-on experience with proxies, fingerprinting, session handling, and anti-bot mechanisms.
- Good understanding of SQL / NoSQL databases for structured data storage.
- Exposure to cloud platforms (AWS / GCP / Azure) is a plus.
- Strong debugging, analytical, and problem-solving skills.