Web Crawling Analyst
Experience: 3-10 Years
Location: Remote (Work from Home)
Mode of Engagement: Full-time
No. of Positions: 8
Educational Qualification: Bachelor's degree in Computer Science, IT, Data Engineering, or related field
Industry: IT / Software Services / Data & AI
Notice Period: Immediate to 30 Days (Preferred)
What We Are Looking For
- 3-10 years of hands-on experience in web crawling and browser-based scraping, especially on JavaScript-heavy and protected websites.
- Strong expertise with Playwright, Selenium, or Puppeteer for dynamic rendering and complex user flows.
- Practical experience handling cookies, sessions, headers, local storage, and authentication workflows.
- Proven ability to manage CAPTCHA challenges using third-party services or AI-based solvers.
- Solid understanding of proxy rotation, IP management, user-agent rotation, and browser-fingerprinting techniques to avoid detection and rate limiting.
- Ability to design scalable, resilient crawling pipelines with retries, logging, and monitoring (a brief illustrative sketch follows this list).
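For illustration only, here is a minimal sketch of the kind of work the role involves: rendering a JavaScript-heavy page with Playwright behind a rotating proxy, with simple retry handling. The proxy endpoints, user-agent string, and target URL are placeholders, not production values.

```python
# Minimal sketch: dynamic rendering with Playwright, proxy rotation, and retries.
# Proxy addresses, user agent, and target URL are placeholders.

import itertools
from playwright.sync_api import sync_playwright

PROXIES = itertools.cycle([
    "http://proxy-1.example.com:8000",   # placeholder proxy endpoints
    "http://proxy-2.example.com:8000",
])

def fetch_rendered_html(url: str, retries: int = 3) -> str:
    """Render a JavaScript-heavy page and return its HTML, retrying on failure."""
    last_error = None
    for _ in range(retries):
        proxy = next(PROXIES)              # rotate to a new proxy on each attempt
        try:
            with sync_playwright() as p:
                browser = p.chromium.launch(proxy={"server": proxy})
                page = browser.new_page(user_agent="Mozilla/5.0 (crawler-demo)")
                page.goto(url, wait_until="networkidle", timeout=30_000)
                html = page.content()
                browser.close()
                return html
        except Exception as exc:           # broad catch keeps the sketch short
            last_error = exc
    raise RuntimeError(f"All {retries} attempts failed for {url}") from last_error

if __name__ == "__main__":
    print(len(fetch_rendered_html("https://example.com")))
```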
Responsibilities
- Design, develop, and maintain high-scale web crawling workflows for dynamic and protected websites.
- Implement advanced browser automation solutions using Playwright / Selenium / Puppeteer.
- Integrate CAPTCHA-solving services, proxy rotation mechanisms, and anti-detection strategies.
- Build ETL-style data pipelines for extraction, validation, transformation, and storage (see the storage sketch after this list).
- Ensure data quality through error handling, retries, monitoring, and alerting.
- Store structured data efficiently using SQL/NoSQL databases.
- Collaborate with AI, data engineering, and product teams to deliver reliable crawling datasets.
- Continuously improve crawling success rates, performance, and scalability.
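As a rough illustration of the extraction-to-storage step, the sketch below validates scraped records and writes them to a local SQLite table. The field names, schema, and database path are assumed for the example, not a prescribed design.

```python
# Minimal ETL-style sketch: validate extracted records, then upsert into SQL.
# Field names and the database path are illustrative only.

import sqlite3

def validate(record: dict) -> bool:
    """Keep only records carrying the fields downstream consumers rely on."""
    return bool(record.get("url")) and bool(record.get("title"))

def load(records: list[dict], db_path: str = "crawl.db") -> int:
    """Insert validated records, replacing duplicates on the url key."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT, price REAL)"
    )
    stored = 0
    for rec in filter(validate, records):
        conn.execute(
            "INSERT OR REPLACE INTO pages (url, title, price) VALUES (?, ?, ?)",
            (rec["url"], rec["title"], rec.get("price")),
        )
        stored += 1
    conn.commit()
    conn.close()
    return stored

if __name__ == "__main__":
    sample = [{"url": "https://example.com/item/1", "title": "Demo item", "price": 9.99}]
    print(load(sample))
```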
Qualifications
- Minimum 3 years of hands-on experience in Python-based web crawling and automation.
- Strong working experience with Playwright, Selenium, Puppeteer, and browser automation.
- Proficient in Python, including libraries such as Requests, BeautifulSoup, Scrapy, and async frameworks.
- Hands-on experience with proxies, fingerprinting, session handling, and anti-bot mechanisms.
- Good understanding of SQL / NoSQL databases for structured data storage.
- Exposure to cloud platforms (AWS / GCP / Azure) is a plus.
- Strong debugging, analytical, and problem-solving skills.