
Wissen Technology

Web Scraping / Data Acquisition Engineer

  • Posted 2 days ago

Job Description

Wissen Technology is hiring for Web Scraping / Data Acquisition Engineer


About Wissen Technology:

At Wissen Technology, we deliver niche, custom-built products that solve complex business challenges across industries worldwide. Founded in 2015, our core philosophy is built around a strong product engineering mindset, ensuring every solution is architected and delivered right the first time. Today, Wissen Technology has a global footprint with 2000+ employees across offices in the US, UK, UAE, India, and Australia.

Our commitment to excellence translates into delivering 2X impact compared to traditional service providers. How do we achieve this? Through a combination of deep domain knowledge, cutting-edge technology expertise, and a relentless focus on quality. We don't just meet expectations; we exceed them by ensuring faster time-to-market, reduced rework, and greater alignment with client objectives. We have a proven track record of building mission-critical systems across industries, including financial services, healthcare, retail, manufacturing, and more.

Wissen stands apart through its unique delivery models. Our outcome-based projects ensure predictable costs and timelines, while our agile pods provide clients with the flexibility to adapt to their evolving business needs. Wissen leverages its thought leadership and technology prowess to drive superior business outcomes. Our success is powered by top-tier talent. Our mission is clear: to be the partner of choice for building world-class custom products that deliver exceptional impact, the first time, every time.

Job Summary: We are looking for a skilled Web Scraping / Data Acquisition Engineer with 3–7 years of experience to build robust data extraction pipelines for collecting legal data from public websites. The role involves designing crawlers to extract court judgments, tribunal orders, and regulatory decisions, storing structured metadata, and automating monitoring for new content. The ideal candidate has strong Python skills, hands-on web scraping experience, and the ability to handle large volumes of documents and structured data.

Experience: 3–7 Years

Location: Mumbai

Mode of Work: Hybrid


Key Responsibilities:

  • Design and develop web crawlers to extract data from public websites.
  • Crawl listing pages and extract case metadata (case title, number, court, date, etc.).
  • Download judgments and maintain structured PDF/document storage.
  • Build automated pipelines to monitor websites and detect new judgments.
  • Extract structured data from documents and HTML pages.
  • Store data in structured formats suitable for downstream processing or search.
  • Handle pagination, anti-bot measures, and data cleaning workflows.
  • Maintain scrapers for reliability, accuracy, and long-term scalability.

Required Skills and Qualifications:


  • Strong hands-on experience with Python.
  • Proven experience in web scraping and crawler development.
  • Proficiency with browser automation and crawling tools such as Playwright, Scrapy, or equivalent.
  • Experience with PDF extraction tools (pdfplumber, PyMuPDF, Apache Tika, etc.).
  • Strong understanding of HTML parsing, pagination handling, and automated file downloads.
  • Knowledge of anti-bot techniques (rate limiting, proxy handling, session rotation).
  • Experience processing structured and semi-structured documents.

Good-to-Have Skills:


  • Experience with large-scale crawlers or distributed scraping.
  • Working experience with document datasets and text-heavy systems.
  • Familiarity with Apache Tika / advanced PDF extraction.
  • Experience with AWS S3 for storing large volumes of raw documents.
  • Exposure to Elasticsearch or search indexing systems.
  • Experience with Kafka / AWS MSK for event-driven pipelines.
  • Background in legal, regulatory, or compliance datasets (optional).


Wissen Sites:


Website: www.wissen.com

LinkedIn: https://www.linkedin.com/company/wissen-technology

Wissen Leadership: https://www.wissen.com/company/leadership-team/

Wissen Live: https://www.linkedin.com/company/wissen-technology/posts/feedView=All

Wissen Thought Leadership: https://www.wissen.com/articles/


Job ID: 147163905