Big Data Developer

ACS International India Pvt. Ltd. (ACS-I India)

Pune, India

Fresher

Posted a day ago

Job Description

Position: Big Data Developer

Location: Pune

We are seeking a talented Big Data Developer to join our technology team, working within the chemistry domain to acquire, transform, and manage scientific content at scale. This role is critical in delivering data-driven solutions that power our flagship products and services. The successful candidate will work with millions of data records, building robust data pipelines and workflows that transform raw scientific content into meaningful insights.

About ACS-I India

ACS International India Pvt. Ltd. (ACS-I India) is a wholly owned subsidiary of ACS International Ltd, USA, and part of the American Chemical Society. ACS-I India represents products and services provided by ACS divisions, including Chemical Abstracts Service (CAS), to the world's most important scientific companies, government organizations, global patent offices and academic institutions to promote research and discovery.

About CAS

Chemical Abstracts Service (CAS) is a division of the American Chemical Society and a source of chemical information. CAS provides products, services, and solutions for researchers and research professionals, along with support and training. For over 100 years, CAS has provided the most comprehensive repository of research in chemistry and related sciences. CAS finds, collects and organizes all publicly disclosed substance information and creates the world's most valuable chemistry databases. Scientists and patent professionals across the world rely on these databases.

Job Responsibilities

Design, develop, and maintain scalable big data processing pipelines using Apache Spark and Scala/Java
Build and optimize data acquisition, curation, and transformation workflows for scientific content
Implement real-time and batch data processing solutions using Kafka for streaming data
Develop and maintain data transformation logic using XSLT for structured content management
Deploy and manage data pipelines on AWS cloud infrastructure (e.g., S3) and on-premises storage (Pure Storage)
Collaborate with the Tech Lead and cross-functional teams to understand requirements and deliver data solutions
Implement CI/CD pipelines using Jenkins for automated testing and deployment of data applications
Monitor, troubleshoot, and optimize data processing jobs for performance and reliability
Ensure data quality, governance, and compliance with established standards and best practices

Ideal Candidate Will Have

Apache Spark: Strong hands-on experience with distributed data processing, RDDs, DataFrames, and Spark SQL
Scala/Java: Proficiency in Scala and/or Java for building scalable data applications
AWS and storage: Experience with AWS services such as S3, and with on-premises Pure Storage
Apache Kafka: Knowledge of Kafka for building real-time streaming data pipelines and event-driven architectures
Jenkins: Experience with Jenkins for CI/CD automation, build pipelines, and deployment orchestration
XSLT: Working knowledge of XSLT for XML transformations and data mapping

Preferred Skills

Experience working with scientific or chemistry domain data
Knowledge of data governance frameworks and data quality management
Familiarity with containerization technologies (Docker, Kubernetes)
Understanding of distributed systems, microservices architecture, and RESTful APIs