Data Engineer

Fresher

This job is no longer accepting applications

Posted 23 hours ago

Job Description

We are looking for a skilled and motivated Python Data Engineer with strong expertise in SQL, AWS, and Redshift to design, develop, and optimize scalable data processing solutions. The ideal candidate should also have exposure to AI/LLM data preparation techniques including text normalization, chunking, embedding generation, and vectorization.

The role involves working on cloud-based data platforms, large-scale data pipelines, and modern AI-enabled search/retrieval solutions.

Key Responsibilities

Design and develop scalable backend and data processing applications using Python.

Build and optimize complex SQL queries, ETL pipelines, and data transformation workflows.

Work with AWS services such as S3, Lambda, Glue, EC2, IAM, and Redshift.

Develop and maintain data warehousing solutions using Redshift.

Process structured and unstructured data for analytics and AI use cases.

Implement text preprocessing techniques, including data normalization, chunking, embedding generation, and vectorization.

Work with vector databases and semantic search concepts (good to have).

Collaborate with cross-functional teams including Data Engineers, AI/ML teams, and Business Analysts.

Maintain data quality, performance, and scalability, and follow security best practices.

Participate in code reviews, debugging, testing, and deployment activities.

Required Skills

Strong programming experience in Python.

Strong SQL knowledge including query optimization and data modeling.

Hands-on experience with AWS cloud services.

Experience working with Amazon Redshift.

Understanding of ETL/Data Pipeline development.

Experience with APIs and data integration.

Good analytical and problem-solving skills.

Good to Have Skills

Knowledge of Generative AI / LLM data preparation concepts.

Experience in text normalization, chunking strategies, embedding models, and vectorization techniques.

Exposure to vector databases such as Pinecone, FAISS, ChromaDB, or Weaviate.

Familiarity with LangChain or Retrieval-Augmented Generation (RAG) concepts.

Knowledge of Docker, CI/CD, and Git.

Preferred Qualifications

Bachelor's/Master's degree in Computer Science, IT, Data Science, or a related field.

Experience working in cloud-native data engineering environments.

Prior experience in AI-enabled analytics or search platforms is a plus.

Job ID: 148890691
