Job Description
We are looking for a skilled and motivated Python Data Engineer with strong expertise in SQL, AWS, and Redshift to design, develop, and optimize scalable data processing solutions. The ideal candidate should also have exposure to AI/LLM data preparation techniques including text normalization, chunking, embedding generation, and vectorization.
The role involves working on cloud-based data platforms, large-scale data pipelines, and modern AI-enabled search/retrieval solutions.
Key Responsibilities
Design and develop scalable backend and data processing applications using Python.
Build and optimize complex SQL queries, ETL pipelines, and data transformation workflows.
Work with AWS services such as S3, Lambda, Glue, EC2, IAM, and Redshift.
Develop and maintain data warehousing solutions using Redshift.
Process structured and unstructured data for analytics and AI use cases.
Implement text preprocessing techniques, including:
Text normalization
Chunking
Embedding generation
Vectorization
Work with vector databases and semantic search concepts (good to have).
Collaborate with cross-functional teams, including Data Engineers, AI/ML teams, and Business Analysts.
Ensure data quality, performance, and scalability, and follow security best practices.
Participate in code reviews, debugging, testing, and deployment activities.
Required Skills
Strong programming experience in Python.
Strong SQL knowledge including query optimization and data modeling.
Hands-on experience with AWS cloud services.
Experience working with Amazon Redshift.
Understanding of ETL and data pipeline development.
Experience with APIs and data integration.
Good analytical and problem-solving skills.
Good-to-Have Skills
Knowledge of Generative AI / LLM data preparation concepts.
Experience in:
Text normalization
Chunking strategies
Embedding models
Vectorization techniques
Exposure to vector databases or similarity-search libraries such as Pinecone, FAISS, ChromaDB, or Weaviate.
Familiarity with LangChain or Retrieval-Augmented Generation (RAG) concepts.
Knowledge of Docker, CI/CD, and Git.
Preferred Qualifications
Bachelor's or Master's degree in Computer Science, IT, Data Science, or a related field.
Experience working in cloud-native data engineering environments.
Prior experience in AI-enabled analytics or search platforms is a plus.