Working Hours: Full Time
Location: Hyderabad
Experience: 4-6 years
About The Role
Soothsayer Analytics is a global AI & Data Science consultancy headquartered in Detroit, with a thriving delivery center in Hyderabad. We design and deploy end-to-end custom Machine Learning & GenAI solutions, spanning predictive analytics, optimization, NLP, and enterprise-scale AI platforms, that help leading enterprises forecast, automate, and gain a competitive edge.
As a Data Engineer, you will build the foundation that powers these AI systems: scalable, secure, and high-performance data pipelines.
Job Overview
We seek a Data Engineer (Mid-level) with 4-6 years of hands-on experience in designing, building, and optimizing data pipelines. You will work closely with AI/ML teams to ensure data availability, quality, and performance for analytics and GenAI use cases.
Key Responsibilities
Data Pipeline Development
- Build and maintain scalable ETL/ELT pipelines for structured and unstructured data.
- Ingest data from diverse sources (APIs, streaming, batch systems).
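For a sense of the work, here is a minimal batch ETL sketch in Python. The endpoint, fields, and output path are hypothetical, and the Parquet write assumes pyarrow (or fastparquet) is installed:

```python
import requests
import pandas as pd

API_URL = "https://api.example.com/v1/orders"  # hypothetical source endpoint

def extract(url: str) -> list[dict]:
    """Pull one batch of records from a REST source."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.json()

def transform(records: list[dict]) -> pd.DataFrame:
    """Flatten nested JSON and stamp the batch with an ingest time."""
    df = pd.json_normalize(records)
    df["ingested_at"] = pd.Timestamp.now(tz="UTC")
    return df

def load(df: pd.DataFrame, path: str) -> None:
    """Land the batch as Parquet, a lake-friendly columnar format."""
    df.to_parquet(path, index=False)

if __name__ == "__main__":
    load(transform(extract(API_URL)), "orders.parquet")
```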
Data Modeling & Warehousing
- Design efficient data models to support analytics and AI workloads.
- Develop and optimize data warehouses/lakes using Redshift, BigQuery, Snowflake, or Delta Lake.
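As a toy sketch of the modeling side, the Python below splits flat order records into a one-fact, one-dimension star schema; the data is hypothetical, and a production model would live as DDL in the warehouse itself:

```python
import pandas as pd

# Hypothetical flat order records as they might arrive from a source system.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_name": ["Ann", "Ben", "Ann"],
    "amount": [120.0, 80.0, 45.0],
})

# Dimension: one row per customer, with a surrogate key.
dim_customer = orders[["customer_name"]].drop_duplicates().reset_index(drop=True)
dim_customer["customer_sk"] = dim_customer.index

# Fact: measures plus a foreign key into the dimension.
fact_orders = orders.merge(dim_customer, on="customer_name")[
    ["order_id", "customer_sk", "amount"]
]
```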
Big Data & Streaming
- Work with distributed systems like Apache Spark, Kafka, or Flink for real-time/large-scale data processing.
- Manage feature stores for ML pipelines.
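A minimal PySpark Structured Streaming sketch of the Kafka-to-lake pattern this covers; brokers, topic, schema, and paths are placeholders, and the Kafka source assumes the spark-sql-kafka connector is on the classpath:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("events-stream").getOrCreate()

# Hypothetical event schema; a real job would load this from a schema registry.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read a Kafka topic as an unbounded stream.
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # placeholder brokers
       .option("subscribe", "events")                     # placeholder topic
       .load())

# Kafka delivers bytes; parse the value column into typed fields.
events = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
             .select("e.*"))

# Append parsed events to a lake path, with checkpointing for recovery.
query = (events.writeStream.format("parquet")
         .option("path", "/data/events")                  # placeholder sink path
         .option("checkpointLocation", "/chk/events")
         .outputMode("append")
         .start())
query.awaitTermination()
```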
Collaboration & Best Practices
- Work closely with Data Scientists and ML Engineers to ensure high-quality training data.
- Implement data quality checks, observability, and governance frameworks.
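As one example of a lightweight quality gate (a stand-in for fuller frameworks such as Great Expectations or Soda; the column names are hypothetical):

```python
import pandas as pd

def run_quality_checks(df: pd.DataFrame) -> list[str]:
    """Return human-readable failures; an empty list means the batch passes."""
    failures = []
    if df.empty:
        failures.append("batch is empty")
    if df["event_id"].isna().any():
        failures.append("null event_id values")
    if df["event_id"].duplicated().any():
        failures.append("duplicate event_id values")
    if (df["amount"] < 0).any():
        failures.append("negative amounts")
    return failures

batch = pd.DataFrame({"event_id": ["a", "b", "b"], "amount": [10.0, -1.0, 5.0]})
problems = run_quality_checks(batch)
if problems:
    raise ValueError(f"quality gate failed: {problems}")
```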
Required Skills & Qualifications
Education: Bachelor's/Master's in Computer Science, Data Engineering, or a related field.
Experience: 4-6 years in data engineering with expertise in:
- Programming: Python/Scala/Java (Python preferred).
- Big Data & Processing: Apache Spark, Kafka, Hadoop.
- Databases: SQL/NoSQL (Postgres, MongoDB, Cassandra).
- Data Warehousing: Snowflake, Redshift, BigQuery, or similar.
- Orchestration: Airflow, Luigi, or similar (a minimal DAG sketch follows this list).
- Cloud Platforms: AWS, Azure, or GCP (data services).
- Version Control & CI/CD: Git, Jenkins, GitHub Actions.
- MLOps/GenAI pipelines: feature engineering, embeddings, vector DBs.
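The orchestration sketch referenced above: a minimal Airflow DAG wiring three placeholder tasks into a linear chain. The `schedule` argument assumes Airflow 2.4+ (older versions use `schedule_interval`), and the DAG id and task bodies are hypothetical:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder task bodies; real tasks would call the pipeline code.
def extract():
    print("pull batch from source")

def transform():
    print("clean and model the batch")

def load():
    print("write to the warehouse")

with DAG(
    dag_id="daily_orders_etl",   # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",           # Airflow 2.4+; older versions: schedule_interval
    catchup=False,
) as dag:
    extract_t = PythonOperator(task_id="extract", python_callable=extract)
    transform_t = PythonOperator(task_id="transform", python_callable=transform)
    load_t = PythonOperator(task_id="load", python_callable=load)

    extract_t >> transform_t >> load_t  # linear dependency chain
```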
Skills Matrix
Candidates must submit a detailed resume and fill out the following matrix:
| Skill | Details | Skills Last Used | Experience (months) | Self-Rating (0-10) |
| --- | --- | --- | --- | --- |
| Python |  |  |  |  |
| SQL / NoSQL |  |  |  |  |
| Apache Spark |  |  |  |  |
| Kafka |  |  |  |  |
| Data Warehousing (Snowflake, Redshift, etc.) |  |  |  |  |
| Orchestration (Airflow, Luigi, etc.) |  |  |  |  |
| Cloud (AWS / Azure / GCP) |  |  |  |  |
| Data Quality / Governance Tools |  |  |  |  |
| MLOps / LLMOps |  |  |  |  |
| GenAI Integration |  |  |  |  |
Instructions For Candidates
- Provide a detailed resume highlighting end-to-end data engineering projects.
- Fill out the skills matrix above with accurate dates, durations, and self-ratings.