We are seeking a highly skilled Solutions Architect with 5+ years of experience in Data Engineering, specializing in AWS platforms, Python, and SQL. You will be responsible for designing and implementing scalable, cost-effective data pipelines on AWS, optimizing data storage, and supporting the ingestion and transformation of large datasets for business reporting and AI/ML exploration. This role requires a strong functional understanding of client requirements, the ability to deliver optimized datasets, and adherence to security and compliance standards.
Roles & Responsibilities:
- Design and implement scalable, cost-effective data pipelines on the AWS platform using services such as S3, Athena, Glue, and RDS.
- Manage and optimize data storage strategies for efficient retrieval and integration with other applications.
- Support the ingestion and transformation of large datasets for reporting and analytics.
- Develop and maintain automation scripts using Python to streamline data processing workflows (an illustrative sketch appears after this list).
- Integrate tools and frameworks like PySpark to optimize performance and resource utilization.
- Implement monitoring and error-handling mechanisms to ensure reliability and scalability of data solutions.
- Work closely with onsite leads and client teams to gather and understand functional requirements.
- Collaborate with business stakeholders and the Data Science team to provide optimized datasets suitable for Business Reporting and AI/ML Exploration.
- Document processes, provide regular updates, and ensure transparency in deliverables.
- Optimize AWS service utilization to maintain cost-efficiency while meeting performance requirements.
- Provide insights on data usage trends and support the development of reporting dashboards for cloud costs.
- Ensure secure handling of sensitive data with encryption (e.g., AES-256, TLS) and role-based access control using AWS IAM.
- Maintain compliance with organizational and industry regulations for data solutions.
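As a rough illustration of the kind of Python automation described above, the sketch below uses boto3 to run an Athena query over data staged in S3 and then trigger a Glue ETL job, with SSE-S3 encryption on the query output. This is a minimal sketch only; the bucket, database, and job names are hypothetical placeholders, not part of any actual client environment.

```python
# Minimal sketch of pipeline automation with boto3 (names are hypothetical).
import boto3

athena = boto3.client("athena", region_name="us-east-1")
glue = boto3.client("glue", region_name="us-east-1")

# Run an ad-hoc Athena query; results are written to S3 with SSE-S3 encryption.
response = athena.start_query_execution(
    QueryString="SELECT event_date, COUNT(*) AS events FROM raw_events GROUP BY event_date",
    QueryExecutionContext={"Database": "analytics_db"},           # hypothetical database
    ResultConfiguration={
        "OutputLocation": "s3://example-athena-results/",         # hypothetical bucket
        "EncryptionConfiguration": {"EncryptionOption": "SSE_S3"},
    },
)
print("Athena query started:", response["QueryExecutionId"])

# Kick off a Glue ETL job that transforms the raw data for reporting.
run = glue.start_job_run(JobName="transform_raw_events")          # hypothetical job name
print("Glue job run started:", run["JobRunId"])
```

In a production pipeline, steps like these would typically be wrapped with monitoring, retries, and error handling rather than run as a one-off script.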
Skills Required:
- 5+ years of experience in Data Engineering, with a strong emphasis on AWS platforms.
- Hands-on expertise with AWS services such as S3, Glue, Athena, RDS, etc.
- Proficiency in Python for building data pipelines that ingest data and integrate it across applications.
- Strong proficiency in SQL.
- Demonstrated ability to design and develop scalable data pipelines and workflows.
- Strong problem-solving skills and the ability to troubleshoot complex data issues.
Preferred Skills:
- Experience with Big Data technologies, including Spark, Kafka, and Scala, for distributed data processing.
- Hands-on expertise with AWS Big Data services such as EMR, DynamoDB, Athena, Glue, and MSK (Managed Streaming for Apache Kafka).
- Familiarity with on-premises Big Data platforms and tools for data processing and streaming.
- Proficiency in scheduling data workflows using Apache Airflow or similar orchestration tools such as One Automation or Control-M (see the sketch after this list).
- Strong understanding of DevOps practices, including CI/CD pipelines and automation tools.
- Prior experience in the Telecommunications Domain, with a focus on large-scale data systems and workflows.
- AWS certifications (e.g., Solutions Architect, Data Analytics Specialty) are a plus.
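To illustrate the workflow scheduling mentioned above, the sketch below shows a minimal Apache Airflow DAG that runs an ingestion step followed by a transformation step on a daily schedule. The DAG id, task names, and callables are hypothetical placeholders; real tasks would typically call AWS services (e.g., Glue or Athena) rather than print messages.

```python
# Minimal sketch of a daily Airflow workflow (names and tasks are hypothetical).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest_raw_data(**context):
    # Placeholder for an ingestion step, e.g. copying source files into S3.
    print("Ingesting raw data for", context["ds"])


def transform_for_reporting(**context):
    # Placeholder for a transformation step, e.g. triggering a Glue job.
    print("Transforming data for", context["ds"])


with DAG(
    dag_id="daily_reporting_pipeline",        # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest_raw_data", python_callable=ingest_raw_data)
    transform = PythonOperator(task_id="transform_for_reporting", python_callable=transform_for_reporting)

    ingest >> transform                       # transformation runs after ingestion succeeds
```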
Qualification:
- Bachelor's or Master's degree in Computer Science, Data Engineering, or a related technical field, or equivalent practical experience.