Job Description:
- Design and develop robust data pipelines, ETL processes, and data integration solutions to collect, transform, and load data from various sources into our data warehouse.
- Collaborate with cross-functional teams to identify data requirements and translate them into technical specifications and data models.
- Optimize and tune database systems, queries, and ETL processes for performance and scalability, ensuring efficient data retrieval and storage.
- Implement data quality and validation mechanisms to maintain data integrity and accuracy.
- Develop and maintain documentation for data pipelines, data models, and data flow diagrams to facilitate understanding and collaboration among team members.
- Monitor and troubleshoot data pipeline issues, database performance bottlenecks, and data-related problems to ensure smooth data operations.
- Stay up to date with emerging technologies and trends in the data engineering space, evaluating and recommending new tools and frameworks to improve data processing efficiency and overall system performance.
- Collaborate with data scientists and analysts to support their data needs, providing them with clean, reliable, and well-organized datasets.
- Create and maintain reports and dashboards using Power BI or other visualization tools to enable data-driven decision-making across the organization.
- Utilize Python programming to develop and maintain data engineering solutions, including data manipulation, data cleansing, and automation of data processes.
Requirements:
- Proven experience as a Data Engineer or in a similar role, with a strong understanding of data management principles and best practices.
- Proficiency in SQL and experience with both relational and NoSQL databases (e.g., MySQL, PostgreSQL, MongoDB, Cassandra).
- Experience with big data technologies such as Hadoop and Spark, including related frameworks (e.g., Spark Streaming, PySpark).
- Familiarity with cloud-based data platforms and services, preferably AWS or Azure (e.g., Amazon Redshift, Azure SQL Database, S3).
- Strong programming skills in Python, with the ability to write efficient and optimized code for data manipulation and automation.
- Proficiency with stream-processing frameworks such as Spark or Flink and messaging systems such as Kafka or Pulsar.
- Hands-on experience building ETL processes with AWS Glue or Azure Data Factory.
- Experience with data visualization tools like Power BI or Tableau, including creating interactive dashboards and reports.
- Familiarity with data warehousing concepts and experience with workflow orchestration tools such as Apache Airflow or similar workflow management systems.
- Solid understanding of data security, privacy, and compliance standards.
- Strong analytical and problem-solving skills, with the ability to quickly troubleshoot and resolve data-related issues.
- Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams.
- Self-motivated and eager to learn, with a proactive approach to problem-solving and staying updated on industry trends and best practices.
Education:
BE, BTech, MCA, or MTech only
Interview Rounds:
2 or 3 technical rounds
Experience Level:
4 to 10 years