Qcentrio is seeking a highly skilled and experienced Big Data Engineer with 5+ years of relevant experience developing data and analytics solutions. The ideal candidate will have strong expertise in Python, SQL, Spark/PySpark, and AWS Cloud. You will play a crucial role in designing, implementing, and optimizing scalable, efficient, and robust data pipelines and solutions, participating actively in all phases of the software development lifecycle within an Agile environment.
Key Responsibilities
- Software Development Lifecycle: Actively participate in all phases of the software development lifecycle, including requirements gathering, functional and technical design, development, testing, roll-out, and support.
- Problem Solving: Solve complex business problems by utilizing a disciplined development methodology.
- Solution Development: Produce scalable, flexible, efficient, and supportable data solutions using appropriate technologies.
- Data Analysis & Mapping: Analyze source and target system data and map transformations that meet business requirements.
- Client Interaction: Interact effectively with clients and onsite coordinators during different phases of a project.
- Feature Design & Implementation: Design and implement product features in collaboration with business and technology stakeholders.
- Data Quality & Optimization: Anticipate, identify, and solve issues concerning data management to improve data quality. Clean, prepare, and optimize data at scale for ingestion and consumption.
- Data Architecture Support: Support the implementation of new data management projects and re-structure the current data architecture as needed.
- Automated Workflows: Implement automated workflows and routines using workflow scheduling tools like Airflow.
- DevOps Practices: Understand and use continuous integration, test-driven development, and production deployment frameworks.
- Code & Design Review: Participate in reviews of designs, code, test plans, and dataset implementations produced by other data engineers to help maintain data engineering standards.
- Data Profiling & Troubleshooting: Analyze and profile data for the purpose of designing scalable solutions. Troubleshoot straightforward data issues and perform root cause analysis to proactively resolve product issues.
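The mapping and data-quality responsibilities above can be sketched in a few lines of Python. This is a minimal illustration only: the source fields (`cust_id`, `signup`, `rev`) and the target schema are hypothetical, not a real Qcentrio schema, and a production pipeline would run equivalent logic in PySpark at scale.

```python
from datetime import datetime
from typing import Optional

def map_record(source: dict) -> Optional[dict]:
    """Map one source record to the (assumed) target schema, rejecting bad rows."""
    try:
        return {
            "customer_id": int(source["cust_id"]),
            "signup_date": datetime.strptime(source["signup"], "%Y-%m-%d")
                .date()
                .isoformat(),
            "revenue": round(float(source["rev"] or 0.0), 2),
        }
    except (KeyError, ValueError):
        # Route malformed rows to a reject path for later root-cause analysis.
        return None

def transform(records):
    """Clean and prepare records; return (good_rows, reject_count)."""
    good, rejects = [], 0
    for rec in records:
        mapped = map_record(rec)
        if mapped is None:
            rejects += 1
        else:
            good.append(mapped)
    return good, rejects
```

Counting rejects separately, rather than silently dropping rows, is what makes data-quality issues visible enough to anticipate and fix.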
Required Skills
- Experience: 5+ years of relevant experience developing data and analytics solutions.
- Big Data Technologies: Experience building data lake solutions leveraging one or more of the following: AWS (EMR, S3, Databricks), Hive, and PySpark.
- Programming & Scripting: Experience with scripting languages such as Python.
- Databases: Experience with relational databases and SQL.
- Source Control: Experience with source control tools such as GitHub and related development processes.
- Workflow Orchestration: Experience with workflow scheduling tools such as Airflow.
- Cloud Expertise: In-depth knowledge of AWS Cloud (S3, EMR, Databricks).
- Problem-Solving: A strong problem-solving and analytical mindset.
- Data Pipelines: Hands-on experience in the design, development, and testing of data pipelines.
- Agile Experience: Experience working with Agile Teams.
- Communication: Able to influence and communicate effectively, both verbally and in writing, with team members and business stakeholders.
- Adaptability: Able to quickly pick up new programming languages, technologies, and frameworks.
- Education: Bachelor's degree in Computer Science.
- Passion: A passion for data solutions.
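One concrete way the "design, development, and testing of data pipelines" skill shows up day to day is unit-testing SQL transformations before they run on EMR or Databricks. The sketch below uses Python's built-in sqlite3 as a disposable fixture; the `orders` table and its columns are assumptions for illustration, not part of this posting.

```python
import sqlite3

def daily_revenue(conn: sqlite3.Connection) -> list:
    """Aggregate raw orders into per-day revenue, excluding voided orders."""
    return conn.execute(
        """
        SELECT order_date, SUM(amount) AS revenue
        FROM orders
        WHERE voided = 0
        GROUP BY order_date
        ORDER BY order_date
        """
    ).fetchall()

def build_fixture() -> sqlite3.Connection:
    """In-memory stand-in for the real source system (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_date TEXT, amount REAL, voided INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            ("2024-03-01", 10.0, 0),
            ("2024-03-01", 5.0, 0),
            ("2024-03-02", 7.5, 0),
            ("2024-03-02", 99.0, 1),  # voided: must be excluded from revenue
        ],
    )
    return conn
```

Asserting on a small in-memory database gives fast, repeatable feedback on transformation logic without touching cluster resources, and the same pattern fits naturally into a CI pipeline.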