Job Description: Senior Data Integration & Pipeline Engineer
Experience: 5–10 years
Location: Remote
Employment Type: Full-Time
Position Overview
The Senior Data Integration & Pipeline Engineer is responsible for architecting, developing, and maintaining scalable, secure, and highly reliable enterprise data pipelines across diverse data stores and file systems. This role requires deep expertise in building custom connectors (SharePoint, PLM, SolidWorks), implementing ETL/ELT frameworks, handling large unstructured datasets through chunking and cataloguing, and ensuring efficient orchestration using tools such as Apache Airflow. The engineer will oversee storage management, operational reliability, and end-to-end data flow performance across enterprise environments.
Key Responsibilities
Connector Development
- Design, develop, and maintain robust connectors and integrations for:
  - Microsoft SharePoint (Online and on-premises)
  - PLM systems (Teamcenter, Windchill, or equivalent)
  - SolidWorks / SOLIDWORKS PDM
- Build secure, scalable APIs and ingestion frameworks to extract structured and unstructured data from disparate engineering and enterprise systems.
- Implement metadata extraction, incremental fetch logic, and schema harmonization across connectors (a minimal incremental-fetch sketch follows this list).
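For illustration only, here is a minimal sketch of the incremental fetch pattern against a SharePoint document library using the Microsoft Graph delta-query endpoint. The drive ID, token acquisition, and the `save_delta_link` state helper are assumptions for the sketch, not a prescribed stack:

```python
from pathlib import Path

import requests

GRAPH = "https://graph.microsoft.com/v1.0"

def save_delta_link(drive_id: str, link: str) -> None:
    # Placeholder state store; a real connector would persist this durably.
    Path(f".delta_{drive_id}").write_text(link)

def fetch_incremental(drive_id: str, token: str, delta_link: str | None = None):
    """Yield changed items from a SharePoint library via Graph delta queries.

    `delta_link` is the @odata.deltaLink persisted from the previous run;
    on the first run we sweep from the library root.
    """
    url = delta_link or f"{GRAPH}/drives/{drive_id}/root/delta"
    headers = {"Authorization": f"Bearer {token}"}
    while url:
        resp = requests.get(url, headers=headers, timeout=30)
        resp.raise_for_status()
        page = resp.json()
        yield from page.get("value", [])   # added/changed/deleted items
        url = page.get("@odata.nextLink")  # further pages in this sweep
        if "@odata.deltaLink" in page:
            save_delta_link(drive_id, page["@odata.deltaLink"])  # resume point for next run
```

Persisting the delta link is what turns a full crawl into an incremental one; the same pattern generalizes to PLM change logs or PDM vault revision tables.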
Data Pipeline Engineering (ETL/ELT)
- Architect, implement, and optimize ETL/ELT pipelines for ingestion, transformation, cataloguing, and distribution of large-volume datasets.
- Apply chunking, partitioning, and parallelization strategies for large binary, CAD, or document repositories (see the sketch after this list).
- Build and maintain enterprise catalogues enabling discoverability, lineage tracking, and auditability.
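As a sketch of the chunking-plus-cataloguing approach, the snippet below splits a large binary (for example, a CAD assembly) into content-addressed chunks and writes a manifest that supports deduplication and lineage. The chunk size and catalogue layout are illustrative assumptions:

```python
import hashlib
import json
from pathlib import Path

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB; tune to storage and network characteristics

def chunk_and_catalogue(path: Path, catalogue_dir: Path) -> dict:
    """Split a large file into content-addressed chunks and record a manifest."""
    catalogue_dir.mkdir(parents=True, exist_ok=True)
    manifest = {"source": str(path), "chunks": []}
    with path.open("rb") as f:
        index = 0
        while chunk := f.read(CHUNK_SIZE):
            digest = hashlib.sha256(chunk).hexdigest()
            target = catalogue_dir / f"{digest}.bin"
            if not target.exists():          # identical chunks are stored once (dedupe)
                target.write_bytes(chunk)
            manifest["chunks"].append({"index": index, "sha256": digest, "size": len(chunk)})
            index += 1
    (catalogue_dir / f"{path.name}.manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest
```

The manifest doubles as a catalogue entry: it makes the source discoverable, records lineage back to the original file, and gives auditors a verifiable hash per chunk.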
Enterprise Storage & Data Management
- Manage data storage layers across object stores, NAS/SAN, cloud storage (Azure, AWS, GCP), and hybrid environments.
- Implement performance tuning, lifecycle policies, tiering, and secure data handling (an illustrative lifecycle policy follows this list).
- Ensure compliance with enterprise data governance, retention, and classification standards.
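One concrete shape this can take is lifecycle tiering on AWS S3 via boto3; the bucket name, prefix, and day thresholds below are placeholders to be aligned with the organisation's retention and classification standards, and equivalent mechanisms exist on Azure Blob Storage and GCS:

```python
import boto3

s3 = boto3.client("s3")

# Transition raw ingested objects to cheaper tiers as they age, then expire them.
s3.put_bucket_lifecycle_configuration(
    Bucket="enterprise-raw-landing",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-and-expire-raw",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access after 30 days
                    {"Days": 90, "StorageClass": "GLACIER"},      # archive after 90 days
                ],
                "Expiration": {"Days": 365},  # retention window; set per governance policy
            }
        ]
    },
)
```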
Orchestration & Automation
- Develop and maintain orchestration workflows using Apache Airflow or equivalent orchestration frameworks.
- Build DAGs, automate pipeline scheduling, implement dependency management, and handle pipeline-level exception and retry logic (a minimal DAG sketch follows this list).
- Maintain CI/CD workflows for data pipelines and connector deployments.
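A minimal Airflow 2.x DAG sketch showing scheduling, task dependencies, and pipeline-level retry logic; the DAG ID and task bodies are placeholders:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    ...  # placeholder: pull incremental changes from a source connector

def transform():
    ...  # placeholder: normalize and catalogue the extracted data

default_args = {
    "retries": 3,                         # pipeline-level retry logic
    "retry_delay": timedelta(minutes=5),
    "email_on_failure": True,             # route failures to the on-call alias
}

with DAG(
    dag_id="sharepoint_ingest",           # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",                   # incremental sweeps
    catchup=False,
    default_args=default_args,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task        # dependency management
```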
Reliability Engineering & Operations
- Own the reliability, monitoring, and observability of all data pipelines.
- Implement automated alerting, health checks, and failure recovery mechanisms (a freshness-check sketch follows this list).
- Drive SLA/SLO compliance for ingestion and transformation pipelines.
- Troubleshoot performance bottlenecks, failures, and data quality issues.
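As one concrete form of SLA/SLO enforcement, here is a sketch of a freshness check that alerts when a pipeline's last successful run exceeds its SLO; the SLO value and the `alert` channel are assumptions:

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_SLO = timedelta(hours=2)  # illustrative SLO: data at most 2 hours old

def alert(message: str) -> None:
    # Placeholder: wire this to the enterprise alerting stack (PagerDuty, Slack, ...).
    print(f"ALERT: {message}")

def check_pipeline_freshness(pipeline: str, last_success: datetime) -> None:
    """Alert when a pipeline's last successful run breaches the freshness SLO."""
    lag = datetime.now(timezone.utc) - last_success
    if lag > FRESHNESS_SLO:
        alert(f"{pipeline}: freshness SLO breached by {lag - FRESHNESS_SLO}")
```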
Cross-Enterprise Data Support
- Work with all major enterprise data stores and file systems, including relational databases, NoSQL stores, data lakes, object storage, and distributed filesystems.
- Collaborate with application, infrastructure, and data governance teams to ensure seamless integration and operational readiness.
- Provide technical leadership on data movement best practices, data modelling impacts, and architectural decisions.
Required Skills & Experience
- 5–10 years of experience in data engineering, integration engineering, or pipeline development in enterprise environments.
- Strong experience building connectors for SharePoint, PLM systems, and SolidWorks or similar engineering applications.
- Advanced expertise in ETL/ELT design patterns, data ingestion frameworks, and handling large unstructured datasets.
- Proven ability with chunking, partitioning, cataloguing, metadata management, and indexing strategies.
- Hands-on experience with orchestration platforms such as Apache Airflow (DAG design, scheduling, reliability tuning).
- Proficiency with Python, SQL, shell scripting, and API integration frameworks.
- Experience across enterprise data stores and file systems (RDBMS, NoSQL, object storage, file shares, distributed filesystems).
- Strong understanding of cloud platforms (Azure/AWS/GCP) and hybrid data architectures.
- Experience with DevOps for data (CI/CD pipelines, version control, containerization).
- Strong debugging, performance optimization, and reliability engineering capabilities.
Preferred Qualifications
- Experience with data cataloguing platforms (e.g., Collibra, Alation, custom catalogues).
- Familiarity with data lakehouse ecosystems (Databricks, Snowflake, Synapse).
- Background in engineering system integrations (CAD, PLM, ERP).
- Knowledge of data governance, security, and compliance frameworks.
- Certification in cloud data engineering (e.g., Azure DP-203, AWS Certified Data Analytics, GCP Professional Data Engineer).
Personal Attributes
- Strong analytical, troubleshooting, and problem-solving abilities.
- Ability to work autonomously while collaborating effectively with cross-functional teams.
- High attention to detail with a focus on reliability and long-term sustainability of systems.
- Excellent communication and documentation skills.