Sr. Data Engineer (6-9 yrs)
Responsibilities:
- Implement a PII data-masking pipeline
- Build and maintain a synthetic data generator
- Integrate automated data seeding into the CI/CD pipeline
- Validate generated data against the ground truth for referential integrity, volume accuracy, and other data-quality characteristics
Technical skills:
- Python (Faker, factory-boy, custom generators) or Node.js
- Database management: SQL Server, PostgreSQL, including bulk inserts at scale
- CI/CD pipeline integration (GitHub Actions, Azure DevOps)
- Data masking techniques: tokenization, hashing, format-preserving encryption
- API-based data seeding (REST / GraphQL endpoints)
- Performance: generating & loading millions of records efficiently