Principal SDET (Leadership Role)
Our Client is a leading retail technology company, empowering 1,200+ customers and over 500,000 users through its integrated commerce and retail solutions platform. Their comprehensive suite includes ERP, Cloud POS, Order Management, GST Compliance, and Business Intelligence solutions, helping retailers streamline operations and accelerate growth.
Driven by innovation, our Client is evolving from a System of Record into a System of Intelligence, leveraging advanced technologies and data-driven insights to redefine the future of retail.
Role Summary
You do not manage people. You manage quality standards. Sixteen QA engineers are embedded in different delivery teams, each reporting to their team's Tech Lead. Your job is to make sure all 16 are pulling in the same direction, using consistent tooling, and catching the classes of bugs that kill platforms at scale. This is a hands-on role: you will write test frameworks, not just review them.
Key Responsibilities
Quality Architecture & Strategy
- Define the test pyramid for the new platform (unit, integration, contract, E2E, performance) and the expected ratio between those layers for each context.
- Own the E2E test infrastructure from scratch: framework selection, CI pipeline integration, test environment management, flaky test elimination.
- Design the test data strategy for a multi-tenant platform: how to create, isolate, and tear down tenant-specific test data across different bounded contexts.
- Define what "done" means from a quality standpoint for each migration gate.
Cross-Team Quality Standards
- Establish QA standards that all teams follow: test naming conventions, assertion patterns, mocking strategies, test data factories.
- Own the cross-team quality dashboard: track test coverage, defect density, test execution time, flaky test rate, and escape rate per team.
- Conduct quarterly quality reviews with each team's QA engineer to identify gaps and provide mentoring.
- Define the release quality gate: what tests must pass before a context can go to production. No exceptions without your sign-off.
Specialized Test Domains
This platform has failure modes that generic QA approaches will miss. You must own strategy (and guide team QA engineers) for:
- Event sourcing correctness: Verify that every aggregate reconstructs correctly from its event stream. Test snapshot + replay integrity. Verify idempotent event handlers.
- Eventual consistency: Design tests that validate read model projections converge within the defined staleness windows. Test what happens when they don't.
- Multi-tenant isolation: Verify that tenant A can never see or modify tenant B's data. Test at the API, database, cache (Redis), and search index layers.
- POS offline reconciliation: Test the full offline sync conflict-resolution flow. Simulate network partitions, partial syncs, and oversold scenarios.
- Financial accuracy: Every rupee must balance. Design reconciliation tests for GL entries, trial balances, and GST filing accuracy.
- Marketplace integration correctness: Design contract tests for 60+ marketplace channel adapters. Ensure canonical order model fidelity. Test circuit breaker and dead letter queue behavior.
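The event sourcing checks listed above (replay correctness, snapshot + replay integrity, idempotent handlers) can be expressed as plain property checks. The following is a minimal Java 17 sketch, assuming a hypothetical stock aggregate with two event types (`StockReceived`, `StockReserved`) and a hypothetical comma-separated snapshot format standing in for real serialization; none of these names come from the platform itself.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical event model for a single stock aggregate (not the platform's real types).
interface StockEvent { String eventId(); }
record StockReceived(String eventId, int qty) implements StockEvent {}
record StockReserved(String eventId, int qty) implements StockEvent {}

// Immutable aggregate state; it is only ever built by folding events.
record StockState(int onHand, int reserved, long version) {
    static final StockState EMPTY = new StockState(0, 0, 0);

    StockState apply(StockEvent e) {
        if (e instanceof StockReceived rec) return new StockState(onHand + rec.qty(), reserved, version + 1);
        if (e instanceof StockReserved res) return new StockState(onHand, reserved + res.qty(), version + 1);
        throw new IllegalArgumentException("Unknown event type: " + e);
    }

    static StockState replay(StockState from, List<? extends StockEvent> events) {
        StockState s = from;
        for (StockEvent e : events) s = s.apply(e);
        return s;
    }

    // Hypothetical snapshot format "onHand,reserved,version".
    String toSnapshot() { return onHand + "," + reserved + "," + version; }

    static StockState fromSnapshot(String snapshot) {
        String[] p = snapshot.split(",");
        return new StockState(Integer.parseInt(p[0]), Integer.parseInt(p[1]), Long.parseLong(p[2]));
    }
}

// Read-model projection that remembers processed event IDs so redelivery is a no-op.
final class AvailableStockProjection {
    private final Set<String> processed = new HashSet<>();
    private int available;

    void handle(StockEvent e) {
        if (!processed.add(e.eventId())) return; // duplicate delivery: ignore
        if (e instanceof StockReceived rec) available += rec.qty();
        else if (e instanceof StockReserved res) available -= res.qty();
    }

    int available() { return available; }
}

final class EventSourcingChecks {
    // Full replay must equal "state at k -> snapshot -> restore -> replay the tail" for every k.
    static boolean snapshotReplayConsistent(List<StockEvent> events) {
        StockState full = StockState.replay(StockState.EMPTY, events);
        for (int k = 0; k <= events.size(); k++) {
            StockState atK = StockState.replay(StockState.EMPTY, events.subList(0, k));
            StockState restored = StockState.fromSnapshot(atK.toSnapshot());
            if (!StockState.replay(restored, events.subList(k, events.size())).equals(full)) return false;
        }
        return true;
    }

    // Delivering every event twice (at-least-once redelivery) must match delivering it once.
    static boolean handlerIsIdempotent(List<StockEvent> events) {
        AvailableStockProjection once = new AvailableStockProjection();
        AvailableStockProjection twice = new AvailableStockProjection();
        for (StockEvent e : events) {
            once.handle(e);
            twice.handle(e);
            twice.handle(e);
        }
        return once.available() == twice.available();
    }
}
```

Checking every snapshot position k, rather than one, is what catches snapshots taken mid-stream that serialize a field incorrectly.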
Migration Quality
- Own the parity testing strategy: how to verify that the new platform produces identical business outcomes to the existing system for every migrated workflow.
- Design the SHADOW mode validation: when both old and new systems process the same transactions, define how discrepancies are detected, categorized, and resolved.
- Define the data migration validation framework: row-count reconciliation, field-by-field comparison, and financial balance verification after data migration for each context.
- Establish the cutover quality checklist: the minimum tests that must pass before a context can be switched from legacy to new platform in production.
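For the data migration validation described above, one table's parity check could look like the minimal Java 17 sketch below: row counts, missing and extra primary keys, a field-by-field comparison, and an exact `BigDecimal` total for a monetary column. It assumes rows have already been extracted from both systems into maps keyed by primary key with values as text; `MigrationParity`, `ParityReport`, and the `amount` column are hypothetical names.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

// Outcome of comparing one table between legacy (source) and new platform (target).
record ParityReport(int sourceCount, int targetCount,
                    List<String> missingInTarget, List<String> extraInTarget,
                    List<String> fieldMismatches,
                    BigDecimal sourceTotal, BigDecimal targetTotal) {
    boolean passes() {
        return sourceCount == targetCount && missingInTarget.isEmpty() && extraInTarget.isEmpty()
            && fieldMismatches.isEmpty() && sourceTotal.compareTo(targetTotal) == 0;
    }
}

final class MigrationParity {
    // Rows: primary key -> (column -> value as text). amountColumn is a hypothetical money column.
    static ParityReport compare(Map<String, Map<String, String>> source,
                                Map<String, Map<String, String>> target,
                                String amountColumn) {
        List<String> missing = new ArrayList<>();
        List<String> extra = new ArrayList<>();
        List<String> mismatches = new ArrayList<>();
        TreeSet<String> keys = new TreeSet<>(source.keySet()); // sorted for a stable report
        keys.addAll(target.keySet());
        for (String key : keys) {
            Map<String, String> s = source.get(key);
            Map<String, String> t = target.get(key);
            if (t == null) { missing.add(key); continue; }
            if (s == null) { extra.add(key); continue; }
            TreeSet<String> columns = new TreeSet<>(s.keySet());
            columns.addAll(t.keySet());
            for (String col : columns) {
                if (!Objects.equals(s.get(col), t.get(col))) {
                    mismatches.add(key + "." + col + ": " + s.get(col) + " != " + t.get(col));
                }
            }
        }
        return new ParityReport(source.size(), target.size(), missing, extra, mismatches,
            total(source, amountColumn), total(target, amountColumn));
    }

    // BigDecimal keeps the sum exact; double arithmetic could drift by fractions of a paisa.
    private static BigDecimal total(Map<String, Map<String, String>> rows, String column) {
        BigDecimal sum = BigDecimal.ZERO;
        for (Map<String, String> row : rows.values()) {
            String v = row.get(column);
            if (v != null) sum = sum.add(new BigDecimal(v));
        }
        return sum;
    }
}
```

Comparing values as text is deliberately strict: "100.5" and "100.50" count as a mismatch, so a real framework would need explicit per-context normalization rules (decimal scale, timestamp formats, trimming) agreed before the comparison runs.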
Performance & Reliability Testing
- Own the load test strategy aligned with the platform throughput targets: 5,000 POS transactions/second peak, 100K Kafka events/second, 50,000 concurrent POS terminals.
- Design chaos testing scenarios: Kafka broker failure, Redis cluster partition, database primary failover, network partition between availability zones.
- Define performance regression thresholds: no release ships if p99 latency regresses more than 15% on any operation in the latency budget table.
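The 15% p99 rule above is simple enough to automate as a pipeline step. Below is a minimal Java 17 sketch, assuming per-operation latency samples in milliseconds are already collected for a baseline build and a candidate build; the `LatencyGate` class and the map-based input standing in for the latency budget table are hypothetical. It uses the nearest-rank percentile definition; tools differ in how they compute percentiles, so a real gate should compute p99 the same way the team's dashboards do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

final class LatencyGate {
    // Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Lists every operation whose candidate p99 exceeds the baseline p99 by more than
    // maxRegression (0.15 for the 15% rule). An operation with no candidate samples also fails.
    static List<String> regressions(Map<String, double[]> baselineMs,
                                    Map<String, double[]> candidateMs,
                                    double maxRegression) {
        List<String> failures = new ArrayList<>();
        for (String op : new TreeSet<>(baselineMs.keySet())) { // sorted for a stable report
            double[] current = candidateMs.get(op);
            if (current == null || current.length == 0) {
                failures.add(op + ": no candidate samples");
                continue;
            }
            double before = percentile(baselineMs.get(op), 99);
            double after = percentile(current, 99);
            if (after > before * (1 + maxRegression)) {
                failures.add(String.format(Locale.ROOT, "%s: p99 %.1f ms -> %.1f ms (+%.0f%%)",
                    op, before, after, (after / before - 1) * 100));
            }
        }
        return failures;
    }
}
```

A non-empty result would block the release; treating a missing operation as a failure keeps a renamed or dropped endpoint from silently escaping the gate.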
Technical Expectations
You should be comfortable with and opinionated about:
- Test frameworks: Java ecosystem (JUnit 5, Testcontainers, REST Assured, WireMock)
- E2E tooling: Playwright or Cypress for web, Appium for mobile/Android (warehouse app)
- Contract testing: Pact or similar for inter-context contract verification
- Performance testing: Gatling or k6 for load testing, with custom metrics export to Prometheus
- CI/CD integration: GitHub Actions or Jenkins pipelines with test parallelization, test impact analysis
- Data validation: SQL-based reconciliation queries, CSV/JSON diff tools for migration parity
- Observability for QA: Correlating test failures with Jaeger traces, Kafka consumer lag, and Redis metrics
Qualification Criteria
Experience Required (Must Have)
- BE / BTech with 10+ years in software quality engineering, with at least 3 years in a principal/staff-level role setting standards for multiple teams (not managing them, but influencing them).
- Hands-on experience building E2E test infrastructure from scratch for a microservices architecture: not inheriting one, but creating one.
- Experience testing event-sourced or event-driven systems. You understand that testing that "the order was created" is different from testing that the Order Created event was produced, consumed, and projected, and that the read model reflects it within the consistency window.
- Experience with multi-tenant SaaS platforms. You know where the isolation bugs hide.
- Experience with data migration validation at scale — comparing source and target systems after migration and proving they match.
- Strong Java (or Kotlin) programming ability. You will write production-grade test code, not throwaway scripts.
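The consistency-window point above can be turned into a reusable test assertion. The following is a minimal Java 17 sketch, not any specific library's API: the hypothetical `Convergence.awaitConverged` polls a condition on the read model until it holds or the staleness window elapses, and returns how long convergence took. Libraries such as Awaitility offer a fuller version of this polling pattern.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

final class Convergence {
    // Polls until the read model reflects the event, failing if the staleness window elapses first.
    // Returns the observed convergence time, which a suite could also export as a metric.
    static Duration awaitConverged(BooleanSupplier readModelReflectsEvent,
                                   Duration stalenessWindow,
                                   Duration pollInterval) {
        long start = System.nanoTime();
        long deadline = start + stalenessWindow.toNanos();
        while (true) {
            if (readModelReflectsEvent.getAsBoolean()) {
                return Duration.ofNanos(System.nanoTime() - start);
            }
            if (System.nanoTime() - deadline >= 0) { // overflow-safe nanoTime comparison
                throw new AssertionError("Read model did not converge within " + stalenessWindow);
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                throw new IllegalStateException("Interrupted while awaiting convergence", e);
            }
        }
    }
}
```

In a real test the supplier would query the projection (for example, "does the order read model show this order ID?") after the command is issued, so the assertion covers production, consumption, and projection rather than just the write.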
Strongly Preferred
- Retail / e-commerce / ERP domain experience. You understand why a stock reservation race condition between POS and OMS matters more than a minor UI glitch.
- Experience with marketplace integrations (Amazon, Flipkart, Shopify) and the chaos of external API testing.
- Experience with POS or offline-first systems. You understand what "works offline" actually means in practice.
- Experience with financial system testing: double-entry bookkeeping, GST compliance, reconciliation. You've tested systems where "approximately correct" is not acceptable.
Nice to Have
- Experience with Camel-based integration routes
- Experience with Debezium / CDC pipelines
- Experience with Android warehouse/floor applications
- Certifications in performance engineering or security testing