This role sits at the core of a high-performance processor IP team, owning PPA optimization, building scalable RTL-to-GDSII flows, and supporting customers through integration and tapeout. You will work across architecture, RTL, and physical design to drive real silicon outcomes and meet aggressive performance, power, and area targets across process nodes.
Key Responsibilities
- Drive PPA optimization across timing, area, leakage, and dynamic power
- Apply low-power techniques and tune synthesis/P&R for aggressive targets
- Build and maintain reusable RTL-to-GDSII reference flows
- Develop automation using TCL/Python to improve flow efficiency
- Collaborate with architecture and RTL teams to influence design trade-offs
- Support customers from evaluation to tapeout, resolving implementation issues
- Contribute to PPA modeling and feasibility analysis for pre-sales
Ideal Candidate
- 7+ years of ASIC / processor IP physical design experience with a strong focus on PPA optimization and flow development
- Hands-on experience with Synopsys or Cadence tools (synthesis, P&R, STA)
- Experience with advanced FinFET nodes (16nm and below); multi-node exposure preferred
- Strong scripting skills in TCL and Python
- Solid understanding of timing closure, congestion, power optimization, and multi-corner multi-mode (MCMM) analysis
- Experience with low-power design techniques and working knowledge of DFT implications
- Experience supporting customer SoC integration, IP delivery, or tapeout cycles is a plus
- Background in AI accelerators, NPUs, or DSP architectures is a plus
- Exposure to QoR tracking, large-scale runs, and AI-assisted coding tools is a plus
The Offer
- Opportunity to work on cutting-edge processor IP with real-world impact
- High-ownership role influencing PPA, product delivery, and customer success
- Collaborative, low-politics engineering culture
- Fast-paced environment with strong learning and growth potential
About the employer
Our client is a Silicon Valley-based deep-tech company building a new compute architecture for real-time AI at the edge. Founded by engineers from leading research backgrounds, the company focuses on closing the gaps in current neural processing approaches through tight integration of hardware and software.
The platform is built to run both neural network inference and conventional compute workloads efficiently across a wide range of edge devices. Unlike typical accelerators that only handle parts of an ML graph, this architecture supports end-to-end execution, including both neural network graph code and standard C++ DSP and control code, enabling greater flexibility and performance in real-world deployments.