OpenAI

Reliability/DFX Engineer

Job Description

Posted on: 2025-12-21

Responsibilities

  • Oversee design-for-X (DFX) architecture, implementation, and execution in silicon, from concept through high-volume deployment.
  • Propose high-ROI features to enhance reliability and fault tolerance in AI hardware.
  • Build system-level reliability models based on empirical data to guide DFX and reliability strategy.
  • Collaborate with chip and platform architecture/design teams to implement DFX features.
  • Partner with hardware health and platform design teams to optimize reliability in new product introduction (NPI) and high-volume manufacturing (HVM).
  • Serve as the DFX and reliability champion, aligning with the broader industry ecosystem.
  • Conduct data analysis to drive continuous improvements across the system stack.

Job Requirements

  • BS with 15+ years, MS with 10+ years, or PhD with 3+ years of experience in reliability across the chip/platform stack.
  • Hands-on experience with RTL design and design for testability (DFT).
  • Detailed understanding of ML chip and platform architecture.
  • Strong fundamentals in reliability modeling and empirical data analysis.
  • Experience with physical implementation and/or silicon automated test equipment (ATE) is preferred.
  • Ability to collaborate effectively with cross-functional teams.
  • Strong communication skills to advocate for DFX and reliability initiatives.