OpenAI

Reliability/DFX Engineer

Job Description

Posted on: 2026-04-06

Responsibilities

  • Oversee DFX architecture and implementation from concept to deployment.
  • Build system-level reliability models using empirical data.
  • Collaborate with chip and platform design teams on DFX features.
  • Partner with hardware health teams to improve reliability in new product introductions.
  • Serve as the DFX/reliability champion within the industry ecosystem.
  • Propose high-ROI features to enhance fault tolerance in AI hardware.
  • Analyze data to drive continuous improvements in reliability.

Job Requirements

  • BS with 15+ years, MS with 10+ years, or PhD with 3+ years of relevant experience.
  • Hands-on experience with RTL design and DFT is required.
  • Detailed understanding of ML chip and platform architecture.
  • Strong fundamentals in reliability modeling and empirical data analysis.
  • Experience with physical implementation or silicon ATE is preferred.
  • Ability to collaborate across teams and communicate effectively.
  • Knowledge of ML workload characteristics is required.