Meta

Production Systems Engineer, AI Systems

Job Description

Posted on: 2025-03-24

Responsibilities

  • Support new AI platform introduction by driving scale-up and scale-out interface integration.
  • Create experiments and tooling to detect and diagnose hardware/firmware/software health issues.
  • Develop understanding of AI workload traffic for new product introduction.
  • Enable hacks for exploring future technologies in AI workloads.
  • Troubleshoot and diagnose system failures, isolating components while collaborating with partners.
  • Develop visibility through data visualization and implement solutions for hardware health issues.
  • Use production experience to drive quality improvements with teams.

Job Requirements

  • Bachelor's degree in Computer Science, Computer Engineering, or a relevant technical field.
  • 2+ years of experience in Network ASIC/Platform development or related areas.
  • Knowledge of server architecture and components.
  • Experience with Linux and TCP/IP networking, including tools such as iperf.
  • Hands-on troubleshooting and debugging experience.
  • Preferred: experience with Network Interface Cards (NICs) and RDMA/RoCE.
  • Experience with large-scale deployments and Python scripting.