AMD

HPC Systems Engineer

Job Description

Posted on: 
2026-04-10

The HPC Systems Engineer role at AMD involves the design, development, and administration of High-Performance Computing (HPC) infrastructure, GPU clusters, and AI workload schedulers, with a focus on building scalable and performant services.

Responsibilities

  • Develop, implement, and maintain GPU-based clusters for optimal performance.
  • Administer ML/AI platforms, managing deployments and resource allocation.
  • Automate system provisioning and cluster management end to end.
  • Collaborate with cross-functional teams to support AI infrastructure requirements.
  • Monitor and evaluate the performance of AI systems and clusters.
  • Use AI/ML to improve internal processes and tools for service delivery.
  • Manage GPU clusters and optimize GPU-based services/tools/software.

Job Requirements

  • Experience in developing Python-based AI applications.
  • Proficiency in HPC infrastructure engineering for AI/HPC domain.
  • Familiarity with SLURM and Kubernetes management.
  • Strong organizational and problem-solving skills.
  • Excellent verbal and written communication skills.
  • Experience with automation/monitoring tools such as Ansible, Terraform, etc.
  • Demonstrated experience with AI workload schedulers and optimization techniques.
Apply now

More job openings