HPC Systems Engineer

Location

San Jose, CA

Level

Mid-Level

Department

Semiconductors

Type

Salary

Job Description

Posted on:

2026-04-10

The HPC Systems Engineer role at AMD involves the design, development, and administration of High-Performance Computing (HPC) infrastructure, GPU clusters, and AI workload schedulers, with a focus on building scalable and performant services.

Responsibilities

Develop, implement, and maintain GPU-based clusters for optimal performance.
Administer ML/AI platforms, managing deployments and resource allocation.
Automate system provisioning and cluster management end to end.
Collaborate with cross-functional teams to support AI infrastructure requirements.
Monitor and evaluate the performance of AI systems and clusters.
Use AI/ML to improve internal processes and tools for service delivery.
Manage GPU clusters and optimize GPU-based services/tools/software.

Job Requirements

Experience in developing Python-based AI applications.
Proficiency in HPC infrastructure engineering for AI/HPC domain.
Familiarity with SLURM and Kubernetes management.
Strong organizational and problem-solving skills.
Excellent verbal and written communication skills.
Experience with automation/monitoring tools such as Ansible, Terraform, etc.
Demonstrated experience with AI workload schedulers and optimization techniques.

Apply now

More job openings

All jobs

HPC Systems Engineer

Job Description

Responsibilities

Job Requirements

About

AMD

More job openings

Head of Hardware

Technical Chief of Staff for ASIC Engineering

Intern Engineer - Electrical, Software, Comp Science

Manufacturing Quality Engineer

About

AMD