Kubernetes Guru Needed for NVIDIA’s DGX Cloud (Remote, US)

Want to work at the forefront of AI infrastructure? NVIDIA is seeking a skilled Senior Software Engineer to join its DGX Cloud team. You’ll be responsible for the smooth operation and scaling of the GPU clusters that run diverse AI workloads.

What you’ll do:

  • Develop and optimize Kubernetes-based solutions for scheduling GPU resources.
  • Build and maintain systems for monitoring the health and performance of GPU clusters.
  • Collaborate with teams across NVIDIA to ensure reliable and efficient AI infrastructure.
  • Troubleshoot system failures and improve services through incident management.

What you’ll need:

  • 5+ years of experience in a similar software engineering role, with a proven track record of impactful work.
  • Strong Kubernetes API and framework experience (beyond just cluster operations).
  • Excellent communication and collaboration skills.
  • A Bachelor’s degree in Computer Science or a related field.
  • Proficiency in Go or Python and a solid understanding of data structures and algorithms.

Bonus points for:

  • Experience managing large-scale distributed systems.
  • Deep understanding of cluster management systems like Kubernetes, Slurm, or Bright Cluster Manager.
  • A passion for AI and a desire to push the boundaries of technology.

Benefits:

  • Competitive salary (ranging from $148,000 to $339,250 USD, depending on location and experience).
  • Equity and comprehensive benefits package.
  • Opportunity to work with some of the brightest minds in the industry on groundbreaking AI projects.

If you’re a Kubernetes expert with a passion for AI and a drive to innovate, apply now!

To apply for this job, please visit nvidia.wd5.myworkdayjobs.com.