Talent.com
Senior Systems Engineer (AI Cloud Infrastructure)
MULTIVERSE COMPUTING • München, Bavaria, Germany
30+ days ago
Job description

Multiverse Computing

Multiverse is a well-funded, fast-growing deep-tech company founded in 2019. We are the largest quantum software company in the EU and have been recognized by CB Insights (2023 and 2025) as one of the 100 most promising AI companies in the world.

With 180 employees and growing, our team is fully multicultural and international. We deliver hyper-efficient software for companies seeking a competitive edge through quantum computing and artificial intelligence.

Our flagship products, CompactifAI and Singularity, address critical needs across various industries:

CompactifAI is a groundbreaking compression tool for foundational AI models based on Tensor Networks. It enables the compression of large AI systems, such as language models, to make them significantly more efficient and portable.

Singularity is a quantum- and quantum-inspired optimization platform used by blue-chip companies to solve complex problems in finance, energy, manufacturing, and beyond. It integrates seamlessly with existing systems and delivers immediate performance gains on classical and quantum hardware.

You'll be working alongside world-leading experts to develop solutions that tackle real-world challenges. We're looking for passionate individuals eager to grow in an ethics-driven environment that values sustainability and diversity.

We're committed to building a truly inclusive culture. Come and join us.

Role description

We are looking for a Senior Engineer to lead a critical initiative within our Platform Engineering team: building the software layer for the AI Gigafactory. In this role, you will move beyond consuming public cloud resources to architecting and building a private Neo-cloud from the ground up. You will design the control planes that manage high-performance compute clusters, orchestrate thousands of GPUs, and optimize the hardware-software interface for massive AI workloads.

This role sits at the intersection of High-Performance Computing (HPC), Kubernetes Internals, and Bare Metal Engineering.

What you will be doing

  • Building the Control Plane: Designing and developing the software layer (APIs, Controllers, Agents) that automates the lifecycle of bare-metal AI infrastructure.

  • Orchestrating High-Scale Compute: Architecting scheduling solutions for large-scale distributed training jobs across massive clusters of GPUs (NVIDIA H200/B200/B300), ensuring efficient bin-packing and gang scheduling.

  • Optimizing the Fabric: Tuning the software-defined networking layer to support low-latency interconnects (InfiniBand/RDMA/RoCEv2), which are essential for multi-node training.

  • Developing Kubernetes Extensions: Writing custom Kubernetes Operators and CRDs to abstract complex hardware realities (topology awareness, GPU partitioning) into usable interfaces for our Data Scientists.

  • Hardware-Level Debugging: Investigating and resolving deep systems issues ranging from PCIe bus errors and NCCL communication timeouts to kernel panics on bare-metal nodes.

  • Defining Standards: Creating the Golden Image for AI workloads, managing drivers, firmware, and OS optimizations to squeeze maximum performance out of the hardware.

Requirements

  • Systems Programming Expertise: 10 years of software engineering experience with strong proficiency in Go (Golang), C, or Rust. You must be comfortable building system agents, APIs, and CLI tools.

  • Deep Kubernetes Knowledge: You understand K8s internals beyond simple deployment. Experience with Custom Resource Definitions (CRDs), Operators, and the Kubernetes API server architecture.

  • GPU Ecosystem Experience: Hands-on experience managing NVIDIA GPU clusters. Familiarity with NVIDIA drivers, the CUDA Toolkit, and the container runtime (NVIDIA Container Toolkit).

  • Linux Internals: Deep understanding of the Linux kernel, cgroups, namespaces, and system performance tuning.

  • Infrastructure as Code: Mastery of declarative infrastructure tools (Terraform, Ansible), with a focus on provisioning physical hardware rather than just cloud VMs.

  • Problem Solving: A proven track record of debugging complex distributed systems where the root cause could be code, network, or silicon.

Preferred qualifications

  • HPC Background: Experience working with traditional supercomputing schedulers (Slurm, PBS) or modern batch schedulers (Volcano, Kueue, Ray).

  • Bare Metal Provisioning: Experience with tools like Cluster API (CAPI), Metal3, Tinkerbell, Canonical MAAS, or OpenStack Ironic.

  • High-Speed Networking: Knowledge of RDMA, InfiniBand, GPUDirect, and how to expose these technologies to containerized workloads.

  • AI/ML Familiarity: Understanding of how distributed training works (e.g., PyTorch Distributed, Megatron-LM, DeepSpeed) and the infrastructure requirements of Large Language Models (LLMs).

  • Observability: Experience building monitoring for hardware health (DCGM) and distributed tracing for long-running jobs.

Location: Applicants must have legal authorization to work in the country where the position is based.

Perks & Benefits

  • Indefinite contract.

  • Equal pay guaranteed.

  • Variable performance bonus.

  • Signing bonus.

  • Relocation package (if applicable).

  • Private health insurance.

  • Eligibility for an educational budget according to internal policy.

  • Hybrid opportunity.

  • Flexible working hours.

  • Working in a high-paced environment on cutting-edge technologies.

  • Career plan. Opportunity to learn and teach.

  • Progressive company. Happy people culture.

As an equal opportunity employer, Multiverse Computing is committed to building an inclusive workplace. The company welcomes people from all backgrounds to apply, regardless of age, citizenship, ethnic and racial origin, gender identity, disability, marital status, religion and ideology, or sexual orientation.


Employment Type: Full Time
Vacancy: 1

Similar jobs
Senior AI DevSecOps (m/f/d)
Deutsche Telekom AG • Munich, de
At T-Systems, we offer business customers the right system solutions for their digital business. With our portfolio we ensure that digital transformation reduces complexity, saves costs and makes da...
Last updated: 13 days ago • Sponsored

Cloud Platform Engineer (gn)
XXXLutz • München, DE
Welcome to XXXLdigital, where digital innovation meets real-world impact. As the digital unit of the XXXL Group, we are dedicated to all things e-commerce and stationary trade. We develop software so...
Last updated: more than 30 days ago

System Architect (m/f/d)
SUSS MicroTec Solutions GmbH und Co. KG • Garching bei München, Sternenfels bei Pforzheim, DE
Big ideas often start small, sometimes smaller than a speck of dust. Microchips are the driving force behind the tools and devices we rely on every day. We enable their...
Last updated: 4 days ago • Sponsored

Senior DevOps Engineer (m/w/d)
Reply Group • München, de
Senior DevOps Engineer (m/w/d). Do you want not only to advise, but also to steer, structure, and take responsibility? As a Project Manager (m/w/d) at Cluster Dynamics Reply, you can do exactly this...
Last updated: more than 30 days ago • Sponsored

Senior Infrastructure Engineer (Windows & Hybrid Cloud) (m/w/d)
Versicherungen Karriere • München, München (Kreis), Bayern
We, the Lebensversicherung von 1871 a. Around 550 employees work in the heart of Munich for the equally modern and t...
Last updated: 3 days ago • Sponsored

Senior DevOps Engineer (m/w/d)
Secunet Security Networks AG • München, DE
Germany's leading cybersecurity company. In an increasingly connected world, the company combines products and consulting to ensure resilient digital infrastructu...
Last updated: 28 days ago • Sponsored

Senior Cloud Engineer (m/w/d)
Reply Group • München, de
Do you want not only to advise, but also to steer, structure, and take responsibility? As a Project Manager (m/w/d) at Cluster Dynamics Reply, you can take on exactly this role!
Last updated: more than 30 days ago • Sponsored

Cloud (AWS) Platform Engineer (m/w/d)
Mandl. Executives & Experts • Munich, Bavaria, Germany
Remote work
Do you want to actively help shape the future of modern IT infrastructure, and do you bring experience in cloud environments? In this role, you will shape the build-out of a highly available, regulated clou...
Last updated: more than 30 days ago

DevOps Cloud Engineer [München](m/w/d)
ventx GmbH • Freising, Germany
Fachinformatiker) or equivalent professional experience. A passionate and reliable IT professional. Experience with automation in IT (e.g. You speak business-fluent German and English. Proje...
Last updated: more than 30 days ago • Sponsored

Senior Software Engineer (m/w/d) – Cloud-Native (AWS & Kubernetes)
E. Breuninger GmbH & Co. • Munich, de
Senior Software Engineer (m/w/d) – Cloud-Native (AWS & Kubernetes). Breuninger is the leading fashion and lifestyle department store in the premium and luxury segment and has been on the market for more than 140 years...
Last updated: more than 30 days ago • Sponsored

Senior System Engineer Cloud Operation (m/w/d)
BWI GmbH • München, Bayern, Deutschland
...colleagues, we operate and modernize one of the largest and most complex IT infrastructures in Germany. Help us secure the digital future of the Bundeswehr. Senior Sys...
Last updated: 20 days ago • Sponsored

Cloud Architect (all genders) in München - PRODYNA SE
PRODYNA SE • Munich, Germany
Are you looking for a place where a positive attitude and incredible team spirit are the key values of your environment? We validate and implement new technologies and ...
Last updated: 8 days ago • Sponsored

Network Systems Engineer (d/f/m) - Inclusive Job 🦼 🦻 🦯
Airbus • München, Bayern, DE
For Airbus Defence & Space in Taufkirchen, we are looking for a Ground Segment Systems Engineer (d/f/m) to strengthen the "System Delivery Germany" team. As a Ground Segment Systems Engineer (d/f/m) ...
Last updated: more than 30 days ago • Sponsored

Systems Engineer - User Segment of Military Satellite Communication Systems (d/f/m) - Inclusive Job 🦼 🦻 🦯
Airbus • München, Bayern, DE
For Airbus Defence & Space in Taufkirchen, we are looking for a Systems Engineer - Military Satellite Communication Systems (d/f/m). The department is developing satellite communication systems for ...
Last updated: 26 days ago • Sponsored

System Engineer Private Cloud (m/w/d) | HCPHCS
Atruvia AG • Aschheim (München), de
System Engineer Private Cloud (m/w/d) | HCPHCS. As a German-speaking company, we use forward-looking technologies such as artificial intelligence and smart data, and write process optimiz...
Last updated: 28 days ago • Sponsored

Senior DevOps / Cloud Engineer (m/w/d)
QAVION SALES & RECRUITING GmbH • Grünwald, Bavaria, Germany
Remote work
...boutique, we bring visions to life by seamlessly combining Consulting, Sales & Recruiting, Ventures, and Law & Tax, and translating bundled expertise into measurable results and sustainable growth...
Last updated: more than 30 days ago

Senior Azure Cloud Architect
Nordcloud, an IBM company • Bad Tölz, Germany
Hands-on experience with Azure from successfully implemented projects. Experience leading a technical team, providing guidance to your colleagues on the project. DevSecOps or SRE 'toolkit' and ...
Last updated: more than 30 days ago • Sponsored

Senior System Engineer - Platform Infrastructure (m/w/d) || netgo tax
netgo group GmbH • München, Bayern, Deutschland
Become "part of netgo group" too, one of Germany's largest IT service providers. Employees at numerous locations across Germany look forward to welcoming you as a new team member. You ensure ...
Last updated: 3 days ago • Sponsored