Talent.com
Principal Engineer - Systems for ML Inference and Training Optimization, Deep Science for Systems and Services
Amazon Web Services Development Center Germany GmbH • Tübingen, Baden-Württemberg, Germany
Posted 30+ days ago
Job description

We are seeking an exceptional Principal Engineer specializing in ML Systems, training, and inference optimization to lead our technical strategy and implementation for next-generation AI performance at scale. This role requires deep expertise in performance engineering, distributed systems architecture, low-level systems optimization, and the ability to drive technical excellence across multiple teams. You will set the technical direction for kernel-level optimizations, define architectural strategies for heterogeneous compute platforms, architect multi-GPU and multi-node training systems, and lead the delivery of solutions that fundamentally change how AWS serves ML training and inference workloads.

As a Principal Engineer in DS3, you will be a key technical leader responsible for organization-level architecture and performance strategy spanning the entire ML lifecycle, from distributed training of frontier models to high-throughput inference serving. You will work at the lowest levels of the software stack: defining standards for CUDA kernel development, optimizing assembly-level code (e.g., NVIDIA PTX), architecting cross-platform acceleration strategies across GPUs and AWS Neuron, designing efficient multi-node communication patterns, and inventing novel approaches to achieve 10× or greater performance improvements. Your work will directly influence AWS's competitive position in AI infrastructure and set the standard for ML systems engineering across the industry.

Utility Computing (UC)

AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.

Key job responsibilities

Technical Strategy & Vision: Define and drive the technical strategy and architectural roadmap for ML inference and training optimization across multiple teams within your organization. Bring systems architecture and performance engineering context to strategic business decisions.

Cross-Platform Performance Leadership: Lead the design and architecture of kernel-level optimizations spanning NVIDIA GPUs, AWS Inferentia/Trainium, and emerging AI accelerators. Establish standards and best practices for low-level optimization across the organization.

Intrinsically Hard Problems: Tackle the most difficult performance challenges: endemic bottlenecks, architectural complexity that prevents innovation, and critical business/technical problems requiring order-of-magnitude improvements (10× or greater).

Systems-Level Innovation: Drive the design, implementation, and delivery of performance solutions at the program level that address significantly large or endemic customer and business problems across your organization and potentially others.

Hardware-Software Co-Design: Establish deep understanding of new SoCs, GPUs, and AI accelerators; derive guidelines for optimal utilization and influence hardware selection decisions based on fact-driven analysis and resource budgeting.

Technical Excellence & Force Multiplication: Set the standard for engineering excellence in your organization. Create mechanisms, tools, and processes that enable performance measurement, analysis, and optimization at scale across multiple teams.

Organization-Level Influence: Align teams across your organization toward coherent performance strategies and architectural decisions. Drive adoption of new optimization approaches, concepts, and paradigms. Lead the most important and complex technical reviews.

Team Development: Guide the career growth of senior engineers in your organization. Mentor and develop the next generation of performance engineering leaders. Participate in Principal promotion assessments and help grow the Principal Engineering community.

Hands-On Technical Leadership: Remain a practitioner by personally writing critical-path code, designing zero-overhead portable libraries, and prototyping solutions that inform technical direction for your organization.

About the team

Deep Science for Systems and Services (DS3) is a science organization within AWS Compute & ML Services focused on advancing AI/ML technologies at the systems level. Our team works at the intersection of machine learning and high-performance computing, developing optimizations for large model inference across diverse hardware platforms. We push the boundaries of what's possible in ML inference performance, working directly with CUDA, AWS Neuron, and other low-level compute abstractions to deliver order-of-magnitude performance improvements and industry-leading cost-performance for AWS customers deploying AI at scale.

About AWS

Diverse Experiences

Amazon values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.

Why AWS

Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.

Work/Life Balance

We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.

Inclusive Team Culture

Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (diversity) conferences, inspire us to never stop embracing our uniqueness.

Mentorship and Career Growth

We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.

BASIC QUALIFICATIONS

10+ years of software development experience with demonstrated progression in technical leadership and impact.

Expert-level proficiency in C/C++ and low-level systems programming with a proven track record of delivering order-of-magnitude (10× or greater) performance improvements in production systems.

Extensive experience with CUDA programming, GPU architecture, assembly-level optimization (e.g., NVIDIA PTX), and kernel development across multiple hardware platforms.

Demonstrated ability to lead organization-level technical initiatives spanning multiple teams, building consensus on contentious technical decisions and driving architectural strategy.

Experience defining technical roadmaps, conducting performance analysis and resource budgeting, and translating system analysis into strategic development plans.

PREFERRED QUALIFICATIONS

Master's degree (or higher) in Computer Science, Computer Engineering, or related technical field with 15+ years of performance engineering experience.

Experience optimizing ML inference and/or training workloads (LLMs, Transformers, CNNs) across diverse hardware: GPUs, AWS Neuron/Inferentia, and other accelerators.

Deep expertise across multiple hardware architectures and platforms (x86, ARM, multiple GPU generations, SoCs, custom accelerators) with the ability to quickly master new hardware platforms.

Track record of developing portable, high-performance libraries, tools, or frameworks used across engineering organizations or open-source projects with significant adoption.

Experience leading large-scale optimization initiatives or coordinating performance engineering efforts across multiple teams and organizations.

Proven ability to establish deep understanding of complex systems and create performance measurement and analysis tools that provide critical insights for organization-wide use.

Entrepreneurial experience including startup founding, CTO role, or driving technical vision in product development environments.
