Talent.com

Reliability Engineer Jobs in Germany

Last updated: 4 days ago

AI Infrastructure & Reliability Engineer

HiBob • Berlin, DE

AI Infrastructure & Reliability Engineer. HiBob helps modern, mid-size businesses transform the way they manage people, giving HR and managers all they need to connect, engage, develop, and retain t...

Reliability & Test Engineer

Dunia Innovations GmbH • Berlin, Berlin, DE

Make sure the system never lies, and rarely fails, no matter the complexity. Dunia is building AI-native, automated laboratories for materials discovery. Our systems combine hardware, software, chem...

Senior Site Reliability Engineer (SRE)

1GLOBAL • Berlin, BE, DE

Powered by a best-in-class telecom platform – including its own owned and operated global mobile core network, fully fledged in-house developed eSIM technology, and an extensive portfolio of teleco...

Reliability Engineer

Nespresso Deutschland GmbH • NUNSPEET, DE

Reliability Engineer, this is your challenge. Are you a driven professional with a passion for maintenance and engineering? Do you want to work in a dynamic environment where you play a crucial role in ...

Senior Site Reliability Engineer (Azure)

MLabs • DE
Remote

Senior Site Reliability Engineer (Enterprise Platform). US - Open to Europe if happy to overlap with EST. Senior Site Reliability Engineer (Azure). This position is critical for ensuring that the plat...

Safety & Reliability Engineer

Schneider Electric • Düsseldorf, Germany

Are you looking to be part of a global market leading company that is shaping the future of Energy Management and Automation? Schneider Electric is at the forefront of driving innovative and sustai...

Site Reliability Engineer

deepset GmbH • Berlin, Berlin, DE

You'll work across SaaS, private cloud, and on-prem environments to make our self-hosted platform production-ready, drive CI/CD and GitOps maturity, and reduce complexity at scale. Your work will di...

Reliability Engineer (all genders)

Sanofi • Frankfurt am Main, Hessen, DE

Are you ready to jump into a mega project (>.B€ Capex program, thereof design/engineering budget >. The race is on to design & build our new Insulin flex facilities, addressing BLA (Biologics Licens...

DBRE Site Reliability Engineer Databases (gn)

Schwarz Digits • Heilbronn, DE

You are a cloud enthusiast or want to become one. Technical challenges motivate you. Experience with databases: you have experience with relational or NoSQL databases. Analytical ...

Senior Site Reliability Engineer (SRE)

Truphone • Berlin, Germany

Powered by a best-in-class telecom platform – including its own owned and operated global mobile core network, fully fledged in-house developed eSIM technology, and an extensive portfolio of teleco...

Site Reliability Engineer (w/m/d)

PASSION4IT GmbH • Viechtach, Bayern, DE

We run our own Kubernetes clusters, deploy via GitOps, and our five-person team ships faster than companies with ten times as many people. AI, directly in Microsoft Teams ...

System Engineer/Site Reliability Engineer

Atruvia AG • Karlsruhe, DE

System Engineer/Site Reliability Engineer (m/w/d) | OMCOPA. As a German-speaking company, we use forward-looking technologies such as artificial intelligence or smart data and write proce...

Reliability Engineer (w/m/d)

Quest One • Augsburg, Bayern, Germany

Reliability Engineer (w/m/d) at Quest One | softgarden. Alois-Senefelder-Allee 1, 86153 Augsburg. Senior Talent Acquisition Manager. Pioneer in PEM electrolysis. Our goal is clear: we want to ...

Site Reliability Engineer

Zattoo • Berlin, Germany

The ideal blend of stability and flexibility. A genuinely human employer that cares for people and the planet. True autonomy to shape what comes next, for us and you. This is the perfect platform to t...

Senior Site Reliability Engineer

Stott and May • Munich

Senior Site Reliability Engineer - (m/f/d). We are a technology company providing large-scale digital media streaming services across multiple platforms and devices. We operate the full content deliv...

Site Reliability Engineer (all genders)

envelio • Köln, DE

Too easy is boring! Together, we are on a mission to drive forward the energy transition. We love what we do, and we are unafraid to dive in. We believe in taking ownership of our work and in continu...

Senior Site Reliability Engineer (all genders)

TeamViewer GmbH • Göppingen, EMEA, DE

Senior Site Reliability Engineer (all genders). TeamViewer seeks a Site Reliability Engineer to ensure the reliability, scalability, and performance of our Azure-based SaaS. Join a global backend tea...

Doctoral Researcher in Data Quality & Sensor Reliability

Université du Luxembourg • Neuerburg, DE

The Faculty of Science, Technology and Medicine (FSTM) at the University of Luxembourg contributes multidisciplinary expertise in the fields of Mathematics, Physics, Engineering, Computer Science, Life ...

Safety and Reliability Engineer *

Tri.Merge GmbH • Friedrichshafen, DE

For us, it is the person who counts, not their gender. We value diversity, reject discrimination, and do not think in categories. You can support us in this! In your new role, you will take on ...

AI Infrastructure & Reliability Engineer

HiBob • Berlin, DE
6 days ago
Job description

About Us

HiBob helps modern, mid-size businesses transform the way they manage people, giving HR and managers all they need to connect, engage, develop, and retain top talent. Since 2015, we’ve achieved consecutive triple-digit year-over-year growth, all backed by our amazing team of Bobbers from across the globe, making us the HRIS of choice for ~5,500 midsize and multinational companies and over 1 million users.

Our HR platform is intuitive, data-driven, and built for the way people work today: globally, remotely, and collaboratively.

What this role is really about

You’ll join a 3-person platform team within our Business Technology group, owning the internal infrastructure that our AI platform and its users depend on. This isn’t a product engineering role, and it isn’t ticket work or babysitting pipelines someone else built. You’re building and operating the internal foundation that the company runs on. The work covers the full stack of platform engineering: core cloud infrastructure (AWS, Kubernetes, IaC), CI/CD pipelines, AI-driven infrastructure components, and the SRE and observability practice that keeps it all honest (metrics, alerting, incident response, and reliability standards). As our AI capabilities grow, so does the complexity underneath them, and staying ahead of that is central to the role. If you treat infrastructure as a product (reusable, automated, observable, and built to last), this is your kind of role.

Job requirements

  • 2-4 years of hands-on DevOps, SRE, or infrastructure engineering experience in production SaaS environments.
  • Strong AWS experience: multi-account architecture, cross-account IAM, serverless and event-driven services (Lambda, SQS, SNS, EventBridge), and EKS cluster management.
  • Proven Kubernetes experience in production, including cross-account migrations and stateful workload management.
  • Proficiency with Terraform - repository structure design, module architecture, and CI/CD pipeline implementation.
  • Hands-on experience building and maintaining GitHub Actions pipelines for end-to-end CI/CD workflows.
  • Working Python proficiency for scripting, internal tooling, and workflow automation.
  • Practical experience implementing observability stacks from scratch: metrics, logging, distributed tracing, and alerting.
  • Experience owning reliability practices: SLOs, incident response, and postmortem culture.

Nice to have

  • Hands-on experience operating LLM APIs in production: rate-limit and quota management, cost attribution per team/model, latency monitoring, and resilience patterns (retries, fallbacks, circuit breakers).
  • FinOps experience across cloud, AI, and observability spend.
  • Experience introducing self-healing or auto-remediation patterns in production.
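
For readers unfamiliar with the resilience patterns named above (retries, fallbacks, circuit breakers), here is a minimal Python sketch of how they can fit together around an LLM API call. It is an illustration only, not HiBob's implementation: `CircuitBreaker`, `call_with_resilience`, and all parameter names and defaults are hypothetical, and `primary` and `fallback` stand in for whatever client calls a team actually uses.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; permits a trial call after `reset_after` seconds."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock  # injectable so the breaker can be tested without waiting
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: allow a trial call once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


def call_with_resilience(primary, fallback, breaker, retries=2, base_delay=0.5, sleep=time.sleep):
    """Try `primary` with exponential backoff; use `fallback` if it keeps failing or the breaker is open."""
    if breaker.allow():
        for attempt in range(retries + 1):
            try:
                result = primary()
                breaker.record_success()
                return result
            except Exception:
                breaker.record_failure()
                # Stop retrying if the breaker just opened or attempts are exhausted.
                if not breaker.allow() or attempt == retries:
                    break
                sleep(base_delay * 2 ** attempt)
    return fallback()
```

In practice `primary` might call the preferred model and `fallback` a cheaper model or a cached answer; a production version would typically also add jitter to the backoff and only retry on errors known to be transient.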

Job responsibilities

  • DevOps & AI-Driven Infrastructure - own CI/CD, deployment processes, and release reliability. Build and operate cloud infrastructure that is automated, intelligent, and continuously self-improving - not just managed.
    • Design and build our Terraform repository and IaC pipeline from scratch - AI-assisted generation, drift detection, and policy enforcement built in.
    • Build AI-driven GitHub Actions pipelines - automated code review, risk assessment, and intelligent deployment decisions.
    • Manage Kubernetes workloads across AWS accounts - zero downtime, fully automated, nothing left behind.
    • Embed AI into the operational layer - proactive drift detection, automated remediation, and intelligent scaling toward a self-healing runtime.
  • Reliability & SRE - improve uptime, resilience, and incident response.
    • Define and enforce SLOs/SLIs, error budgets, and on-call practices.
    • Lead incident response, postmortems, and systemic reliability improvements.
    • Own AI-specific reliability: model latency SLOs, token quota monitoring, rate limit handling, fallback and retry strategies, and cost-per-request alerting.
  • Observability & Telemetry - increase visibility, reduce noise, improve troubleshooting.
    • Establish and continuously evolve the observability stack: metrics, logs, distributed tracing, and alerting tuned for both application and AI workloads.
  • AI / LLM Operations - bring AI systems to production and operate them at scale, with a focus on reliability, performance, and trust.
    • Own the AI infrastructure layer: rate limits, quota management, latency SLOs, and fallback strategies (retries, circuit breakers).
    • Operate LLM APIs in production with resilience and cost attribution per team/model.
  • FinOps & Cost Optimization - optimize AI, infra, and logging costs at scale.
    • Build cost visibility and guardrails across AWS, LLM usage, and observability pipelines.
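
As a rough illustration of the SLO and error-budget terms used in the responsibilities above, the Python sketch below shows the usual arithmetic: an availability SLO of 99.9% leaves 0.1% of requests as the error budget. The function name `error_budget`, its parameters, and the sample numbers are assumptions for illustration and do not come from the posting.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Return the allowed failures, the fraction of budget consumed, and whether the SLO still holds."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, e.g. 0.999")
    # The error budget is the share of requests the SLO permits to fail.
    allowed_failures = (1.0 - slo_target) * total_requests
    consumed = failed_requests / allowed_failures if allowed_failures else float("inf")
    return {
        "allowed_failures": allowed_failures,
        "budget_consumed": consumed,
        "within_slo": failed_requests <= allowed_failures,
    }


# Hypothetical month: 1,000,000 requests against a 99.9% SLO, 250 of them failed.
report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(f"{report['budget_consumed']:.0%} of the error budget used")
```

With these sample numbers the budget is about 1,000 failed requests, so 250 failures consume roughly a quarter of it; alerting on how fast the budget is being consumed is a common next step.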