Location: Hybrid, Cologne (Rheinauhafen); 3 days in the office, 2 remote (Tue + Thu)
Team: Engineering
Keep the world awake — build reliability at scale
ilert helps thousands of DevOps & IT teams detect, fix, and communicate incidents faster.
Our platform is mission-critical: customers rely on us 24/7 to keep their always-on businesses running.
As a Site Reliability Engineer at ilert, you’ll own the reliability, performance, and scalability of our core platform across AWS, Kubernetes, Kafka, and more.
Tasks
Build & operate a highly available platform
- Run and evolve our AWS-based infrastructure
- Operate and optimize self-managed Kafka, ClickHouse clusters and our Observability stack
- Ensure resilience, disaster recovery, and capacity planning across the stack
Improve reliability & performance
- Build and maintain SLOs, SLIs, error budgets, and observability dashboards
- Debug production issues across layers (networking, Kubernetes, application, DB)
- Improve performance of our ingestion pipeline
Automation & tooling
- Automate operations with Terraform, Helm, Kubernetes operators, and internal tooling
- Build tooling for safer deploys, blue/green rollouts, and automated verification
- Strengthen incident response workflows through deep collaboration with our AI SRE agent team
Security & compliance
- Implement best practices for workload isolation, secrets management, IAM, and auditability
- Support our ISO 27001 posture by automating controls and hardening our infrastructure
Cross-functional impact
- Partner with Backend, AI, and Product teams to design reliable services
- Participate in on-call rotation
- Lead post-incident reviews and drive reliability improvements long-term
Requirements
- 3+ years of experience as SRE, Platform Engineer, DevOps Engineer, or Infrastructure Engineer
- Strong hands-on experience with AWS, Kubernetes, Linux internals, networking, and performance tuning
- Experience operating self-managed distributed systems, ideally Kafka or ClickHouse
- Strong understanding of observability
- Experience automating infrastructure with Terraform and CI/CD systems
- Fluent English (our working language); German optional
Benefits
🚀 Product-centric - 100% focused on solving a mission-critical pain felt by every always-on business
🏡 Hybrid freedom - 2 days remote by default; gorgeous Rheinauhafen roof terrace when you’re in town
🕒 Focus > meetings - We time-box syncs, favour async docs and protect maker time
🌴 28 days off - …plus public holidays
🚲 Commute perks - subsidised public transport

ilert is a SaaS company for alerting, on-call management and status pages and helps companies to operate always-on services and respond faster to incidents.