Team : Product & Engineering
Reports to the CTO
Location : Hybrid - Cologne (Rheinauhafen) - 3 days in office, 2 days remote (Tue and Thu)
Shape the future of autonomous incident response
We're on a mission to make downtime invisible. Thousands of DevOps and SRE teams rely on ilert to detect, resolve, and communicate incidents faster.
As our first AI Product Engineer, you'll build the core of ilert's AI-first strategy: autonomous, tool-using agents that diagnose alerts, run root cause analysis, execute safe mitigations, and keep services healthy.
This is a hands-on role where you'll turn operational expertise and product insight into real, reliable AI systems used in production.
Tasks
Design & Build AI Agents
- Design agent reasoning loops, prompts, and safety constraints.
- Build multi-step, tool-using agents (logs, metrics, traces, k8s, Git, CI/CD, cloud APIs).
- Implement autonomy flows: investigation, analysis, mitigation, validation.
Ship Product Features
- Work with product and engineering to build AI-backed features that solve real customer problems.
- Translate complex SRE workflows into intuitive user experiences powered by AI.
- Own features end-to-end (design, prototype, implementation, rollout).
Integrate with Observability & Ops Tooling
- Connect LLM agents to Grafana, Prometheus, Kubernetes, GitHub, CI/CD, cloud services, etc.
- Design safe tool schemas and APIs for autonomous execution.
Ensure Reliability, Safety & Determinism
- Build guardrails for safe, reversible mitigations.
- Validate model output with structured schemas (e.g. Zod, JSON Schema).
- Establish evaluation suites, test harnesses, and monitoring for agent performance.
Collaborate Across Teams
- Work with SREs to encode operational expertise into agents.
- Partner with Product to shape requirements and roadmap decisions.
- Influence ilert's broader AI strategy.
Requirements
Must-Have Skills
- Experience building AI-powered applications with LLMs (OpenAI, Anthropic, etc.)
- Strong prompt engineering & agent design skills
- Experience implementing multi-step tool-use flows
- Solid software engineering fundamentals (preferably Rust)
- Experience integrating with APIs, backend services, or automations
- Ability to reason about reliability, safety, and controlled automation
- Product mindset: able to turn ambiguous problems into shippable solutions
Nice-to-Have Skills
- Background in SRE, DevOps, or incident response
- Experience with observability tools (Grafana, Prometheus, Elastic, Datadog, New Relic)
- Hands-on Kubernetes knowledge
- Experience with production agent frameworks (ReAct, LangChain, LangGraph, custom state machines)
Soft Skills
- You love building real products, not demos
- Strong communication & critical thinking
- Comfortable working with high autonomy and ownership
- Passion for reliability, automation, and removing toil
Benefits
- Build one of the first real autonomous SRE agents in the industry
- Product-centric culture: Be part of a team that's 100% committed to solving a critical issue for businesses that offer round-the-clock services.
- Hybrid Work Environment: Enjoy the best of both worlds with in-person collaboration and remote work flexibility.
- No Meetings #hackfwd: Maximize productivity by keeping meetings to a minimum and focusing on your core responsibilities.
- High-impact, high-ownership role: your work ships to customers quickly
- Small, senior team with fast decision-making
- Modern tech stack, strong engineering culture
- Direct involvement in shaping the future of on-call and incident response
- Founder-led startup
Please include one link (GitHub repo, notebook, or demo) that best showcases your experience building AI-powered or agentic systems.
Key Skills
Continuous Integration, APIs, Automotive software, Test Cases, Electrical Engineering, JUnit, Distributed Control Systems, TestNG, Java, Test Automation, Programmable Logic Controllers, Selenium
Employment Type : Employee
Experience : years
Vacancy : 1