Site Reliability Engineer (SRE) Team Lead
1GLOBAL • Berlin, BE, DE
Posted 30+ days ago

Job Description

1GLOBAL is a technology-driven global mobile communications provider dedicated to empowering enterprises worldwide to unlock the full growth potential of mobile connectivity. With a best-in-class telecom technology platform, a comprehensive suite of globally viable regulatory licenses, and privileged access to the telecom wholesale market, 1GLOBAL is uniquely positioned to deliver seamless compliance and connectivity solutions. Serving the world’s leading banks, corporations, and digital-first businesses—including neo-banks, travel companies, and payment service providers—1GLOBAL connects over 43 million devices globally.

With 2024 full-year revenue exceeding US$100 million and on track to exceed US$200 million in FY25, 1GLOBAL is a profitable business generating significant cash flows to fund its ongoing investments in infrastructure, transformation, and growth. 2024 saw major client wins and marked 1GLOBAL’s evolution from a multi-market telecommunication provider to a global technology-driven mobile connectivity powerhouse.

Established in 2022 by experienced tech founders and entrepreneurs Hakan Koç and Pyrros Koussios, 1GLOBAL is a European technology leader driving digital transformation in the global telecommunications market. It operates as a fully regulated Mobile Virtual Network Operator (“MVNO”) in ten countries and as a regulated telecommunications operator in an additional 31 countries. Headquartered in the Netherlands, with world-class R&D hubs in Lisbon, Berlin, and São Paulo, 1GLOBAL employs over 450 experts across 15 countries.

Position Overview

We are looking for a talented Site Reliability Engineering (SRE) Team Lead to join our Technology Department.

We are open to hiring this role in Berlin, Germany.

As the SRE Team Lead, you will be responsible for ensuring the stability, scalability, and reliability of our global infrastructure and services across both cloud and on-prem environments.

You will lead a team of SREs focused on service availability, resilience, and operational excellence, driving a data-driven reliability culture based on SLIs, SLOs, and error budgets.
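
For readers less familiar with the terms, the relationship between an SLI, an SLO, and an error budget can be sketched in a few lines of Python. This is only an illustration: the function name `error_budget_remaining`, its parameters, and the numbers in the usage note are assumptions made for the example and do not describe 1GLOBAL's actual targets or tooling.

```python
def error_budget_remaining(slo_target: float, total_events: int, bad_events: int) -> float:
    """Return the fraction of the error budget still unspent.

    slo_target   -- availability objective as a fraction, e.g. 0.999 (hypothetical value)
    total_events -- all requests observed in the SLO window
    bad_events   -- requests that violated the SLI (errors, timeouts, ...)

    The result is 1.0 when nothing has failed, 0.0 when the budget is
    exactly used up, and negative when the SLO has been breached.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0:
        raise ValueError("total_events must be positive")

    # The error budget is the number of bad events the SLO tolerates.
    allowed_bad_events = (1.0 - slo_target) * total_events

    # Fraction of that budget already consumed by observed failures.
    consumed = bad_events / allowed_bad_events
    return 1.0 - consumed
```

With an assumed 99.9% objective, 1,000,000 requests, and 250 failures, the budget allows about 1,000 failures, so `error_budget_remaining(0.999, 1_000_000, 250)` returns roughly 0.75: about three quarters of the budget is left.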

Your mission will be to proactively identify weaknesses across systems and improve reliability through redundancy testing, automation, and observability.

You will build tools and processes to automatically detect, prevent, and recover from incidents — ensuring our services remain reliable and performant for customers around the world.

This role collaborates closely with DevOps, Infrastructure, IP Network, and Security teams to maintain carrier-grade reliability standards across all layers of our platform.

Key Responsibilities

  • Lead and mentor a team of Site Reliability Engineers, setting clear priorities, goals, and reliability metrics.
  • Define, measure, and maintain SLIs and SLOs for core infrastructure and customer-facing services.
  • Plan and execute redundancy and resilience testing across service, infrastructure, and networking layers — validating failover, HA configurations, and disaster recovery readiness.
  • Design and implement automated recovery mechanisms, self-healing workflows, and intelligent alerting systems.
  • Drive incident response, root-cause analysis, and blameless post-mortems, and ensure that the resulting corrective and preventive actions are implemented and tracked for continuous improvement.
  • Develop and enhance observability (metrics, logs, traces) using Prometheus, Grafana, Loki, and OpenTelemetry.
  • Collaborate with Infrastructure and DevOps teams to ensure deployment safety, rollback policies, and configuration consistency.
  • Proactively identify weaknesses through fault-injection, load, and chaos testing.
  • Continuously reduce operational toil through automation and reliability tooling.
  • Establish on-call practices and improve alert quality, runbooks, escalation procedures, and incident management processes.
  • Conduct capacity planning, performance benchmarking, and resilience audits across systems.
  • Ensure compliance with security, reliability, and availability standards.
  • Create and maintain internal documentation, playbooks, and operational guidelines for peers and users.
  • Build and manage cloud cost-optimization frameworks, including reserved capacity planning, autoscaling design, storage tiering, workload right-sizing, and continuous anomaly detection.

Requirements

Must Have

  • A minimum of 7 years of experience in Site Reliability, Systems, or Infrastructure Engineering (including 2+ years in an SRE role and 2+ years in a leadership role).
  • Strong expertise in Linux systems engineering, distributed systems, and networking.
  • Proven experience building and running high-availability, mission-critical production systems.
  • Hands-on experience with redundancy and failover testing, disaster recovery, and high-availability architecture validation.
  • Deep understanding of monitoring, observability, and incident management principles.
  • Experience with Prometheus, Grafana, Loki, Thanos, and OpenTelemetry or similar tools.
  • Proficiency in Python, Go, and Bash for automation and reliability tooling.
  • Strong knowledge of Kubernetes, container orchestration, and service mesh architectures.
  • Experience with AWS (EKS, EC2, VPC) and on-premises infrastructure integration.
  • Proficiency in Infrastructure as Code tools such as Terraform.
  • Understanding of networking fundamentals (routing, load balancing, BGP, DNS, VXLAN, etc.).
  • Excellent analytical and problem-solving skills, capable of leading under pressure.
  • Strong communication and collaboration skills across distributed and cross-functional teams.

Nice to Have

  • Experience in telecom, carrier-grade, or large-scale distributed systems environments.
  • Hands-on experience with chaos engineering and automated failure-scenario validation (e.g., simulating link or node failures).
  • Strong understanding of high-availability networking concepts.
  • Background in capacity planning, traffic engineering, and multi-region failover.
  • Experience building reliability dashboards and integrating SRE metrics into business KPIs or compliance reports.
  • Familiarity with security and resilience standards (ISO 27001, NIST SP 800-53).

Benefits

Why 1GLOBAL?

  • Growth Opportunities: Advance your career in one of the fastest-growing telecommunications companies, expanding over 100% year-on-year under the leadership of successful tech entrepreneurs.
  • Major Transaction Exposure: Be in the driver’s seat for transactions that will shape the future of the telco industry.
  • Work with a Talented Team: From the Board and the Founders to the Senior Management Team, you will collaborate daily with highly capable colleagues and renowned external advisors, and be constantly exposed to talented and driven individuals.
  • Dynamic Work Environment: Thrive in a collaborative, fast-paced workplace where innovation is encouraged and every contribution counts.
  • Professional Development: Work alongside industry experts to enhance your skills and knowledge in a cutting-edge field.
  • International Experience: Gain opportunities to work in different 1GLOBAL offices around the world as you grow within the company.
  • Open Communication Culture: Join a team where your ideas are heard and open dialogue is encouraged, fostering a supportive and transparent work environment.
  • Get Things Done Attitude: Be part of a results-driven team that values efficiency, creativity, and the drive to make a tangible impact in the industry.

1GLOBAL is an equal opportunity employer. We value your character as much as your talent. Diversity drives our innovation, and we offer a collaborative, dynamic, and international work environment. We are excited for you to join our mission to revolutionise connectivity globally.
