Site Reliability Engineer (f/m/d)

Virtual Minds GmbH, Köln

Posted 30+ days ago

Job description

Virtual Minds is a wholly owned subsidiary of ProSiebenSat.1 Media SE and has stood for premium ad tech made in Europe for over 20 years. Whether SSP, DSP, DMP or ad serving: as a first mover, we and our 200 employees always have the right solution for successful operations in the digital advertising market.

As a Site Reliability Engineer (SRE), you will play a crucial role in ensuring the reliability, availability, and performance of our Kubernetes platform and the Software-as-a-Service (SaaS) applications running on it. You will collaborate closely with our development, operations, and infrastructure teams to build, maintain, and optimize the systems that power our products. This position offers an excellent opportunity to work with cutting-edge technologies and contribute to the growth of a dynamic and innovative organization.

What you can expect in this role

Kubernetes Platform Management: Design, deploy, and manage our Kubernetes platform to support scalable and reliable application deployments. Monitor and maintain the platform's health, performance, and security.

SaaS Application Deployment: Oversee the deployment of our Software-as-a-Service applications on the Kubernetes platform. Implement best practices for application scalability, high availability, and disaster recovery.

Reliability and Availability: Implement robust monitoring, alerting, and logging systems to proactively identify and resolve potential issues. Ensure high system availability and quick incident response times.

Performance Optimization: Continuously optimize the Kubernetes infrastructure and SaaS applications to achieve maximum performance and efficiency. Conduct performance testing and tuning to meet or exceed service level objectives.

Incident Management: Participate in an on-call rotation to respond to incidents promptly and effectively. Conduct thorough post-incident reviews to identify root causes and implement preventive measures.

Automation and Tooling: Develop and maintain automation tools and scripts to streamline processes and improve the efficiency of operational tasks.

Security and Compliance: Implement security best practices for Kubernetes and SaaS applications. Collaborate with the security team to ensure compliance with industry standards and regulations.

Collaboration: Work closely with cross-functional teams, including development, infrastructure, and product management, to provide expertise and support throughout the software development lifecycle.

Continuous Improvement: Identify areas for improvement in the infrastructure, processes, and deployment methodologies. Propose and implement enhancements to increase system reliability and performance.

Your essential experience and education

Experience: Significant relevant experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role, with a strong focus on Kubernetes platform management and SaaS deployment.

Kubernetes Expertise: Proficiency in managing Kubernetes clusters and related tooling (e.g., Helm, kubectl, operators). Experience with container orchestration, service mesh, and Kubernetes networking.

AWS Expertise: Significant experience with AWS, especially services such as EKS, MSK, RDS, S3, CloudTrail, and CloudWatch, and with managing infrastructure and deployments as code (Terraform & ArgoCD).

Programming and Scripting: Solid programming skills in languages such as Python or Go. Proficiency in scripting to automate tasks and develop tooling.

Monitoring and Logging: Experience with monitoring solutions (e.g., Prometheus, Grafana) and centralized logging platforms (e.g., ELK stack).

CI/CD: Knowledge of continuous integration and continuous deployment pipelines, preferably with tools like Jenkins, GitLab CI/CD, or Tekton.

Networking and Security: Understanding of networking concepts and security best practices in the context of Kubernetes and SaaS deployments.

Problem-Solving Skills: Strong analytical and problem-solving abilities to diagnose and resolve complex technical issues.

Collaboration and Communication: Excellent teamwork and communication skills to collaborate effectively with various teams and stakeholders.

Continuous Learning: A passion for staying up to date with the latest technologies, industry trends, and best practices in SRE and Kubernetes.

What's in it for you?

Leading full tech stack: We offer the opportunity to work with the leading players in the industry in a team of top experts and to actively shape one of the most exciting areas of technology. You'll work in a start-up atmosphere while enjoying all the benefits of a large corporation.
Attractive working environment: You decide whether to work remotely, hybrid, or on-site. Our modern working models, flexible working hours, and 30 days of vacation contribute to a good work-life balance.
Room for Growth: Everyone can get involved and help move topics forward. We work as equals, with agile methods and flat hierarchies. We offer a wide range of internal and external training opportunities for your personal and professional development.
Many other benefits: Take advantage of additional benefits such as employee discounts, bicycle leasing (JobRad), Hansefit or Urban Sports Club, and subsidized company pension schemes.

Please send your application, including your salary expectations and availability, via our careers page. For further information, please contact us at +49 761 88147-0.

Learn more about the ProSiebenSat.1 Group and our diverse portfolio.

Do you have a disability and would like to apply? Then you are very welcome.

We know that we are not yet fully accessible, but we are working on it. Let's talk about how we can remove any barriers together and find an individual solution if needed.

Where the text refers to only one gender, all genders are meant.
