Sthenos AI is the AI developer of EFA Group, building intelligent, mission-ready solutions for defense and aerospace. With deep expertise in Command-and-Control (C2), cyber defense, computer vision, and autonomous systems, we design and deploy secure, field-proven AI that enhances operational efficiency and situational awareness. As part of a leading European defense ecosystem, we bring scalable innovation where it matters most — in the theater of operations.
As a Senior Infrastructure Engineer (f/m/d), you are a Day-1 engineer responsible for designing, implementing and maintaining the distributed infrastructure backbone for a customer of Sthenos AI.
You implement and manage a highly available, distributed and virtualized data center infrastructure that serves as the foundation for a next-generation AI-based intelligence platform.
You design and implement the network architecture, compute, storage and virtualization layers of an on-premises environment to ensure scalability, performance and resilience.
You design and implement a Kubernetes-based platform for running distributed workloads, ensuring optimal integration with networking, storage and hardware resources.
You plan, deploy and operate virtualized infrastructure environments including compute clusters, hypervisors and software-defined networking.
You define and implement hardware configurations (including GPUs, networking equipment and storage systems) and infrastructure standards.
You design high-performance networking architectures, including routing, switching, load balancing and secure interconnection between distributed infrastructure components.
You operate a variety of databases and storage solutions and ensure that they run reliably and meet the performance standards set by the platform.
You manage the infrastructure's different environments and continuously monitor their capacity so that you can take proactive measures and plan for future growth.
You ensure the reliability, performance and security of the infrastructure by defining operational standards, monitoring systems and incident response processes.
You automate infrastructure provisioning and lifecycle management using Infrastructure as Code and configuration management tools.
You collaborate closely with the platform vendor, ML engineers and data scientists to ensure the infrastructure supports demanding AI workloads and distributed computing requirements.
You continuously evaluate and introduce new infrastructure technologies and best practices to improve efficiency, scalability and resilience.
You act as a technical authority and mentor for the infrastructure and platform engineering teams.
Your skills
Graduate degree in Computer Science, Informatics, Electrical Engineering or a related field.
Hands-on engineering mindset with a strong passion for designing, building and operating data center infrastructure.
8–10 or more years of experience designing and operating complex infrastructure systems, preferably in high-performance or distributed environments.
Deep expertise in networking, including routing, switching, VLAN/VXLAN, firewalls, load balancing and software-defined networking.
Strong experience with virtualization technologies such as VMware or similar hypervisor platforms.
Extensive experience designing and operating Kubernetes clusters for large-scale distributed workloads.
Strong knowledge of Linux operating systems and system internals.
Solid understanding of server hardware architecture, including CPU, memory, storage and networking configurations.
Experience with storage systems such as distributed storage, SAN/NAS and software-defined storage.
Strong experience with relational and NoSQL databases as well as modern object stores.
Proven experience with automation and Infrastructure as Code, preferably using Terraform, Ansible or similar tools.
Experience with observability, monitoring and performance tuning of infrastructure systems.
Experience with DevOps best practices and the implementation of CI/CD pipelines.
Strong understanding of high availability, fault tolerance and disaster recovery strategies.
Your competencies
You have strong analytical skills and can break a complex problem down into its individual parts.
You enjoy teamwork and collaborating with highly skilled engineers and scientists, yet you are not afraid to take ownership and move forward with bold decisions when technical challenges arise.
You implement new ideas independently and show perseverance in defending agreed-upon concepts.
Your communication skills allow you to explain complex technical topics clearly and transfer that knowledge to a range of audiences.
You bring perseverance and constructive assertiveness.
You speak English fluently.
We offer