Site Reliability Engineer

Fortinet
United States, California, Sunnyvale
899 Kifer Road
Apr 02, 2026
At Fortinet, we strive to provide a supportive, collaborative environment where people are empowered to do the best work of their careers. Our team members enjoy solving complex problems and obsess over getting the details right. We love what we do and are proud of our work securing cloud and container environments for thousands of B2B customers worldwide.

Our team is growing, and we are looking for engineers with a passion for automation. You will help support the Lacework platform and play a key role in building, operating, and improving the Lacework Cloud Security Platform, the world's best real-time cloud-native threat detection system. Our team develops and supports the infrastructure layers spanning our cloud accounts, network/connectivity, workload management, observability, and storage services. We build tooling that automates operations so that the Lacework infrastructure and service can scale. To be successful, you will design, define, develop, deploy, and operate internal tooling, APIs, and frameworks that streamline our workflows and automate our infrastructure.

The Role:
- Automate as much as is reasonable to significantly improve the operational efficiency of the Lacework platform.
- Design, build, and improve our infrastructure to enhance service scalability, resiliency, and efficiency across the company.
- Identify mission-critical problems and solve them through automation, tooling, communication, and informed design.
- Build and improve monitoring and instrumentation to predict future scalability or failure risks and address them before they become customer-facing issues.
- Facilitate company-wide visibility into key metrics, SLAs, and milestones so that scale and resiliency are part of every conversation.
- Develop best practices alongside engineering and operations teams to improve the scalability and reliability of internal processes.
- Participate in an on-call rotation.
Minimum Qualifications:
- 3 years of DevOps/SRE experience with production systems (depending on level).
- Strong development and automation skills.
- Extensive experience with Infrastructure as Code (Terraform, etc.) and supporting tooling (Atlantis, ArgoCD, etc.).
- Extensive experience with Kubernetes and supporting tooling (Helm, operators, etc.).
- Extensive experience with a variety of cloud providers and managed services, e.g. AWS: EKS, EC2, S3, RDS, Secrets Manager, etc.
- Experience building production-quality cloud infrastructure that enables reliable, rapid deployment of microservices, with effective monitoring and built-in high availability and/or fault tolerance.
- Strong passion for using automation to create simple, repeatable dev and ops patterns that ensure a stable, reliable experience for customers.
- Strong cross-team communication skills.
- Experience with the building blocks of large-scale systems, including load balancing, distributed/cloud computing, containers, instrumentation, and monitoring.
- Knowledge of cloud networking, including VPC configuration and cross-cloud connectivity.
- Familiarity with one or more programming languages (Python, Golang, etc.).

Preferred Qualifications:
- Experience with monitoring and observability systems and tools (Prometheus, Grafana, New Relic, DataDog, etc.).
- A belief that everything should be "as code".
- Experience with Java application servers and JVM configuration.