
Site Reliability Engineer - Platform

C3.ai, Inc.
United States, California, Redwood City
1300 Seaport Boulevard
Nov 21, 2024

C3.ai, Inc. (NYSE: AI) is a leading Enterprise AI software provider for accelerating digital transformation. The proven C3 AI Platform provides comprehensive services to build enterprise-scale AI applications more efficiently and cost-effectively than alternative approaches. The C3 AI Platform supports the value chain in any industry with prebuilt, configurable, high-value AI applications for reliability, fraud detection, sensor network health, supply network optimization, energy management, anti-money laundering, and customer engagement. Learn more at: C3 AI

We are seeking a highly skilled Site Reliability Engineer (SRE) to join our team to manage, monitor, and optimize our C3 clusters on Kubernetes. The ideal candidate will have a deep understanding of Kubernetes, Cloud Infrastructure, and Infrastructure as Code (IaC) practices. You will be responsible for ensuring the reliability and scalability of our Kubernetes clusters and Cloud Infrastructure.

Responsibilities:



  • Monitor and Manage Kubernetes Clusters: Ensure the stability, health, and scalability of Kubernetes clusters, deploying applications and services on Kubernetes.
  • Kubernetes Management: Deploy, monitor, and scale applications on Kubernetes clusters. Maintain Helm charts, manage services, and ensure resource allocation for optimal cluster performance.
  • Cloud Infrastructure Management: Work with leading Cloud Platforms (AWS, GCP, Azure) to set up, configure, and manage infrastructure resources using Infrastructure as Code (Terraform, CloudFormation, etc.).
  • Monitoring & Incident Response: Set up monitoring solutions, define alerts, and manage the incident response process for any issues related to Jenkins, C3, or Kubernetes clusters.
  • Automate Infrastructure Processes: Build automation tools for scaling, monitoring, and maintaining infrastructure using modern tools like Terraform, Ansible, or equivalent.
  • Collaborate Across Teams: Work closely with development, services, and operations teams to ensure seamless integration between application development and infrastructure.
  • Security & Compliance: Ensure all systems follow best practices in terms of security and compliance with relevant regulations. This includes role-based access, encryption, and automated vulnerability scanning.


Qualifications:



  • 3+ years of experience as an SRE, DevOps Engineer, or related role.
  • Hands-on experience with Kubernetes in production environments (managing clusters, deployments, services, and pods).
  • Proficiency in cloud platforms like AWS, GCP, or Azure, including managing infrastructure via IaC tools like Terraform, CloudFormation, or equivalent.
  • Familiarity with monitoring tools like Prometheus, Grafana, or equivalent.
  • Experience with Helm and managing Kubernetes applications via Helm charts.
  • Strong scripting and automation skills in languages like Bash, Python, or Groovy.
  • Experience with CI/CD tools, GitOps, and best practices for continuous integration and delivery pipelines.
  • Understanding of networking concepts and security best practices in a cloud-native environment.
  • Incident management experience, including setting up on-call rotations, managing runbooks, and post-incident reviews.

C3 AI provides excellent benefits, a competitive compensation package and generous equity plan.

California Pay Range
$129,000 - $169,000 USD

C3 AI is proud to be an Equal Opportunity and Affirmative Action Employer. We do not discriminate on the basis of any legally protected characteristics, including disabled and veteran status.
