SRE Engineer Observability

Triune Infomatics Inc
San Jose Full-day Temporary

Description:

Role: SRE Engineer Observability
Location: San Jose, CA
Duration: 6+ months (possible extension)
Key Responsibilities:
Design, implement, and maintain end-to-end observability platforms using the Kubernetes + Prometheus Stack (Prometheus, Loki, Grafana, Alert Manager).
Develop and optimize monitoring and alerting solutions for large-scale distributed systems on AWS.
Automate observability workflows using Python & Go (e.g., custom exporters, Grafana dashboards).
Integrate with PagerDuty for incident ma
Aug 7, 2025; from: dice.com

Similar jobs

  • Vings Technologies
  • San Jose
Description: Core Technical Skills: - Advanced SQL (Snowflake, Databricks): Table management, deprecation, data querying - Python: Scripting for automation, ETL workflows, alert tooling - Airflow: DAG creation, dependency management, alert tuning - Version ...
12 days ago
  • Fourways Consulting Services
  • San Jose
Description: Job Title: Site Reliability Engineer (SRE) Location: Research Triangle Park, NC / San Jose, CA Duration: 12 Month Contract Job Description: We are looking for an experienced SRE with the following qualifications: Strong experience working ...
18 days ago
Description: Position Summary: We are looking for an AI Site Reliability Engineer to manage, optimize, and scale high-performance compute (HPC) and AI platforms including NVIDIA DGX and Cisco UCS. This role blends SRE principles, AI/ML operationalization, ...
18 days ago
  • Alphosoft Inc
  • San Jose
Description: Remote is fine. Prefer candidates who are in CA region but can work remotely. Job Description: Design and implement AI Agents to optimize cloud resource allocation, auto-scaling, and performance tuning. Develop predictive models for failure ...
11 days ago