We're rebuilding climate technology from the ground up.
Senior Site Reliability Engineer, C#, .NET
Location
United States
Posted
5 days ago
Salary
$135K - $170K / year
Seniority
Senior
Job Description
Climavision
- Own production reliability for Climavision’s customer-facing platform and radar-derived weather data services across Azure, colocation, and edge Kubernetes environments.
- Contribute to the definition and improvement of SLIs, SLOs, alerting standards, and operational metrics used to measure platform reliability.
- Support and coordinate production incident response efforts, including troubleshooting, mitigation, communication, and postmortem analysis.
- Diagnose and resolve complex production issues across application services, Kubernetes infrastructure, storage, and distributed systems.
- Drive multi-replica and multi-cluster high availability across Climavision’s .NET services.
- Improve reliability and operational maturity of production platform services, including observability, autoscaling, ingress, and distributed storage.
- Partner with software engineering teams to improve production readiness, resiliency patterns, deployment safety, and operational visibility before services reach production.
- Support and evolve Climavision’s observability platform, including metrics, logging, distributed tracing, dashboarding, and alerting.
Job Requirements
- A bachelor’s degree in computer science, software engineering, or a related field; equivalent professional experience considered.
- Minimum of 7 years of experience in Site Reliability Engineering, DevOps, Production Engineering, Platform Engineering, or a related infrastructure-focused role, with at least 4 years in a role formally titled Site Reliability Engineer or carrying explicit SLO / error-budget accountability.
- Strong, hands-on software engineering experience with a minimum of 3 years of experience supporting and modifying C# / .NET applications in production environments.
- Demonstrated experience refactoring production application code (preferably C# / .NET) to make services horizontally scalable across multiple replicas.
- Experience designing or operating multi-cluster high-availability architectures, including failover behavior, traffic routing, and cross-cluster service deployment.
- Strong hands-on experience operating production workloads in self-managed or highly customized Kubernetes environments.
- Experience diagnosing and resolving production incidents across application, platform and Kubernetes infrastructure layers, including workload scheduling, storage, ingress, and cluster-level failures.
- Strong written and verbal communication skills, including incident documentation and postmortem authoring.
Benefits
- Competitive compensation
- Comprehensive benefits package
- 401(k) Savings Plan
- Medical/Dental/Vision Benefits
- Health Savings Account (HSA) and Flexible Spending Account (FSA)
- Unlimited Paid Time-off
- 11 Paid Holidays
- Paid Parental Leave
- Company Paid Short-term Disability (STD)
- Company Paid Long-term Disability (LTD)
- Company Paid Life Insurance
More DevOps Engineer Jobs
Role Description
We're looking for a Middle SRE to join our team and help us build and maintain reliable, observable, and scalable infrastructure. You'll work closely with developers, own reliability practices, and contribute to the team's DevOps culture.
Qualifications
- Linux: confident working in the terminal, troubleshooting, system basics
- Docker: understanding of containers, images, and basic Docker operations
- GCP (Google Cloud Platform): hands-on experience required; our entire infrastructure runs on GCP
- Kubernetes: solid hands-on experience; understanding of workloads, networking, and cluster operations
- GitOps / FluxCD: experience with FluxCD or similar GitOps tooling is a strong advantage
- Observability: Prometheus, Alertmanager, Grafana; understanding the difference between metrics, logs, and traces
- Git / GitHub: comfortable with branching strategies, PRs, and Git-based workflows
- Knows what SLI, SLO, and SLA mean
- Basic understanding of AI-related concepts: agents, skills, MCP (Model Context Protocol)
Requirements
- CI/CD Pipelines: experience writing or maintaining pipelines (GitHub Actions, GitLab CI, or similar)
- Databases: familiarity with PostgreSQL, Couchbase, or other databases; understanding of basic operations and monitoring
- Networking fundamentals: knows where load balancers, gateways, and DNS fit in a cloud architecture
- Infrastructure as Code: Terraform or similar IaC tooling
Responsibilities
- Managing and improving Kubernetes-based infrastructure
- Maintaining and evolving observability stack (metrics, logs, traces)
- Writing and improving CI/CD pipelines
- Supporting and improving GitOps workflows
- Collaborating with developers on reliability, performance, and scalability topics
Sr. DevOps Support Engineer
Buildkite
Buildkite is the fastest, most reliable way to deploy and test code at any scale.
Role Description
This is not your average Support Engineer role. You'll support Buildkite's enterprise customers by independently troubleshooting and resolving highly complex CI/CD, software, and infrastructure problems, collaborating directly with Software Engineering teams at places like Canva, Uber, and Airbnb to help unblock critical workflows. You'll play a vital role in shaping the support team's evolution into a function that balances reactive and proactive work, spending half your time designing and implementing scalable tools, processes, and open source contributions that create a world-class customer experience. And you'll do this while enjoying Buildkite's commitment to true work-life balance, flexible hours, and no on-call. This role is a perfect opportunity for a Software, DevOps, or Infrastructure expert who is customer obsessed and thrives on solving tough technical problems for some of the world's most innovative engineering teams. This role is remote and requires you to be located in the US, in the Pacific time zone.
What You'll Do
- Independently resolve highly technical CI/CD, software, and infrastructure issues for enterprise customers
- Provide support through Slack, Zoom, Email, Plain, and Community Forum
- Lead customer-facing retrospectives and planning sessions to improve their Buildkite experience
- Identify opportunities to prevent future support issues and drive the design and implementation of proactive solutions
- Contribute to and maintain open source tools such as Bash plugins and Golang-based utilities
- Publish documentation updates and proactive communications that improve the customer journey
- Act as a voice for customers, ensuring their feedback shapes internal product and process improvements
- Own and improve support workflows, simplifying how the team operates and delivers
- Share learnings from customer work to strengthen alignment with Product and Engineering
- Mentor peers and foster an inclusive, supportive environment within the team
Qualifications
- Customer Centric Mindset: Deep commitment to delivering world-class support with empathy and patience
- Technical Expertise: Strong coding or scripting ability with experience in Bash, Ruby on Rails, or Golang
- DevOps and Cloud Knowledge: Solid experience with CI/CD tools and platforms including Linux, AWS, GCP, Azure, Terraform, and Kubernetes
- Problem Solving Skills: Confident in independently tackling complex technical issues and guiding customers to solutions
- Autonomous and Proactive: Able to work independently, identify opportunities early, and take initiative to drive meaningful improvements
- Excellent Communication: Skilled at articulating technical detail clearly to customers and internal teams across distributed environments
Benefits
- Competitive compensation, including salary, equity, and benefits package
- Flexible, remote-first culture
- Opportunities for professional growth and advancement
- Help shape a proactive, world-class support function for enterprise customers
- An inclusive, innovative culture where your ideas make a real impact
Equal Opportunity Employer
At Buildkite, we value diversity and celebrate all types of skills, backgrounds, and experiences. We're dedicated to fostering an inclusive environment and providing reasonable accommodations throughout our recruitment process. If you need any accommodations or support during the application or interview process, please reach out to us at accommodations@buildkite.com.
DevOps Engineer – Systems Operations, IT Security
Kooku Recruiting GmbH - Interim Recruiting & RPO
HR consultancy for digital recruiting and interim HR management
- You build and maintain our CI/CD pipelines and automate deployments and infrastructure (Infrastructure as Code, e.g., Terraform, Ansible).
- You operate and monitor our production systems and ensure observability: you notice when something is off before it becomes a problem.
- You work closely with development teams to make release processes faster and more reliable.
- You contribute to incident response and ensure that our security and compliance requirements (ISO 27001, BSI C5, GDPR) are embedded in everyday operations.
- Instrument systems scheduling and executing large-scale batch workloads across Kubernetes clusters.
- Diagnose and triage job failures for customers.
- Collaborate with teams across the company to understand workload requirements and improve platform capabilities.
- Scale the reliability and velocity of our systems and processes through increased automation.
- Document actions to build a comprehensive library of runbooks, which will act as a knowledge base and foundation for automation.
- Participate in an on-call rotation to uphold the SLOs and SLAs of production services.
- Contribute to platform tooling, automation, and CI/CD workflows.



