Job Closed

This listing is no longer active.

CADRE GOVERNMENT SOLUTIONS

We’re more than a government contracting company. We’re a cadre, a specialized team built to support government agencies with practical, reliable solutions to complex operational and technology challenges. We strategically mobilize, manage, and maintain specialized cadres using our RO(M³) model. CADRE GOVERNMENT SOLUTIONS is an Equal Opportunity and Affirmative Action Employer. We welcome and encourage diversity in our workforce. It is the policy of CADRE GOVERNMENT SOLUTIONS to provide equal employment opportunity to all employees and qualified applicants without regard to race, color, religion, national origin, sex, age, disability, pregnancy, sexual orientation, gender identity, genetic information, protected veteran status, or any other protected characteristic under federal, state, or local law.

Site Reliability Engineer

Location

United States

Posted

31 days ago

Salary

Not specified

Seniority

Mid Level

Job Description

Role Description

The Site Reliability Engineer will support a large federal technology modernization effort focused on improving the reliability, visibility, and performance of cloud-native applications and services across a national benefits platform. This role focuses heavily on observability, telemetry, monitoring, and performance engineering within a modern serverless environment. You’ll work closely with development, operations, and platform teams to help build the standards, tooling, and engineering patterns used across serverless services. This role goes beyond writing Lambda code. You’ll help define how services are instrumented, deployed, monitored, and optimized across the platform.

What You’ll Do:

- Build and maintain observability, telemetry, logging, and monitoring solutions for serverless applications and services
- Support performance analysis, troubleshooting, and optimization efforts across distributed cloud environments
- Develop and maintain engineering patterns and standards for AWS Lambda services
- Implement instrumentation using AWS Distro for OpenTelemetry (ADOT)
- Support monitoring and alerting capabilities using Dynatrace, Splunk, and related observability tools
- Work with development and DevOps teams to integrate monitoring and telemetry into CI/CD pipelines
- Assist with diagnosing system bottlenecks, latency issues, and application performance concerns
- Support automated testing, deployment, and operational readiness activities
- Help improve operational visibility, tracing, and logging consistency across environments
- Participate in Agile delivery activities, release coordination, and operational support efforts

Qualifications

- 3+ years of experience supporting cloud-native applications, performance engineering, or observability platforms
- Experience with AWS serverless technologies including Lambda, CloudWatch, API Gateway, and related services
- Experience with observability and monitoring platforms such as Dynatrace, Splunk, Grafana, or similar tools
- Familiarity with OpenTelemetry or AWS ADOT instrumentation practices
- Experience supporting CI/CD pipelines and automated deployment workflows
- Understanding of distributed systems, logging, tracing, and performance analysis concepts
- Experience troubleshooting performance issues across cloud environments
- Familiarity with scripting or development languages such as Python, JavaScript, TypeScript, or Java
- Understanding of DevOps and Agile software delivery practices
- Strong communication and collaboration skills
- Federal government experience is a plus
- Ability to obtain and maintain a Public Trust or other required government clearance

Benefits

- CADRE Cares Program aimed at enhancing overall well-being and job satisfaction
- CADRE Convoy Program: A dedicated support system designed to cater to individual needs
- CADRE Connect Program: Initiatives that promote open communication and encourage growth
- CADRE Compensation Program:
  - 401(k) Safe Harbor Plans with Matching & Immediate Vesting
  - Medical, Dental, & Vision Plans
  - Paid Time Off: Holidays, Vacation, Wellness, & Personal Leave Plans
  - Continuing Education & Training Budget
  - Office & Technology Budget
  - Cell Phone Budget
  - Wellness & Healthy Living Budget
  - Awards & Bonuses
  - Profit Sharing Plans

More DevOps Engineer Jobs

SRE Partner – Affirmative Action Position for Persons with Disabilities

Jusbrasil

💻 We simplify access to legal information through technology

DevOps Engineer | 31 days ago
Full Time | Remote | Team 201-500 | H1B No Sponsor

• Ensure reliability, availability and scalability of systems and services in the Product Areas (PAs) where assigned.
• Develop and implement monitoring, observability and alerting solutions integrated with the Agentic Engineering Platform.
• Support teams in defining and tracking SLIs, SLOs and error budgets.
• Structure and evolve on-call management in the PAs: rotation, escalation, alerting tools and incident management.
• Work closely with the Engineering Platform to ensure platform capabilities reach and are adopted by product teams.
• Actively contribute to the evolution of the Agentic Engineering Platform by bringing real feedback from PAs about friction points, gaps and opportunities for improvement.
• Participate in and influence the building of a reliability-oriented (SRE) engineering culture across the company.
• Support migrations of critical systems, environment segregation and deprecation of legacy technologies.

Brazil

Operations Reliability Engineer - Automations

Alpaca

Developer APIs for stocks and crypto trading, investing apps, and embedded fintech.

DevOps Engineer | 31 days ago
Full Time | Remote | Team 201-500 | H1B No Sponsor

Role Description

As an Operations Reliability Engineer, you will embed directly within brokerage operations functions to systematically eliminate manual work and replace it with durable, auditable software systems. You start by immersing yourself in operational workflows: observing, documenting, and deeply understanding processes end-to-end before designing solutions. Every recurring manual process is treated as a system defect, and every fix you ship is measured by its real-world impact on efficiency and reliability. You will work closely with licensed brokerage staff, domain experts, and platform engineers to build automations and tooling that allow Alpaca's operations to scale globally without scaling headcount linearly. The ideal candidate is equally comfortable shadowing an operational process and architecting the backend service that replaces it.

Things You Get To Do

- Design, build, test, deploy, and monitor production automations and UIs that remove manual steps and reduce operation time.
- Partner with frontend engineers to productize ops tooling so global teams can run functions with predictable staffing.
- Execute operational procedures to surface painful manual processes prior to automation.
- Instrument and report baseline and outcome metrics (MTTC, manual steps removed, queue sizes, ops satisfaction) and iterate based on measured impact.
- Produce Platform Opportunity Briefs / RFCs for higher-level platform tooling and automations.
- Collaborate with licensed BD leadership, Compliance, and Security to build auditable, safe automations with role-based access and clear runbooks.
- Own the full lifecycle of the systems you build, including automated deployment (CI/CD with tools like ArgoCD and Terraform), proactive monitoring, on-call support rotations, and incident response, following a "you build it, you run it" philosophy.
- Build systems with auditability, traceability, and data lineage as first-class concerns to ensure transparency for our auditors and regulators.

Qualifications

- 5+ years of professional software engineering experience, with a proven track record of shipping and operating complex, large-scale systems in production.
- Strong business sense and understanding of operations.
- Deep, hands-on expertise in Golang, including a strong command of its concurrency models (goroutines, channels), memory management, and standard library.
- Proven track record of building user-facing features end-to-end with TypeScript/React.
- Proficient with SQL and relational databases, preferably PostgreSQL.
- Demonstrated ability to reason about human workflows as systems, not just software services.
- Experience with observability, tracing, and continuous profiling.
- Exceptional analytical and problem-solving skills, with the ability to deconstruct complex requirements into clear technical components, and excellent communication skills for working in a cross-functional environment.
- High ownership mindset with a bias toward durable, structural fixes over tactical patches.

Requirements

- Knowledge of service-oriented architectures.
- Experience with major cloud platforms (we primarily use GCP).
- Financial market (exchange, broker-dealers, clearing, etc.) knowledge.
- Experience with Docker and Kubernetes.
- A passion for financial markets or the desire to learn.
- Knowledge of Agile/Scrum methodologies.
- Demonstrable experience in designing, building, and reasoning about distributed systems, including a strong understanding of microservices architecture and API design patterns (e.g., REST, gRPC).
- Experience with capacity planning and benchmarking.

Benefits

- Competitive Salary & Stock Options.
- Health Benefits.
- New Hire Home-Office Setup: One-time USD $500.
- Monthly Stipend: USD $150 per month via a Brex Card.

Worldwide
Full Time | Remote | Team 1,001-5,000 | H1B No Sponsor

• Collaborate closely with the development team to deploy and maintain application infrastructure.
• Assist in the development and support of tooling to streamline the deployment and maintenance of our products.
• Work with Kubernetes, Docker, Helm and ArgoCD to deploy applications from development through to production environments.
• Support both in-house and third-party applications, including handling deployments, upgrades, and troubleshooting.
• Write and manage automation pipelines for application deployment and maintenance.
• Provision and manage infrastructure using Terraform.
• Document processes and best practices clearly and concisely.

United Kingdom

Senior Site Reliability Engineer

PlayOn! Sports

The nation's leading high school media company providing live streaming and digital ticketing services.

DevOps Engineer | 31 days ago
Full Time | Remote | Team 201-500 | H1B No Sponsor

• Contribute to system observability, i.e., implementing and improving metrics, alerting, and dashboards for better insight and faster recovery.
• Develop automation, tooling, and monitoring solutions to support high service availability.
• Partner with application and quality engineering teams to implement best practices in reliability, release automation, and testing.
• Drive operational excellence through proactive incident prevention, blameless postmortems, and capacity planning.
• Participate in on-call rotations to support critical services and ensure rapid response to incidents.

United States