CMG's solution is the first ECM platform in the U.S. to provide digital connectivity between the buy-side and sell-side.
Site Reliability Engineer
Location
Canada
Posted
21 hours ago
Salary
Not specified
Seniority
Senior
Job Description
CMG (Capital Markets Gateway)
- Design, implement, and maintain monitoring and observability solutions using tools like Prometheus, the Grafana stack (Loki/Grafana/Tempo/Alertmanager), Datadog, and OpenTelemetry.
- Define and implement SLOs, SLIs, and error budgets to measure system reliability.
- Develop and optimize dashboards, alerts, and reports for system performance and business metrics.
- Design actionable alerting strategies to minimize noise and improve MTTR.
- Integrate alerting systems with Jira.
- Establish and refine runbooks for on-call teams to handle alerts efficiently.
- Empower teams to ensure observability coverage and incident response practices.
- Analyze system performance metrics, identify bottlenecks, and implement optimizations to improve system efficiency, scalability, and cost-effectiveness.
- Help conduct load testing and capacity planning to ensure systems can handle peak traffic loads.
- Identify opportunities for automation and develop tools to streamline operational processes, such as fail-over, configuration management, and monitoring.
- Implement monitoring and alerting systems within automations to detect and resolve issues proactively.
- Collaborate closely with cross-functional teams, including software engineers, operations, and infrastructure teams, to understand system requirements, provide technical guidance, and drive solutions.
- Communicate effectively to stakeholders about system changes, incidents, and improvements.
- Promote and spread SRE principles and practices across the company.
Job Requirements
- Must be based in Latin America
- English level - C1 or C2
- Proven experience as a Site Reliability Engineer or similar role.
- Proficiency in logging, metrics, and tracing frameworks (Datadog, Loki, Prometheus, OpenTelemetry).
- Experience with cloud platforms (Azure preferred) and infrastructure-as-code tools (e.g., Terraform).
- Strong programming and scripting skills (Python, Bash).
- Proficiency in containerization technologies and orchestration tools (Docker, Kubernetes).
- Understanding of Linux-based systems, networking, and security principles related to containerized applications.
- Strong problem-solving and troubleshooting skills, with a passion for identifying and resolving complex technical issues.
- Excellent communication and collaboration abilities.
- Ability to thrive in a fast-paced, constantly evolving environment.
- Experience with PostgreSQL monitoring and optimization (Optional/Nice to have).
Benefits
- Equity
- Unlimited PTO (15 days + bank holidays + unlimited additional paid leave)
- Comprehensive benefits program managed by Globalization Partners
- Premium life and income protection
- Top private medical and dental insurance
- Employee Assistance Program (EAP)
- Pension contributions
- Remote work environment
- Education reimbursement
- Continuous learning opportunities
- Employee referral bonus
- Parental leave
More DevOps Engineer Jobs
Cloud Engineer – DevOps
Innovative Solutions
An AWS Premier Tier Services Partner focused on helping every SMB leverage the power of the cloud.
- Design and implement scalable, secure AWS infrastructure using Infrastructure as Code (IaC) practices across multiple client engagements simultaneously.
- You'll build CI/CD pipelines, automate deployment processes, and establish monitoring and observability solutions that enable clients to operate efficiently in the cloud.
- Working closely with solutions architects and project managers, you'll translate client requirements into technical solutions while maintaining high standards for security, reliability, and performance.
- Collaborate with client technical teams to implement DevOps best practices, troubleshoot complex infrastructure issues, and provide knowledge transfer to ensure long-term success.
- You'll balance multiple project priorities, adapt to varying client environments, and contribute to our internal tooling and methodology improvements.
- Your work will directly support our AWS DevOps Competency and help clients achieve their digital transformation objectives.
Site Reliability Engineer
Orion Health
Revolutionising global healthcare so every individual receives the perfect care for them.
- Design, implement, and maintain reliable, scalable, and secure infrastructure that supports Orion Health's products and services.
- Define and monitor Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) to ensure platform reliability and customer satisfaction.
- Build and maintain observability solutions, including monitoring, logging, alerting, and tracing capabilities across cloud environments.
- Participate in incident response activities, including troubleshooting, root cause analysis, remediation planning, and post-incident reviews.
- Lead initiatives to reduce operational toil through automation, Infrastructure as Code (IaC), and self-service capabilities.
- Collaborate closely with software engineering teams to improve application reliability, performance, and operational readiness.
- Identify and eliminate reliability bottlenecks through performance tuning, capacity planning, and system optimization.
- Support infrastructure and platform upgrades, ensuring minimal disruption and maintaining service availability.
- Conduct capacity forecasting and scalability planning to meet future business and customer demands.
- Develop operational runbooks, standards, and best practices that improve system resilience and operational efficiency.
- Champion reliability engineering principles and foster a culture of continuous improvement across teams.
- Contribute to disaster recovery, business continuity, and platform resilience initiatives.
DevSecOps Lead
GovCIO
GovCIO is a service-disabled-veteran-owned small business (SDVOSB) that offers technology services to improve business performance for government organizations.
Role Description
GovCIO is currently hiring for a DevSecOps Manager to lead the integration of security throughout the software development lifecycle (SDLC) while supporting secure cloud and infrastructure operations for enterprise and government environments. This role is responsible for implementing security-first DevOps practices, managing CI/CD pipelines, automating security controls, and ensuring compliance with federal cybersecurity regulations and organizational security standards. This position will be fully remote within the United States.
- Builds and codes applications and/or modules using languages such as C++, Visual Basic, ABAP, JAVA, XTML, etc.
- Provides patches and upgrades to existing systems.
- Involved in planning of system and development deployment as well as responsible for meeting software compliance standards.
- May design graphical user interface (GUI) to meet the specific needs of users.
- Prepares operating instructions, compiles documentation of program development, and analyzes system capabilities to resolve questions of program intent, output requirements, input data acquisition, programming techniques, and controls.
- May build add-on modules using application program language.
- Designs and codes applications following specifications using the appropriate tools.
- Maintains and modifies existing software applications.
- Analyzes detailed systems factors, including input and output requirements, information flow, hardware and software requirements, and alternative methods of problem resolution.
- Performs modifications to and maintenance of operational programs and procedures.
- Participates in code reviews to represent reviewed work for adherence to standards and specifications.
- Writes or revises program documentation, operations documentation, and user guides in accordance with standards.
Qualifications
- Bachelor's with 8+ years (or commensurate experience)
- Strong background in software development and programming languages such as Python, Java, or Ruby.
- Deep understanding of cloud environments (AWS, Azure, Google Cloud) and containerization technologies (Docker, Kubernetes).
- Proficiency in implementing automated security and monitoring tools.
- Excellent problem-solving skills and ability to work in a fast-paced, evolving environment.
- Strong communication and collaboration skills to work effectively across various teams.
Requirements
- Ability to obtain and maintain a Suitability/Public Trust clearance.
Posted Salary Range
USD $130,000.00 - USD $135,000.00 /Yr.
Senior DevOps, Security Consultant
KATBOTZ®
Driving Customer Success Through Finance Transformation: Advanced Processes, Analytics, & AI.
- Design, implement, and manage secure CI/CD pipelines and DevOps processes.
- Automate infrastructure deployment and configuration management using Infrastructure as Code (IaC).
- Implement cloud security best practices, governance, and compliance standards.
- Collaborate with development, infrastructure, and security teams to ensure secure application delivery.
- Conduct security assessments, vulnerability analysis, and risk mitigation activities.
- Monitor infrastructure, applications, and cloud environments for performance and security threats.
- Manage containerization and orchestration platforms such as Docker and Kubernetes.
- Support incident response, disaster recovery, backup strategies, and business continuity planning.
- Develop and maintain DevSecOps frameworks and automation workflows.
- Provide technical leadership, mentorship, and documentation for operational processes.




