IT & Engineering for a better tomorrow.
Senior DevOps Engineer – SRE
Location
United States
Posted
4 days ago
Salary
0
Seniority
Senior
Job Description
Zealogics Inc
• Cooperating with software teams and maintaining the build scripts
• Improving site reliability
• Enforcing good software development practices using infrastructure as code (IaC) methodology
• Site reliability/uptime management
• CI/CD maintenance and automation of CI/CD processes
• Security monitoring and patching
• Downtime and patching communication with internal/external stakeholders
• System upgrades
• New tool implementation and onboarding
• Other DevOps-related issues, including but not limited to the above
Job Requirements
- Work in an Agile environment to roll out features/fixes/updates in a sprint cycle
- Work with an international team to deploy new tools and support existing tools
- Communicate well with internal/external stakeholders to gather requirements and schedule fixes/downtime
- Work on CI/CD pipelines and support stakeholders with errors running these pipelines
- Proactively seek out irregularities, investigate them, and report back the findings
- Experience with AKS, Azure DevOps Pipelines, and VM management in Azure.
- Build engineer with experience in Atlassian Bamboo or any modern CI/CD toolchain.
- Experience with Helm charts.
- Skilled in Python, PowerShell, Go, and Bash.
- Understands CI/CD concepts and their automation.
- SRE experience: version upgrades, certificate upgrades, mitigation of security issues in a cloud infrastructure.
- Worked in an Agile Scrum/Kanban development model.
- Experience with deployment and use of Artifactory, SonarQube, Bitbucket, Bamboo, Jira, and Confluence.
- Professional level verbal and written communication skills in English.
More DevOps Engineer Jobs
• Design, deploy, and maintain secure cloud infrastructure environments.
• Build and support CI/CD pipelines and deployment automation using GitLab CI/CD.
• Manage and address client security requirements, integrating security best practices into code pipelines.
• Manage AWS cloud infrastructure, including AWS GovCloud environments where applicable.
• Support Infrastructure as Code initiatives using Terraform and Pulumi.
• Configure and maintain Docker-based deployment environments.
• Collaborate with engineering teams to improve scalability, monitoring, reliability, and operational performance.
• Support security hardening, compliance efforts, and Authority to Operate (ATO) activities for government systems.
• Troubleshoot infrastructure, networking, and deployment-related issues across environments.
• Participate in Agile development and operational planning activities.
• Support backup, disaster recovery, logging, and observability initiatives.
• Actively contribute to accreditation and documentation efforts for compliance, cybersecurity, and other government standards and regulatory requirements.
Senior Database Reliability Engineer
Conexa Saúde
Solutions in telemedicine that optimize health care access.
• Continuously monitor the data layer's performance (slow queries, locks, index usage, saturation and capacity), defining thresholds, alerts and action plans.
• Conduct capacity planning analyses and propose evolution of the data architecture according to product growth.
• Technically review queries and data models proposed by developers, acting as a consultative reference and guardian of standards.
• Define and evolve company database standards: naming, versioning, integrity, modeling and scalability.
• Ensure best practices are consistently adopted by engineering teams.
• Manage the database change cycle across dev/staging/prod environments, implementing CI/CD flows appropriate for schema and data changes.
• Assess migration risks, define maintenance windows and rollback strategies.
• Define and operate backup, restore, replication and disaster recovery strategies, including periodic recovery testing.
• Sustain availability SLOs and the data layer's RPO/RTO objectives.
• Establish and review permission policies, access segregation and auditing.
• Ensure adherence to applicable security and compliance requirements (LGPD, the Brazilian General Data Protection Law, and the frameworks adopted by the company).
• Develop and evolve internal tooling that improves developer experience (DX), observability, access management and the scalability of data operations.
• Act as a technical bridge between product engineering, infrastructure and security.
Role Description
We are seeking a Senior Principal Platform Engineer to join the MCCS (Modular Control Centre System) programme at 50Hertz, Germany's leading transmission system operator. You will help build and establish a cloud-native hybrid cloud engineering platform (Azure and on-premises) supporting approximately 20 product teams and 150 developers, acting as a central enabler for the efficient development and stable operation of MCCS products.
- Designing, implementing and documenting a cloud-native, Kubernetes-based engineering platform
- Adapting and customising standardised engineering services across code, build, deploy and run phases
- Implementing Infrastructure-as-Code for reproducible, versioned environment and service provisioning, enabling developer self-service
- Integrating centrally provided platform services including Kubernetes, Kafka, PostgreSQL and observability tooling
- Designing, implementing and operating CI/CD pipelines and establishing SDLC standards aligned to best practices and security requirements
- Introducing trunk-based development, automated testing and build and release best practices
- Analysing existing systems and deriving migration strategies for applications and development teams transitioning to EDP
- Providing technical consultancy on integration into the multi-tier platform architecture (EDP, GrASP, MCCS)
- Empowering development teams to use the platform independently over time
Qualifications
- 10+ years of experience as a DevOps and/or Platform Engineer
- 5+ years of experience as a Principal Platform Engineer and/or SRE
- Excellent knowledge of setting up and migrating engineering platforms for very large development teams (20+ teams, 100+ developers)
- Excellent knowledge and extensive experience in Developer Experience
- Excellent knowledge of build and deployment tooling across Azure and on-premises environments
- Extensive expertise in GitOps deployments and deployment strategies including ArgoCD and Helm
- Excellent knowledge of cloud-native secure SDLCs
- Excellent knowledge of monitoring with Grafana, Prometheus, Loki and OpenTelemetry
- Fluent English (C1 minimum)
Requirements
- Proven experience building engineering platforms and self-service platforms/Golden Paths
- Experience designing platforms as products with clear APIs and roadmaps
- Experience in large-scale cloud/platform migrations
- Expertise in Kubernetes multi-cluster platform operation
- Knowledge of CI/CD architectures including Azure DevOps and Harness.io
- Knowledge of GitOps and deployment strategies including Blue/Green and Canary
- DevSecOps and Security-by-Design experience in critical infrastructure or regulated systems
Benefits
- Flexible working hours and the freedom to choose your own projects
- Access to exciting projects in various industries
- Support in advancing your career
- Competitive pay
- A dedicated team to help you with any questions you may have
- Opportunity to work independently and utilise a strong network to achieve your professional goals
Senior Site Reliability Engineer, Infrastructure
Vultr
Vultr is on a mission to make high-performance cloud computing easy to use, affordable, and locally accessible.
• Design and build the observability pipeline for datacenter infrastructure including CDUs, PDUs, bare metal servers, and provisioning workflows, collecting telemetry via Redfish, IPMI, SNMP, and OpenTelemetry.
• Own the full stack from data collection through to visualization and alerting in Grafana, Loki, and Mimir.
• Build dashboards and alerting that are actionable and meaningful for stakeholder teams including Datacenter Ops, SysAdmin, Network, and Provisioning.
• Establish standards and patterns for how datacenter infrastructure telemetry is collected, stored, and visualized across Vultr's global footprint.
• Partner closely with stakeholder teams to understand their operational needs and translate them into observable, measurable signals.
• Drive infrastructure-as-code practices across the observability pipeline to ensure consistency, repeatability, and maintainability.




