Cloud Dev/Ops Engineer

A business unit of General Dynamics, General Dynamics Information Technology (GDIT) supports some of the United States' most complex government, defense, and in

DevOps Engineer, 57 days ago

Full Time Remote

Title: Cloud Dev/Ops Engineer

Job Description: Responsibilities for this Position

Location: Any Location / Remote
Full Part/Time: Full time
Job Req: RQ220206
Type of Requisition: Regular
Clearance Level Must Currently Possess: None
Clearance Level Must Be Able to Obtain: None
Public Trust/Other Required: None
Job Family: IT Infrastructure and Operations

Job Qualifications:
Skills: Automation, Cloud DevOps, Cloud Engineering, Networking, Troubleshooting
Certifications: None
Experience: 10+ years of related experience
US Citizenship Required: No

Job Description:

The Cloud DevOps Engineer delivers streamlined, effective solutions to complex technical challenges by engineering, implementing, and supporting cloud-based systems tailored to meet the needs of the business. This position plays a central role in maintaining cloud performance and reliability, ensuring the end-user experience remains a top priority while contributing to organizational and professional growth. The role requires a commitment to operational excellence and the ability to apply modern engineering practices in support of large-scale cloud environments.

The engineer is responsible for hands-on development, support, troubleshooting, and maintenance of cloud-hosted systems and enterprise applications. Candidates must demonstrate proven experience with cloud concepts, engineering, and daily operations. Key responsibilities include executing upgrades, monitoring system performance, resolving technical issues, and implementing solutions across Azure & OCI environments. This role functions as a primary contributor to daily program operations and demands strong independent problem-solving skills.

This position leads operational efficiencies through the application of automation and Infrastructure-as-Code methodologies. A background in traditional networking or network security, with experience transitioning on-premises expertise into cloud and automation practices, is essential.

The ideal candidate possesses broad cloud infrastructure knowledge, spanning Azure, OCI, and AWS, that covers deployment, operations, and troubleshooting. Proficiency with Ansible is required, along with familiarity with additional automation tools and an understanding of their appropriate use cases.

Success in this role requires an individual who exercises sound technical judgment and critical thinking, maintains transparent communication, and proactively escalates issues when necessary. The engineer is trusted to operate with significant autonomy while ensuring work status, progress, and challenges remain visible to peers and leadership. When new information necessitates adjustments in technical direction, the engineer is expected to consult with the team and management to maintain alignment. This role requires persistence, collaboration, and a commitment to driving cultural and operational improvements across the enterprise.

Key Responsibilities:

- Cloud Operations & Support:
  - Troubleshoot and support Azure & OCI environments.
  - Perform advanced OS upgrades, patching, and system health checks.
  - Manage firewalls, VPNs, load balancers, serverless apps, and microservices.
- Monitoring & Troubleshooting:
  - Operate observability stacks.
  - Analyze logs and telemetry for performance optimization and RCA.
  - Automate tasks with Python, Bash, or Shell.
- Automation:
  - Implement Infrastructure as Code (Ansible/Python).
- Security & Governance:
  - Manage TLS/SSL certificates, secrets, and PKI processes.
  - Support USGOV Azure, Oracle, AWS and vulnerability remediation efforts.
  - Maintain security accreditation and documentation.
- Containers & Cloud Operations:
  - Deploy and optimize Kubernetes workloads and Docker containers.

Required Skills & Experience (What You Bring):

- Strong networking experience in troubleshooting and design, especially for cloud VPN deployments
- Strong firewall experience with network and cloud firewalls; able to troubleshoot firewall issues across native cloud and device firewalls
- Manage and troubleshoot on-prem network infrastructure including routing, switching, VPNs, and enterprise connectivity
- Network engineering - previous routing / switching / firewall experience with Palo Alto, Juniper, and Cisco firewalls, routers, and switches
- Implement and maintain network security policies, firewall rules, and high-availability configurations across mission-critical environments
- Experience with cloud networks - how cloud networks differ from traditional networking, with a focus on Azure Cloud and Oracle Cloud
- Automation experience - developing and supporting automation in your previous jobs, focused on Ansible experience or general Python experience as it applies to cloud and networks
- Comfortable with navigating Linux, network routers, etc.
- Containers - experience with container networking including Kubernetes and cloud native container services
- DNS - experience with cloud DNS and how DNS affects data flows across clouds
- Project Management - experience in gathering minimum viable requirements and independently working solutions through to delivery; experience coordinating a project across teams and identifying risks and mitigations
- API Automation development as it pertains to networks and cloud
- Cloud Data Flows - experience assessing PaaS and SaaS services and how data flows across security boundaries

EDUCATION AND EXPERIENCE:

High School Diploma required. BA/BS or equivalent with 2+ years of experience, or 6+ years of experience with HSD.

The likely salary range for this position is $102,000 - $138,000. This is not, however, a guarantee of compensation or salary. Rather, salary will be set based on experience, geographic location and possibly contractual requirements and could fall outside of this range.

Scheduled Weekly Hours: 40
Travel Required: Less than 10%
Telecommuting Options: Remote
Work Location: Any Location / Remote
Additional Work Locations:

Total Rewards at GDIT:

Our benefits package for all US-based employees includes a variety of medical plan options, some with Health Savings Accounts, dental plan options, a vision plan, and a 401(k) plan offering the ability to contribute both pre- and post-tax dollars up to the IRS annual limits and receive a company match. To encourage work/life balance, GDIT offers employees full flex work weeks where possible and a variety of paid time off plans, including vacation, sick and personal time, holidays, paid parental, military, bereavement and jury duty leave. GDIT typically provides new employees with 15 days of paid leave per calendar year to be used for vacations, personal business, and illness and an additional 10 paid holidays per year. Paid leave and paid holidays are prorated based on the employee's date of hire. The GDIT Paid Family Leave program provides a total of up to 160 hours of paid leave in a rolling 12 month period for eligible employees. To ensure our employees are able to protect their income, other offerings such as short and long-term disability benefits, life, accidental death and dismemberment, personal accident, critical illness and business travel and accident insurance are provided or available. We regularly review our Total Rewards package to ensure our offerings are competitive and reflect what our employees have told us they value most.

We are GDIT. A global technology and professional services company that delivers consulting, technology and mission services to every major agency across the U.S. government, defense and intelligence community. Our 26,000 experts extract the power of technology to create immediate value and deliver solutions at the edge of innovation. We operate across 50 countries worldwide, offering leading capabilities in digital modernization, AI/ML, Cloud, Cyber and application development. Together with our clients, we strive to create a safer, smarter world by harnessing the power of deep expertise and advanced technology.

Join our Talent Community to stay up to date on our career opportunities and events at gdit.com/tc.

Equal Opportunity Employer / Individuals with Disabilities / Protected Veterans

Worldwide

$102K - $138K / year

Senior Site Reliability Engineer (SRE) - (GCP)

Devsu

Devsu is a technology agency that provides software development services, IT augmentation and staffing.

DevOps Engineer, 57 days ago

Full Time Remote, Team 51-200, H1B No Sponsor

Role Description

We are seeking a Site Reliability Engineer (SRE) with deep expertise in monitoring, observability, and reliability engineering to support systems running across on-premises infrastructure and Google Cloud Platform (GCP). This role is primarily responsible for designing, operating, and improving monitoring, alerting, and observability platforms, with a strong focus on Grafana and Kubernetes environments. As a secondary responsibility, this role provides backup coverage for the Application Support team during periods of resource constraints or major incidents, offering L2/L3 technical support when required.

Responsibilities

- Monitoring & Observability (Core Focus)
  - Own and operate the monitoring and observability stack across on-prem and GCP environments
  - Design, build, and maintain Grafana dashboards for infrastructure, Kubernetes, and applications
  - Define, tune, and maintain alerts to ensure high signal-to-noise ratio
  - Establish observability standards and best practices across teams
  - Improve visibility into system health, performance, and reliability
- Site Reliability Engineering
  - Apply SRE principles to improve availability, performance, and resilience
  - Define and track SLIs, SLOs, and error budgets
  - Participate in on-call rotations and SEV incident response
  - Lead or contribute to incident investigations and root cause analysis (RCA)
  - Drive preventative actions to reduce repeat incidents
- Kubernetes & Platform Reliability
  - Support and monitor Kubernetes environments (GKE and on-prem clusters)
  - Monitor cluster health, capacity, and resource utilization
  - Troubleshoot platform-level issues impacting application reliability
  - Collaborate with Platform and Engineering teams on reliability improvements
- Secondary Responsibilities (Backup Application Support)
  - Provide L2/L3 application support coverage during:
    - Support team resource shortages
    - High-severity incidents (SEVs)
    - Peak support periods or escalations
  - Triage and troubleshoot application issues using existing runbooks and dashboards
  - Collaborate with Application Support and Engineering teams during incidents
  - Ensure all actions, findings, and resolutions are documented in ServiceNow (SNOW)

Qualifications

- Strong experience as a Site Reliability Engineer or Reliability Engineer
- Deep hands-on expertise with Grafana (dashboards, alerting, troubleshooting)
- Solid experience with monitoring and observability systems
- Production experience operating Kubernetes environments
- Experience supporting systems in GCP and on-prem environments (mandatory)
- Strong Linux systems and troubleshooting skills
- Fluent English (written and spoken)
- Ability to work in PST time zone
- Ability to participate in an on-call rotation that includes coverage for one weekend day

Requirements

- Technology Stack:
  - Observability: Grafana, Prometheus, logging platforms
  - Containers: Kubernetes (GKE and on-prem)
  - Cloud: Google Cloud Platform (GCP)
  - Operations: Linux, networking, infrastructure monitoring
  - Incident Tools: PagerDuty, ServiceNow, Slack (or equivalents)
- Nice to have:
  - Experience supporting application teams during SEV incidents
  - Knowledge of capacity planning and performance tuning
  - Scripting skills (Python, Bash, etc.)
  - Experience with hybrid infrastructure environments

Benefits

- A stable, long-term contract with opportunities for career growth
- Private health insurance
- A remote-friendly culture that promotes work-life balance
- Continuous training, mentorship, and learning programs to keep you at the forefront of the industry
- Free access to AI training resources and state-of-the-art AI tools to elevate your daily work
- A flexible Paid Time Off (PTO) policy as well as paid holiday days
- Challenging, world-class software projects for clients in the US and LatAm
- Collaboration with some of the most talented software engineers in Latin America and the US, in a diverse work environment

PST (UTC-8)

Job Closed

Senior Site Reliability Engineer (SRE)

Devsu

Devsu is a technology agency that provides software development services, IT augmentation and staffing.

DevOps Engineer, 57 days ago

Full Time Remote, Team 51-200, H1B No Sponsor

Role Description

We are seeking a Site Reliability Engineer (SRE) with deep expertise in monitoring, observability, and reliability engineering to support systems running across on-premises infrastructure and Google Cloud Platform (GCP). This role is primarily responsible for designing, operating, and improving monitoring, alerting, and observability platforms, with a strong focus on Grafana and Kubernetes environments. As a secondary responsibility, this role provides backup coverage for the Application Support team during periods of resource constraints or major incidents, offering L2/L3 technical support when required.

Responsibilities

- Monitoring & Observability (Core Focus)
  - Own and operate the monitoring and observability stack across on-prem and GCP environments
  - Design, build, and maintain Grafana dashboards for infrastructure, Kubernetes, and applications
  - Define, tune, and maintain alerts to ensure high signal-to-noise ratio
  - Establish observability standards and best practices across teams
  - Improve visibility into system health, performance, and reliability
- Site Reliability Engineering
  - Apply SRE principles to improve availability, performance, and resilience
  - Define and track SLIs, SLOs, and error budgets
  - Participate in on-call rotations and SEV incident response
  - Lead or contribute to incident investigations and root cause analysis (RCA)
  - Drive preventative actions to reduce repeat incidents
- Kubernetes & Platform Reliability
  - Support and monitor Kubernetes environments (GKE and on-prem clusters)
  - Monitor cluster health, capacity, and resource utilization
  - Troubleshoot platform-level issues impacting application reliability
  - Collaborate with Platform and Engineering teams on reliability improvements
- Secondary Responsibilities (Backup Application Support)
  - Provide L2/L3 application support coverage during:
    - Support team resource shortages
    - High-severity incidents (SEVs)
    - Peak support periods or escalations
  - Triage and troubleshoot application issues using existing runbooks and dashboards
  - Collaborate with Application Support and Engineering teams during incidents
  - Ensure all actions, findings, and resolutions are documented in ServiceNow (SNOW)

Qualifications

- Strong experience as a Site Reliability Engineer or Reliability Engineer
- Deep hands-on expertise with Grafana (dashboards, alerting, troubleshooting)
- Solid experience with monitoring and observability systems
- Production experience operating Kubernetes environments
- Experience supporting systems in GCP and on-prem environments (mandatory)
- Strong Linux systems and troubleshooting skills
- Fluent English (written and spoken)
- Ability to work in PST time zone
- Ability to participate in an on-call rotation that includes coverage for one weekend day

Requirements

- Technology Stack:
  - Observability: Grafana, Prometheus, logging platforms
  - Containers: Kubernetes (GKE and on-prem)
  - Cloud: Google Cloud Platform (GCP)
  - Operations: Linux, networking, infrastructure monitoring
  - Incident Tools: PagerDuty, ServiceNow, Slack (or equivalents)
- Nice to have:
  - Experience supporting application teams during SEV incidents
  - Knowledge of capacity planning and performance tuning
  - Scripting skills (Python, Bash, etc.)
  - Experience with hybrid infrastructure environments

Benefits

- A stable, long-term contract with opportunities for career growth
- Private health insurance
- A remote-friendly culture that promotes work-life balance
- Continuous training, mentorship, and learning programs to keep you at the forefront of the industry
- Free access to AI training resources and state-of-the-art AI tools to elevate your daily work
- A flexible Paid Time Off (PTO) policy as well as paid holiday days
- Challenging, world-class software projects for clients in the US and LatAm
- Collaboration with some of the most talented software engineers in Latin America and the US, in a diverse work environment

PST (UTC-8)

Job Closed

Senior Software Engineer with SRE background (AWS, Azure, GCP)

Dynatrace

Dynatrace is a global application performance management software firm and a former member of Compuware. As an employer, the company is in support of helping it

DevOps Engineer, 57 days ago

Full Time Remote, Team 5,600, Since 2005

Your role at Dynatrace

Are you a passionate Senior Software Engineer with an SRE background ready to shape the future of product development? Do you thrive in a collaborative, international environment and want to make a real impact on our customers? If you're excited about observability platforms and want to contribute to a globally leading product, this is your opportunity.

This role is designed for a Senior Software Engineer from an SRE background, experienced in Cloud monitoring, who wants to move from operating to building. Come and help us advance our Observability Solution.

Our engineering culture is built on technical excellence, ownership, and continuous feedback. We live by the principle of "You build it, you run it", and work in agile iterations to deliver high-quality customer value.

As Senior Software Engineer with SRE background you will:

- Leverage your SRE experience to enhance our cloud observability product and create tailored solutions that empower our users, like other SREs, to monitor, diagnose, and maintain systems with greater experience.
- Ensure best practices across AWS, Azure, and GCP integrations that efficiently ingest cloud telemetry data, ensuring customers receive the highest quality and most relevant datasets.
- Utilize AI capabilities to elevate features like anomaly detection and correlate multidirectional signals for faster root cause analysis.
- Work with cloud technologies (AWS, Azure, GCP), researching and building knowledge around modern cloud architectures. You will build visualizations that help our users understand the complexity of cloud observability data.
- Drive architectural decisions and contribute to the evolution of our platform.
- Collaborate with stakeholders and drive decisions aligned with the product strategy.
- Foster high-quality software engineering practices, automation and optimization of tooling and processes (CI/CD integrations).

What will help you succeed

- 5+ years of hands-on experience in Site Reliability Engineering, working with Cloud Infrastructure on AWS, Azure or GCP
- Experience in software engineering, developing with JavaScript/TypeScript and/or working with backend languages such as Python and Java
- Technical studies related to Software Engineering or equivalent experience
- Hands-on experience with monitoring, logging, and observability tools like Dynatrace, Datadog, Splunk, Grafana or CloudWatch, Azure Monitor
- Experience working closely with development teams to improve application delivery and build efficient, automated pipelines
- Excellent verbal and written communication skills, with the ability to convey complex technical concepts clearly
- Strong analytical skills with the ability to understand end-to-end use cases and map system flows
- Good English communication

Why you will love being a Dynatracer

- Dynatrace is a leader in unified observability and security.
- We provide a culture of excellence with competitive compensation packages designed to recognize and reward performance.
- Our employees work with the largest cloud providers, including AWS, Microsoft, and Google Cloud, and other leading partners worldwide to create strategic alliances.
- The Dynatrace platform uses cutting-edge technologies, including our own Davis hypermodal AI, to help our customers modernize and automate cloud operations, deliver software faster and more securely, and enable flawless digital experiences.
- Over 50% of the Fortune 100 companies are current customers of Dynatrace.

Compensation and Rewards

- We offer only employment contracts and a remote working setup. This is a permanent role and not a B2B contract.
- We offer attractive compensation packages and stock purchase options with numerous benefits and advantages.
- Base salary range: 21.300 - 26.700 PLN gross per month, with higher pay based on experience and qualifications.

Equal Employment Opportunity

Dynatrace provides equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, veteran status, or any other protected characteristic. We actively foster an inclusive workplace that celebrates differences and promotes accessibility, collaboration, and growth for all.

Poland

Job Closed
