SRE Sr Engineer/Specialist

DevOps Engineer | Full Time | Remote | Senior | Team 10,001+ | Since 1903 | H1B Sponsor

Location

United States

Posted

64 days ago

Salary

0

Seniority

Senior

Job Description

SRE Sr Engineer/Specialist

Ford Motor Company

Role Description

As an SRE at Ford, you'll be instrumental in developing, enhancing, and expanding our global monitoring and observability platform. You'll blend AI with software and systems engineering to ensure the uptime, scalability, and maintainability of our critical cloud services. You'll be at the intersection of SRE and Software Development, building and driving the adoption of our global monitoring and triaging capabilities. If you're passionate about using your IT expertise and analytical skills to shape the future of transportation, this is your opportunity to make a real impact. Join us and be part of a team that's building the future of observability!

- Write, configure, and deploy code that improves service reliability for existing or new systems; set the standard for others with respect to code quality.
- Provide helpful and actionable feedback and review for code or production changes.
- Drive repair/optimization of complex systems with consideration towards a wide range of contributing factors.
- Lead debugging, troubleshooting, and analysis of service architecture and design.
- Participate in on-call rotation.
- Write documentation: design, system analysis, runbooks, playbooks. Provide design feedback and uplevel design skills of others.
- Implement and manage SRE monitoring applications using AI, Python, and Observability data.
- Develop tooling using Terraform and other IaC tools to ensure visibility and proactive issue detection across our platforms.
- Work within GCP infrastructure, optimizing performance and cost and scaling resources to meet demand.
- Collaborate with development teams to enhance system reliability and performance, applying a platform engineering mindset to system administration tasks.
- Develop and maintain AI-enhanced automated solutions for operational aspects such as on-call monitoring, performance tuning, and disaster recovery.
- Troubleshoot and resolve issues in our dev, test, and production environments.
- Participate in postmortem analysis and create preventative measures for future incidents.
- Implement and maintain security best practices across our infrastructure, ensuring compliance with industry standards and internal policies. Participate in security audits and vulnerability assessments.
- Participate in capacity planning and forecasting efforts to ensure our systems can handle future growth and demand. Analyze trends and make recommendations for resource allocation.
- Identify and address performance bottlenecks through code profiling, system analysis, and configuration tuning. Implement and monitor performance metrics to proactively identify and resolve issues.
- Develop, maintain, and test disaster recovery plans and procedures to ensure business continuity in the event of a major outage or disaster. Participate in regular disaster recovery exercises.
- Contribute to internal knowledge bases and documentation.

Qualifications

- Bachelor’s degree in Computer Science, Engineering, Mathematics, or equivalent work experience.
- 3+ years of experience as an SRE, DevOps Engineer, Software Engineer or similar role.
- Agentic AI and MCP development experience preferred.
- Strong experience with Python development; familiarity with Terraform Provider development desired.
- Experience with Dynatrace SaaS preferred.
- Proficient with monitoring and observability tools.
- Proficient with cloud services, with a strong preference for Kubernetes and Google Cloud Platform (GCP) experience.
- Solid programming skills in Python, with a good understanding of software development best practices.
- Experience with relational and document databases.
- Ability to debug, optimize code, and automate routine tasks.
- Strong problem-solving skills and the ability to work under pressure in a fast-paced environment.
- Excellent verbal and written communication skills.

Benefits

- Immediate medical, dental, and prescription drug coverage.
- Flexible family care, parental leave, new parent ramp-up programs, subsidized back-up child care and more.
- Vehicle discount program for employees and family members, and management leases.
- Tuition assistance.
- Established and active employee resource groups.
- Paid time off for individual and team community service.
- A generous schedule of paid holidays, including the week between Christmas and New Year’s Day.
- Paid time off and the option to purchase additional vacation time.

Company Description

This role is remote. Visa Sponsorship is not provided for this specific role. Relocation assistance is not provided for this specific role. Candidates for positions with Ford Motor Company must be legally authorized to work in the United States. Verification of employment eligibility will be required at the time of hire. We are an Equal Opportunity Employer committed to a culturally diverse workforce. All qualified applicants will receive consideration for employment without regard to race, religion, color, age, sex, national origin, sexual orientation, gender identity, disability status or protected veteran status. In the United States, if you need a reasonable accommodation for the online application process due to a disability, please call 1-888-336-0660.

Salary Grade 6-8

More DevOps Engineer Jobs

Lead or Middle or Senior DevOps Engineer

Akvelon, Inc.

Custom-Built Software Engineering Teams

DevOps Engineer | 64 days ago
Full Time | Remote | Team 1,001-5,000 | Since 2000 | H1B No Sponsor

Role Description

We are looking for a Lead, Middle, or Senior DevOps Engineer to join a research infrastructure team building an on-demand GPU platform for advanced compute workflows. The role focuses on enabling secure, scalable, and user-friendly access to high-performance GPU resources through automation, scheduling, and modern platform tooling.

Locations: Serbia, Georgia, Armenia, Kazakhstan, Poland, Croatia, Portugal, Egypt.

Requirements

- Strong hands-on experience with Kubernetes and platform orchestration;
- Solid understanding of scheduling, reservation, or namespace-based resource management systems;
- Experience with GPU infrastructure, virtualization, slicing, or containerized workstation environments;
- Strong scripting and automation skills;
- Practical Azure experience and familiarity with secure infrastructure operations.

Tasks

- Build and improve an on-demand GPU workstation platform with lightweight containerization or virtualization;
- Implement scheduling, reservation, registration, image management, storage mounting, SSH with SSO, and developer-friendly access flows;
- Automate cluster namespace configuration across CPU, GPU, memory, and storage allocations;
- Support hierarchical capacity allocation models with RBAC-based administration;
- Automate storage import, export, and archival workflows as allocations change;
- Build monitoring, alerts, and automated incident ticket creation for large-scale cluster environments;
- Improve integrations between source control, CI/CD, package distribution, and GPU-connected development workflows;
- Contribute automation, scripts, and agentic tooling that improve infrastructure and day-to-day research workflows.

Nice to Have

- Experience with Prometheus, Grafana, incident automation, or on-call paging workflows;
- Experience with developer platforms, devcontainers, or remote development tooling such as VS Code integrations;
- Exposure to AI-assisted monitoring, trend analysis, or agentic infrastructure tooling.

Engagement Type

- B2B contract.

Location / Timezone

- Remote work from Serbia, Georgia, Armenia, Kazakhstan, Poland, Croatia, Portugal, Egypt.
- European working hours.
- Occasionally available for meetings up to 10:00 AM PST (US overlap).

All locations: Portugal | Poland | Egypt | Georgia | Croatia | Serbia | Armenia | Kazakhstan
Job Closed

Site Reliability Engineer – SRE

Full Scale

Build software development teams quickly and affordably.

DevOps Engineer | 64 days ago
Contract | Remote | Team 201-500 | Since 2018 | H1B No Sponsor

• Manage the reliability, availability, and performance of high-traffic web platforms.
• Administer and optimize Cloudflare services, including CDN, caching, DNS, WAF, and rate limiting.
• Configure and manage DataDome to mitigate bots, abuse, scraping, and malicious traffic.
• Monitor production systems and respond to incidents affecting uptime, latency, and user experience.
• Investigate outages and performance issues, conduct root cause analysis, and implement long-term fixes.
• Collaborate with engineering teams to improve resiliency, observability, and deployment safety.
• Support traffic scaling, capacity planning, and operational readiness for large-volume environments.
• Implement automation and operational best practices to improve stability and efficiency.

Philippines
Job Closed

Senior DevOps Engineer

Full Scale

Build software development teams quickly and affordably.

DevOps Engineer | 64 days ago
Contract | Remote | Team 201-500 | Since 2018 | H1B No Sponsor

• Design, build, and manage production-grade infrastructure in AWS
• Build, scale, and maintain Kubernetes environments for critical services
• Develop and improve CI/CD pipelines and infrastructure automation
• Drive observability through monitoring, logging, tracing, and SLI/SLO implementation
• Lead incident response, root cause analysis, and reliability improvements
• Embed PCI DSS v4.0 compliance into infrastructure and delivery workflows
• Implement security best practices, including IAM, RBAC, secrets management, and encryption
• Drive cloud cost optimization and improve infrastructure performance and efficiency
• Collaborate closely with engineering, product, and security teams to support platform growth

Philippines
Job Closed

Senior DevOps Engineer

Jimmy Technologies

Leveraging the world’s best IT brains to build first-class software and shape your digital products

DevOps Engineer | 64 days ago
Contract | Remote | Team 11-50 | H1B No Sponsor

• Design and implement scalable, repeatable deployment frameworks for AI, data, and cloud-native applications.
• Develop and maintain Infrastructure as Code (IaC), automated environment provisioning, and deployment workflows to ensure consistency across environments.
• Build and optimize CI/CD pipelines that enable reliable, automated delivery across development, testing, staging, and production.
• Standardize application packaging and deployment models to enable seamless delivery into customer environments with minimal customization.
• Define and implement best practices for secrets and configuration management, identity and access management (IAM), networking, secure connectivity, observability, logging, monitoring, alerting, and release management.
• Improve production readiness by strengthening application resilience, scalability, security, runtime governance, and operational excellence.
• Observability Improvements: Enhance monitoring for services to improve system reliability.
• Scripting & Automation: Develop, implement, and maintain scripts to automate processes and reduce manual efforts.

Czechia
€35 - €45 / hour