Cincinnatus is an enterprise staffing company that partners with leading technology companies to source and employ highly skilled professionals for full-time and long-term contingent roles. Cincinnatus serves as the employer of record for these engagements, providing W-2 employment, payroll, benefits, and compliance, while placing employees directly within client teams to work on high-impact initiatives.
Roles hired through Cincinnatus are not project-based or freelance engagements. They are structured, role-based positions that typically involve full-time or fixed-term commitments, close collaboration with a client's internal teams, and integration into standard enterprise workflows.
Cincinnatus is a legal entity separate from Mercor. While opportunities may be discovered through Mercor's platform, employment, onboarding, payroll, and benefits for these roles are administered by Cincinnatus.
Equal Employment Opportunity
Cincinnatus is proud to be an Equal Employment Opportunity employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or any other legally protected characteristic. Cincinnatus is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans throughout the job application process.
DevOps Engineer - AI Model Evaluator
Location
Poland
Posted
1 day ago
Salary
$85 / hour
Seniority
Mid Level
Job Description
Mercor
Role Description
Mercor connects elite creative and technical talent with leading AI research labs. We are headquartered in San Francisco, and our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey.
Position: DevOps / SRE / Cloud Engineer (Coding Agent Experience)
Type: Contract
Compensation: $85/hour
Location: Remote
Role Responsibilities
- Use frontier AI coding agents to complete and evaluate complex infrastructure engineering tasks.
- Review model-generated implementations involving cloud platforms, Kubernetes, CI/CD systems, and infrastructure automation.
- Identify bugs, edge cases, reliability issues, and failure modes in model outputs.
- Compare outputs from multiple frontier models to assess strengths and weaknesses.
- Apply professional engineering judgment to realistic infrastructure engineering scenarios.
Qualifications
- Must-Have: 2+ years of professional DevOps, SRE, or Cloud Engineering experience.
- Experience with AWS, Azure, GCP, Kubernetes, Terraform, CI/CD pipelines, or observability tooling.
- Regular use of AI coding agents like Cursor, Claude Code, Codex, Windsurf, Gemini CLI, or similar tools.
- Ability to evaluate model-generated infrastructure and reliability engineering solutions.
- Preferred: Experience supporting production-scale systems.
Requirements
- $400 per accepted task. Compensation is tied to accepted work.
Application Process
- Upload resume
- AI interview based on your resume
- Submit form
Resources & Support
- For details about the interview process and platform information, please check: Interview Process
- For any help or support, reach out to: support@mercor.com
- PS: Our team reviews applications daily. Please complete your AI interview and application steps to be considered for this opportunity.
More DevOps Engineer Jobs
Junior DevSecOps Engineer
Multiplica Talent
We connect extraordinary talent with forward-thinking companies.
• Design, implement, and optimize continuous integration and continuous deployment (CI/CD) processes, cloud infrastructure, and security practices.
• Promote a DevSecOps culture within development teams.
• Serve as a primary owner for the reliability, availability, performance, operability, and capacity of one or more production services
• Deploy, operate, maintain, and continuously improve production services running in Autodesk GovCloud environments
• Partner with engineering teams to ensure services are designed with reliability, scalability, security, and operability in mind
• Define and operate reliability practices such as SLOs/SLIs, error budgets, production readiness reviews, service reviews, and operational health reviews
• Build automation to improve deployment safety, operational efficiency, incident response, and service recovery
• Design, develop, and maintain software, automation, and tooling that improve the reliability, scalability, and efficiency of production systems
• Implement and improve monitoring, alerting, logging, tracing, and observability capabilities across supported services
• Lead and participate in incident response, troubleshooting, and post-incident reviews focused on learning and continuous improvement
• Develop and maintain operational documentation, runbooks, and recovery procedures
• Scale and enhance resilience testing and Gameday practices to validate system behavior, recovery capabilities, and operational readiness
• Continuously identify and eliminate operational toil through software engineering, automation, and process improvement
• Ensure supported services remain compliant with Autodesk security, privacy, and regulatory requirements, including FedRAMP and related controls where applicable
• Participate in a 24x7 on-call rotation for production services
• Function effectively in a fast-paced environment while helping establish and mature operational excellence practices for Autodesk GovCloud
• Partner with customers to decompose ambiguous goals into concrete, buildable AI use cases, uncovering hidden complexity and edge cases along the way.
• Determine whether the data a use case needs is available, identify the right APIs or MCP sources, and secure access.
• Use Gladly’s CLI to register APIs on the App Platform, making customer data accessible to Gladly AI and agents.
• Write app actions in JavaScript to condense large API payloads down to the fields the AI actually needs.
• Build the workflows and guides that tell Gladly’s AI how to use that information and respond to the customer.
• Own use cases end to end after launch: monitor performance, optimize, and build new use cases that lift assist and resolution rates.
• Give proactive status updates to customers and the internal team, and partner with SAMs and Implementation Managers to keep goals and timelines aligned.
• Participate in QBRs and EBRs to show progress and ensure customers are getting measurable value.
• Partner with Solutions Engineering on pre-sales demos, and pull in Professional Services Engineering for the most complex custom work.


