Job Closed
This listing is no longer active.
Your Arizona connection to news that matters.
Site Reliability Engineer
Location: New York
Posted: 172 days ago
Salary: $110K - $130K / year
Seniority: Senior
Job Description
Site Reliability Engineer
Copper Courier
- Oversee current website operations and lead upcoming migration projects, ensuring zero-downtime transitions and optimal performance
- Manage and optimize our production cloud infrastructure footprint, implementing secure, cost-effective scaling strategies while maintaining high availability
- Design and implement automated environment management systems for publishing workflows and data management processes
- Refine and implement monitoring and alerting for production environments based on service availability and user experience measurements; diagnose and remediate production-impacting issues in a timely manner
- Maintain comprehensive Identity and Access Management (IAM) systems, implement Single Sign-On (SSO) solutions, and apply security best practices across all enterprise and production platforms
- Work closely with cross-functional technical and business teams, providing transparent communication and exceptional internal customer service to support business objectives
- Provide guidance to the external MSP to ensure colleagues receive consistent, high-quality technical support
- Research, plan, and implement strategic enhancements to existing platforms and workflows, including emerging technologies such as GenAI solutions, to improve operational efficiency and user experience
Job Requirements
- 5+ years of demonstrated experience in all of the above areas
- Experience leading and mentoring technical teams
- Experience with infrastructure-as-code tools and practices
- Background in DevOps methodologies and CI/CD pipeline management
- Knowledge of monitoring and alerting systems for proactive incident management
- Experience with containerization technologies (Docker, Kubernetes)
- Understanding of security best practices and compliance requirements
- Previous experience in a large-scale production environment
- Proficiency in a scripting language such as Python, JavaScript, or Bash
Benefits
- Unlimited Paid Time Off (PTO) + 11 paid holidays a year
- An Employee Assistance Program (EAP)
- Full suite of health benefits
- 401k
- Flexible working arrangements
- $500 home office setup reimbursement




