Omilia - Conversational Intelligence

Omilia is the leading provider of Natural Language Understanding enabled IVR & natural dialogue interaction solutions.

Senior Site Reliability Engineer

DevOps Engineer | Full Time | Remote | Senior | Team 201-500 | Since 2002 | H1B No Sponsor

Location

Australia

Posted

7 days ago

Seniority

Senior

Bachelor Degree | English

Ansible AWS Cloud Docker Grafana Kubernetes Linux MySQL NoSQL PostgreSQL Prometheus Python RDBMS Redis TCP/IP Terraform VoIP Go

Job Description

- Ensure platform reliability and availability across production and pre-production environments through proactive monitoring, alerting, and automation.
- First response for incidents, contribute to problem management and root cause analysis.
- Supporting the development team's effort towards reliability, creating a solid reliability culture within the development lifecycle.
- Develop troubleshooting documentation for production support resources.
- Collaborate with Engineering teams to develop optimised and productive runbooks, operational documentation and automation of operational tasks.
- Collaborate with development and cloud engineering teams to embed reliability and performance into the software delivery lifecycle.
- Design, implement, and evolve observability solutions (metrics, logs, traces, dashboards) using tools such as Prometheus, Grafana, and ELK.
- Participate in on-call rotations and continuously improve alert quality and response processes.
- Champion a culture of reliability, performance, and continuous improvement across teams.

Job Requirements

Bachelor's Degree or MS in Engineering or equivalent.
Experience in operating at least one container orchestration cluster (Kubernetes, Docker Swarm).
Experience developing or maintaining software for production services at scale.
Experience with ELK.
Experience with AWS.
Experience with Grafana/Prometheus stack.
Strong scripting skills (Bash, Python or Go).
Excellent communication skills.
Thinking out of the box and anticipating challenges. It is imperative we are not simply reactive; we must expect challenges and question technologies, procedures and thinking already in place. You will be expected to constantly review and challenge at all levels.
Versatility. We work with agile/lean methods. We'd much rather iterate and learn than assume we know all the answers.
Being a team player. You don't (always) work in isolation and are excited by the thought of using your team whilst involving product, experience design, engineering, and more in the process.
Will be considered as a plus:
Telephony knowledge (SIP, VoIP);
Experience in Linux Administration (RedHat, CentOS, AL);
Working knowledge in Configuration Management tools (Terraform, Ansible);
Experience with TCP/IP and general networking concepts;
RDBMS knowledge (MySQL, Postgres);
NoSQL knowledge (Redis).

Benefits

Fixed compensation;
Long-term employment with the working days vacation;
Professional development (courses, training, etc.);
Being part of successful cutting-edge technology products that are making a global impact in the service industry;
Proficient and fun-to-work-with colleagues;
Apple gear.

More DevOps Engineer Jobs

DevOps Engineer

ZIRO

Make IT Hassle-Free

DevOps Engineer | 7 days ago

Full Time | Remote | Team 51-200 | H1B No Sponsor

Role Description

This is a hands-on, get-it-done engineer who keeps ZIRO’s software running smoothly from code to production and everything in between. You take complex, inherited systems and turn them into clean, scalable, well-documented infrastructure that teams actually love to use. You build and improve CI/CD pipelines, monitoring, and release processes that make shipping software fast, safe, and repeatable. At the end of the day, your work empowers engineers to move quickly with confidence and ensures our platforms are stable, reliable, and ready for anything.

What you will do

- Build, scale and maintain a consistent and repeatable release process for products managed by the R&D Ops team
- Build and maintain infrastructure to support ZIRO’s SaaS products with high availability (failover/disaster recovery)
- Plan and maintain infrastructure to support CI/CD pipeline and automated testing
- Write and update documentation such as runbooks/playbooks for operations to follow
- Point of escalation for operations and engineering
- Provide support and escalation for CI/CD pipeline
- Build and maintain monitoring and management systems for systems managed by the R&D Ops team

Qualifications

- 5–8+ years in DevOps, SRE, or Infrastructure roles supporting SaaS or enterprise platforms
- Strong experience designing and maintaining CI/CD pipelines, automated testing, and release workflows
- Deep knowledge of cloud infrastructure (Azure/AWS) including high availability and disaster recovery
- Hands-on experience with Infrastructure as Code tools (e.g., Terraform, Ansible) and system automation
- Proven ability to troubleshoot production issues, build monitoring/alerting systems, and partner closely with engineering and ops teams

Benefits

- Flexible, take what you need PTO 🏖️
- Competitive wages 💵
- Company sponsored health, vision and dental plans ⚕️
- Fully remote roles 💻
- Home office budget 📎
- Company sponsored social events 🤩

Company Description

ZIRO is a leader in Unified Communications, helping customers deliver modern voice through Teams Phone and Microsoft 365. We help companies migrate, automate and manage their phone systems with industry-leading technology and decades of expertise with enterprise calling. Our platform simplifies the complex, helps IT teams move faster, and delivers a unified experience to every user. We’re a growing, people-first team that values accountability, courage, innovation, passion, selflessness and good judgment. We are on a mission to make every conversation count, and you can be too!

Worldwide

Head of Data Ops & Systems

Delivery Associates

Delivery Associates (DA) is a global social impact consultancy specializing in transforming the public sector. With over a decade of experience and a foundation in Deliverology®, DA guides leaders through every phase of implementation—delivery-oriented strategy formulation, execution, and evaluation. DA partners with governments, philanthropies, and international organizations, employing actionable strategies to achieve sustainable, lasting outcomes.

DevOps Engineer | 7 days ago

Full Time | Remote | Team 51-200

Role Description

Delivery Associates (DA) is seeking a Head of Data Ops and Systems to lead and strengthen our internal technology infrastructure, data capability, and systems operations. This is a senior leadership role responsible for ensuring DA's internal systems are fit for purpose, well-integrated, and built on a foundation of strong data architecture and governance. This is an internally focused role. The successful candidate will be responsible for keeping DA’s data and technology foundations reliable and evolving, leading and working alongside platform engineers, full stack developers, and data teams to build the infrastructure and data architecture that underpins how we operate globally.

Key Responsibilities

Systems & Infrastructure

- Own DA's systems strategy, architecture, and investment direction, ensuring infrastructure, platforms, and security enable reliability and scale.
- Lead a lean technical team, ensuring the infrastructure and systems that support DA’s global operations are built, maintained and continuously improved.
- Lead technology architecture and platform decisions, balancing operational resilience with the need to evolve systems over time.
- Ensure appropriate cybersecurity governance is in place, including oversight of controls, access policies, data protection and compliance requirements.
- Manage system performance, uptime, and incident response, ensuring operational continuity across geographies.
- Manage relationships and contracts with technology vendors and system providers, ensuring DA receives value, appropriate service levels, and contract compliance.
- Maintain and improve IT documentation, processes, and standards across the organisation.

Systems Integration

- Lead the integration strategy across DA's internal platforms and tools, ensuring systems connect effectively and data flows reliably between them.
- Identify and resolve integration gaps that create friction, duplication, or data inconsistencies across the organisation.
- Evaluate and recommend systems changes or replacements where current tools are not fit for purpose.
- Translate operational needs from across the business into clear systems requirements and solutions.

Data Architecture, Quality & Governance

- Own and evolve DA’s data architecture, designing and maintaining the structures, pipelines and infrastructure through which data flows across the organisation.
- Own DA's internal data quality agenda, defining standards, building and managing processes to identify and resolve data issues, and embedding accountability for data quality across teams.
- Develop and maintain a data governance framework, including data definitions, ownership, lineage, and access policies.
- Provide strategic direction and oversight of data science work, ensuring data is structured, analysed and used effectively across the organisation.
- Lead data modelling, data mining and data extraction approaches to improve the quality and usability of DA’s internal data assets.
- Act as DA’s Data Protection Officer (DPO), taking responsibility for data protection obligations and ensuring compliance with relevant regulations across global jurisdictions.

Governance, Strategy & Execution

- Develop and maintain a clear technology and data roadmap with defined priorities, milestones, and success metrics.
- Establish strong technology governance, including decision rights, escalation paths, and investment discipline.
- Ensure alignment between IT investments, systems, and broader organisational strategy.
- Identify, evaluate and manage technology and data risk at an organisational level, ensuring risks are understood, prioritised and addressed effectively across DA’s systems and data landscape.
- Lead a portfolio of change initiatives across data and systems, managing interdependencies, sequencing work effectively, and maintaining momentum across multiple concurrent priorities.
- Drive organisational change effectively, managing stakeholder expectations, communicating progress clearly, and ensuring new ways of working are embedded across the organisation.
- Translate strategy into execution through clear plans, routines, and accountability.

Leadership & Collaboration

- Build, lead, and develop a high-performing data and systems team, setting clear expectations and creating the conditions for accountability and growth.
- Work closely with senior leaders and internal stakeholders across DA's practices and regions to understand needs and deliver effective solutions.
- Communicate complex technical topics clearly and confidently to non-technical audiences.
- Actively seek ways to innovate across data, systems, and technology, and support delivery teams across DA to do the same.
- Foster a culture of reliability, continuous improvement, and high operational standards.

Qualifications

- Bachelor's degree in computer science, engineering, information systems, data science, or a related field. Extensive professional experience may be considered in place of formal education.
- Significant senior experience leading data operations, data architecture, and systems functions in global organisations.
- Strong background in data science and data governance, including data quality management, data modelling and governance frameworks.
- Proven experience designing and evolving data architecture, pipelines, and infrastructure. Experience leading and managing teams of platform engineers and developers.
- Proven experience managing and improving integrations across enterprise platforms and tools, with a solid grounding in IT systems, infrastructure, and security.
- Significant experience in data protection with a strong working knowledge of data protection obligations across global jurisdictions.
- Proven ability to establish technology and data governance, manage risk, and translate strategy into executable roadmaps.
- Demonstrated experience managing a portfolio of technology and data initiatives, delivering organisational change, and managing stakeholder expectations across complex, fast-paced environments.
- Excellent written and verbal communication skills, with the ability to engage effectively with senior leaders and non-technical audiences. Fluency in English is required; additional languages are a plus.

Benefits

- Opportunities to engage with senior leaders and contribute to an organisation delivering meaningful social outcomes globally.
- Work with diverse teams across geographies.
- Access to world-class training programmes and mentorship from senior leaders.
- Be part of a growing organisation committed to continuous improvement.
- Remote working environment with opportunities for global collaboration.
- Comprehensive benefits package for the effort they put in.

Company Description

Delivery Associates (DA) is a global social impact consultancy specialising in transforming the public sector. We help governments, philanthropies, and social impact organisations turn ambitious goals into an everyday reality for the people they serve. At DA, we value excellence, humility, and passion, ensuring that we deliver the highest quality work while staying committed to making a meaningful difference.

Ireland

Senior Site Reliability Engineer

Palta

Health & well-being tech company led by entrepreneurs on a mission to create a positive impact globally.

DevOps Engineer | 7 days ago

Full Time | Remote | Team 501-1,000 | Since 2016 | H1B No Sponsor

- You will be working day to day on AWS, our infrastructure-as-code, our CI/CD setup, observability, and the on-call rotation.
- A meaningful part of the work is automation: when we find ourselves doing the same thing twice, we usually invest in tooling rather than writing another runbook.
- Most of that tooling is written in Go.
- Infrastructure is defined with Terraform and Terramate, with Atlantis running plan and apply on pull requests.
- Workloads run on EKS with Karpenter and Fargate, deployed through ArgoCD.
- Observability is built on Grafana, Loki, Tempo, and Prometheus-compatible metrics.

AWS Flux Grafana Kubernetes Prometheus Terraform Go

Cyprus

Principal Site Reliability Engineer - Networking

Elastic

Self-described as the leading platform for search-powered solutions, Elastic helps organizations, their customers, and their employees find what they need faster while protecting a

DevOps Engineer | 7 days ago

Full Time | Remote

Title: Principal SRE (Networking) - Platform Control Plane

Location: Remote - United States

Job Description:

Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale, unleashing the potential of businesses and people. The Elastic Search AI Platform, used by more than 50% of the Fortune 500, brings together the precision of search and the intelligence of AI to enable everyone to accelerate the results that matter. By taking advantage of all structured and unstructured data, securing and protecting private information more effectively, Elastic’s complete, cloud-based solutions for search, security, and observability help organizations deliver on the promise of AI.

As part of the Platform Engineering department, the Network Infrastructure team is crafting, building, and improving the multi-cloud platform at scale for Elastic Cloud Hosted and Serverless. We grow and mature our distributed large-scale network infrastructure that spans across multiple cloud service providers to support our cloud services. We are built on Kubernetes, Go, and custom orchestration architectures. In your daily life with us, you will participate in coding, innovating technical designs, crafting solutions, improving resilience, and prioritizing security, bug fixes, and features. For example, Debugging Azure Networking for Elastic Cloud Serverless is part of our efforts, and we want your experience to contribute to a truly exceptional customer experience!

- Taking an engineering approach in leading technical initiatives for designing, building and automating network infrastructure and services to guarantee the reliability of the global Elastic network infrastructure. Focusing on Layer 2/3/4 of the TCP/IP stack (Ethernet and/or IP encapsulation, routing, firewalling, load balancing).
- Growing our global Platform network infrastructure to meet the increasing scaling demands by developing and maintaining software, codebases, tooling and automations to serve our Network Infrastructure as Code principle.
- Collaborating in an environment with an inclusive approach, and focusing on operational excellence which uplifts others.
- Preventing repeated customer impact in response to major incidents and prioritised problem management. Our on call rotation is spread well, and we address complex customer concerns too.
- Excellent networking skills, with knowledge of protocols such as IP/IPv6, TCP/UDP, BGP, DNS.
- Strong technical depth for building and automating networks (Terraform, Ansible) in collaboration with other engineers as an authority in identifying, implementing and delivering solutions.
- Good knowledge of public CSP network components (Load balancers, VPC peering/Transit gateways, VPN connectivity, Direct Connects).
- Success and lessons of experiences from striving for 'progress not perfection' in the name of Platform reliability. We want to hear about your customer-first approach in solving operational problems for both today and the future.
- Passion for developing solutions that involve inclusive communication methods to grow and strengthen partner and team relationships. Examples of working in distributed teams or working remotely are desirable.
- Site-Reliability Engineering experience. We tackle problems with code, but fundamentally we keep things working and have proven success in operational excellence. Responding to and preventing repeated customer impact in response to major incidents and prioritized problem management. Our on call rotation uses a follow-the-sun model where everyone participates in it in (mostly) their working hours.
- You have operated a SaaS product in a public cloud ideally built using Infrastructure-as-Code tooling such as Crossplane or Terraform.
- You have designed and/or operated large network topologies in which dynamic routing is based on BGP.
- You have operated network topologies based on software routers.
- You have experience in IP address management (IPAM) and you have used relevant tools for automated IP allocations.
- You have designed and/or operated overlay networks with use of encapsulation protocols such as IPSec, GRE and VXLAN.
- You have built or operated a Kubernetes-at-scale infrastructure, ideally across multiple cloud providers, with knowledge of the Cilium CNI.
- You have written non-trivial programs in Golang or other programming languages.
- You have worked with containerized services (such as Docker).
- You have proven experience in leading and improving alerting and major incident management standard processes metrics systems (e.g. Elastic Stack, Graphite, Prometheus, Influx) to diagnose issues and quantify impacts to present to others at varying levels of the organization.
- You have experience in system and network administration with professional skills in Linux on distributed systems at scale.
- You have diagnosed or designed, implemented and created solutions with the Elastic Stack.
- You are experienced in thriving in a self-organizing and sharing in a globally distributed team environment.
- You strengthen team members in bringing out the best of each other by uplifting others with coaching and mentoring.

As a distributed company, diversity drives our identity. Whether you’re looking to launch a new career or grow an existing one, Elastic is the type of company where you can balance great work with great life. Your age is only a number. It doesn’t matter if you’re just out of college or your children are; we need you for what you can do. We strive to have parity of benefits across regions, and while regulations differ from place to place, we believe taking care of our people is the right thing to do.
- Competitive pay based on the work you do here and not your previous salary
- Health coverage for you and your family in many locations
- Ability to craft your calendar with flexible locations and schedules for many roles
- Generous number of vacation days each year
- Increase your impact: we match up to $2000 (or local currency equivalent) for financial donations and service
- Up to 40 hours each year to use toward volunteer projects you love
- Embracing parenthood with a minimum of 16 weeks of parental leave

Security & Privacy Responsibilities: Take ownership of protecting the confidentiality, integrity, and availability of organizational data and systems by following applicable privacy and security policies, standards, and procedures. Ensure that all individual contributions follow Elastic’s Secure Software Development Framework (SSDF). Proactively participate in mandatory role-based training to ensure personal technical execution consistently aligns with the highest standards of data protection, data privacy, and system resilience.

Different people approach problems differently. We need that. Elastic is an equal opportunity employer and is committed to creating an inclusive culture that celebrates different perspectives, experiences, and backgrounds. Qualified applicants will receive consideration for employment without regard to race, ethnicity, color, religion, sex, pregnancy, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, disability status, or any other basis protected by federal, state or local law, ordinance or regulation.

We welcome individuals with disabilities and strive to create an accessible and inclusive experience for all individuals.

Elasticsearch develops and distributes technology and information that is subject to U.S. and other countries’ export controls and licensing requirements for individuals who are located in or are nationals of the following sanctioned countries and regions: Belarus, Cuba, Iran, North Korea, Syria, or Russia, including the Ukrainian territories annexed by Russia (The Crimea region of Ukraine, The Donetsk People's Republic (DNR), The Luhansk People's Republic (LNR), Kherson or Zaporizhzhia). If you are located in or are a national of one of the listed countries or regions, an export license may be required as a condition of your employment in this role. Please note that national origin and/or nationality do not affect eligibility for employment with Elastic.

Please see here for our Privacy Statement.

Compensation for this role is in the form of base salary. This role does not have a variable compensation component. The typical starting salary range for new hires in this role is listed below. In select locations (including Seattle WA, Los Angeles CA, the San Francisco Bay Area CA, and the New York City Metro Area), an alternate range may apply as specified below. These ranges represent the lowest to highest salary we reasonably and in good faith believe we would pay for this role at the time of this posting. We may ultimately pay more or less than the posted range, and the ranges may be modified in the future. An employee's position within the salary range will be based on several factors including, but not limited to, relevant education, qualifications, certifications, experience, skills, geographic location, performance, and business or organizational needs.

Elastic believes that employees should have the opportunity to share in the value that we create together for our shareholders. Therefore, in addition to cash compensation, this role is currently eligible to participate in Elastic's stock program. Our total rewards package also includes a company-matched 401k with dollar-for-dollar matching up to 6% of eligible earnings, along with a range of other benefits offered with a holistic emphasis on employee well-being.

The typical starting salary range for this role is: $179,800 - $232,900 USD

The typical starting salary range for this role in the select locations listed above is: $179,800 - $232,900 USD

Worldwide

$179.8K - $232.9K / year
