Apex Systems, an IT staffing and workforce solutions firm, provides recruiting and staffing services to large and small companies alike. Founded in 1995 by thre
Infrastructure Engineering - Systems Engineer III
Location
California
Posted
72 days ago
Seniority
Senior
Job Description
Infrastructure Engineering - Systems Engineer III
Apex Systems
More Infrastructure Engineer Jobs
Role Description
Reports to the Senior Engineering Manager, Archiving & Data Services and works in the engineering team to help grow the Archiving & Data Services group’s suite of digital archiving, data, and access services for a global set of partner research, memory, and social good organizations.
- Supporting and scaling our web archiving service (Archive-It).
- Contributing to specific development projects for other department-managed services and systems.
- Helping to build and improve foundational systems, services, and developer tools.
- Operating on our self-owned and operated data centers.
- Working to support products for a global coalition of institutions.
Essential Duties & Responsibilities:
- Lead projects across the full software development lifecycle: plan, design, implement, test, deploy, and maintain software solutions.
- Engage in significant time overlaps with North America staff, working on Pacific Time.
- Evolve system architectures to support future requirements, improve reliability, and reduce operational burden.
- Triage, resolve, and address root causes of production issues.
- Support engineering leadership with project coordination, strategic planning, and operational execution.
- Work on our web archiving service (Archive-It), a service currently used by over 1000 partner organizations to archive and provide public access to web collections totaling many petabytes of data and that archives billions of files each year.
- Build and improve foundational systems, services, and developer tools that support values-aligned products, at petabyte scale, operating on our self-owned and operated data centers ensuring open access to digital information for a global coalition of institutions.
- Foster a culture of collaboration and learning, reinforce team processes that are working, and identify potential technology or workflow improvements.
Qualifications
- A commitment to the mission - to provide “Universal Access to All Knowledge”.
- Knows how to work well with a team with diverse skills and is willing to share knowledge and be open to learning from others.
- Ability to own production systems end to end (design, implementation, deployment, monitoring, and continuous improvement).
- Knows how to work in a distributed team structure.
- Able to attend some in-person team meetings (expected attendance two meetings per year).
Requirements
- Five (5)+ years of experience as a Software Engineer focused on infrastructure, DevOps, or site reliability.
- Experience building and operating observability stacks.
- Experience with Kubernetes in production environments.
- Preferred: Technical leadership experience (mentoring, leading projects, design review).
Benefits
- Comprehensive benefits package, including PTO, paid holidays, medical, dental, and vision benefits.
- Health savings and flexible spending accounts.
- Commuter benefits.
- Short-term and long-term disability coverage.
- Retirement programs.
Software Engineer: Infrastructure
Thatch
We’re a fully distributed early stage company using technology to change the way America does healthcare. We’re a happy, friendly, high-velocity team.
About the role
Thatch is rewiring how health benefits work. We build software that gives employees real control over how they use their benefits. We’re looking for a Software Engineer, Infrastructure to own the reliability, security, and performance of the platform that powers Thatch. Every product team depends on the infrastructure you design and operate.
Our platform runs in a HIPAA-compliant environment and handles sensitive healthcare data. Reliability and security are foundational. The systems you build must be secure, observable, and resilient under real-world load.
We organize engineering around product teams that serve members, employers, and partners. The Infrastructure team builds and operates the platform that enables them to ship reliably.
What you will do
- Own infrastructure health, performance, and reliability across the stack.
- Make architectural decisions and improve how we build and operate.
- Design and evolve infrastructure using Terraform.
- Improve CI/CD, deployments, and developer workflows so teams ship faster.
- Strengthen security across access controls, vulnerability management, and dependencies.
- Identify and eliminate reliability risks before they impact customers.
What we are looking for
- Experience operating production infrastructure, platform, DevOps, or SRE systems.
- A track record of owning critical systems end-to-end.
- Strong experience with PostgreSQL or comparable relational databases in production.
- Experience managing infrastructure as code using Terraform.
- Sound judgment across reliability, performance, and security tradeoffs.
- A bias toward automation and improving developer productivity.
We hire engineers across multiple levels and care most about the scope you’ve owned, the impact you’ve had, and how you make decisions.
Tools and tech stack
- Ruby on Rails
- PostgreSQL
- Terraform
- Modern cloud infrastructure and CI/CD tooling
You don’t need experience with every tool above. You do need to be able to operate systems responsibly in a regulated environment.
Experience that stands out
- Experience operating in HIPAA or SOC 2 environments.
- Deep PostgreSQL performance tuning or replication management.
- Building internal tooling that improves developer productivity.
- Improving observability, incident response, or overall security posture.
What to expect
We aim to move quickly, and most candidates complete the process within 2–3 weeks. The interview process typically includes:
- An initial conversation with a recruiter.
- A collaborative pair programming session.
- A systems or infrastructure design discussion.
- A conversation with members of the engineering team.
- A final conversation with engineering and company leadership.
Estimated Compensation Range
$161,000–$230,000 USD
About Thatch
We’re a fully distributed early stage company using technology to change the way America does healthcare. We’re a happy, friendly, high-velocity team. You can read more on Thatch here.
First and most importantly: our mission is to bring transparency and clarity to the world's data. Our platform, FiftyOne, is where AI work happens. Our enterprise platform is the mission critical linchpin for managing unstructured data, model development, and AI systems at the world's largest companies. We believe that open source is the way to lead the data-centric AI revolution. Our open source version has 4 million downloads to-date. Our software massively impacts AI work across almost every vertical: from self-driving cars to medical imaging to revolutionizing agriculture, we are at the thrilling center of real-world AI advancement’s next wave.
And we’re built on three key tenets:
- We are all human beings: we strive to be a “human-first” organization and treat everyone with the respect, care, and flexibility that all people deserve.
- We are distributed: we believe in getting autonomy and power into the hands of people actually doing the work.
- We believe in the power of community.
We are fully remote, hiring for people based in North America and who are prepared to travel to at least 2 in-person retreats per year.
About your role
As a Principal Infrastructure Engineer at Voxel51, you will shape the architecture and strategy of the systems that power our platform, from individual researchers to enterprise-scale deployments. You’ll lead the design of containerized systems, CI/CD pipelines, and deployment solutions across cloud and on-premises environments, while solving the unique challenges of serving unstructured data (images and video) at scale. You’ll partner with enterprise customers, guiding and troubleshooting their production deployments. You’ll collaborate across engineering teams to improve developer productivity, and mentor peers while setting infrastructure best practices. Your work will directly shape the reliability, security, and scalability of Voxel51’s platform, and accelerate our mission to democratize data-centric ML.
What you will do
- Shape the architecture and evolution of Voxel51’s infrastructure to support deployments ranging from individual researchers to Fortune 500 enterprises
- Design, build, and scale deployment systems across cloud (GCP, AWS, Azure) and on-premises environments, ensuring reliability, security, and repeatability
- Partner with enterprise customers (and our Customer Success Machine Learning Engineers) to deliver and support production-grade deployments in their environments, guiding them through installation, troubleshooting, and scaling
- Lead infrastructure initiatives across engineering teams, enabling peers to develop, test, and ship features faster with robust internal tooling and automation
- Drive best practices in CI/CD, evolving our pipelines (currently GitHub Actions + Google Cloud Build) and introducing new approaches where they add value
- Develop and maintain deployment solutions for Voxel51-hosted environments (GKE) as well as customer on-prem installations (K8s or Docker Compose)
- Champion developer productivity, improving workflows for development and automated cloud deployments
- Troubleshoot and resolve complex infrastructure issues, spanning build failures, runtime failures, and customer deployment challenges
- Anticipate and prevent failures by designing monitoring, alerting, and predictive solutions for both internal and customer environments
- Mentor engineers and set technical direction, ensuring Voxel51’s infrastructure remains ahead of customer needs and industry trends
What you should bring
- Deep experience with containerized environments
- Building, packaging, and debugging container images
- Kubernetes (and Docker Compose) for orchestration
- Building, maintaining, and deploying Helm charts
- Infrastructure as Code expertise (Terraform, Ansible, or equivalent)
- Scripting and automation skills (Bash or similar)
- Python expertise, including build and environment management, packaging/distribution, release management, and dependency debugging
- CI/CD systems experience, ideally GitHub Actions (we use this today)
- Cloud infrastructure knowledge, especially GCP (IAM, VPC, load balancing, ingress/egress routing, proxies, firewall rules)
- Database fundamentals, ideally MongoDB or similar NoSQL systems
- Observability skills, including designing meaningful monitors, logging, tracing, and alerting
- Security best practices, including certificates, service accounts, least privilege, and role assumptions
- Troubleshooting ability across complex, distributed systems (including with customers in the loop)
- Testing mindset: comfortable with designing and applying different types of tests to validate functionality
- Strong communication skills, with the ability to work directly with enterprise customers as well as collaborate across teams in a remote-first, collaborative environment
- Adaptability and curiosity, with the ability to ramp quickly on unfamiliar concepts and technologies
The cash compensation for this person is in the $250K-$280K range. In addition to base comp for this role, we offer equity in the form of options, a variety of benefits, and the opportunity to grow in an exciting and collaborative environment.

