Principal Software Engineer – SRE
Location
United States
Posted
83 days ago
Salary
$131K - $185K / year
Seniority
Lead
Job Description
PTC
Own Reliability at Scale
- Lead design, implementation, and evolution of reliability, availability, and resiliency strategies for large-scale distributed systems
- Identify systemic risks in application architecture, data flows, and infrastructure
- Drive operational excellence by preventing, detecting, and mitigating incidents
- Apply advanced software engineering practices to eliminate manual work and improve system observability
- Partner with product engineers and engineering leadership to ensure reliability in system design
- Contribute to longer-term reliability and infrastructure strategy aligned with business growth
Job Requirements
- US citizens or permanent residents only, due to ITAR requirements
- Ability to work US East Coast (Eastern Time) hours
- 10+ years of experience in software engineering, site reliability engineering, or systems engineering roles
- Extremely strong proficiency with the Java programming language and its ecosystem
- Deep experience operating complex, distributed systems in production environments
- Strong software engineering background, with a track record of delivering high-quality, maintainable code
- Expert understanding of incident management, service reliability, and performance engineering
- Strong hands-on experience with observability (metrics, logs, traces), capacity planning, and SLO-driven reliability
- Deep familiarity with modern cloud-based infrastructure, CI/CD pipelines, and infrastructure-as-code practices
- Comfortable making high-impact technical decisions in ambiguous environments
Benefits
- Medical, dental and vision insurance
- Paid time off and sick leave
- Tuition reimbursement
- 401(k) contributions and employer match
- Flexible spending accounts
- Life insurance
- Disability coverage
- Commuter subsidy