Kohl's

It’s no secret that our associates love #LifeAtKohls and we know you will too.

Reliability Engineer

Engineer | Other | Remote | Mid Level | Team 10,001+ | Since 1962 | H1B No Sponsor

Location

United States

Posted

12 days ago

Salary

0

Seniority

Mid Level

Job Description

Role Description

As a Reliability Engineer, you will ensure the resilience and availability of Kohl’s systems and applications and collaborate closely with development teams to review designs, conduct risk assessments and implement robust monitoring and failover mechanisms.

- Drive incident response efforts, perform root cause analysis and implement preventative measures to enhance system reliability.
- Establish consistent practices that elevate Kohl’s operational excellence through automation and process improvements.
- Follow the software lifecycle and drive reliability, observability and efficiency across product teams within an assigned domain.
- Identify repeated toil and find opportunities for automation and risk reduction.
- Serve on an on-call rotation to respond to production incidents, and conduct blameless retros and root-cause analyses (RCAs) to drive a culture of continuous improvement.
- Proactively identify failures before they cause outages, using chaos engineering and related techniques such as analysis of edge cases and failure modes and design review.
- Advise on capacity planning and provide continuous assessments of system behavior and consumption.
- Work with product managers to identify and prioritize work for reliability best practices (e.g., leveraging SLIs/SLOs/Error Budgets).
- Additional tasks may be assigned.

Qualifications

- Bachelor's Degree or equivalent in MIS, Computer Science or related field.
- 2+ years of experience in software development.
- Strong programming skills in one or more languages (Java, Python, Go or Node.js).
- Working knowledge of systems architecture, operating system internals and network fundamentals.
- Experience working with one cloud platform (e.g., GCP, AWS, or Azure).

Requirements

- Experience with monitoring techniques and tools (e.g., CloudWatch, Grafana, Prometheus, OpenTelemetry, Tracing).
- Working knowledge of containerization and container orchestration (e.g., Docker, Kubernetes, Rancher).

More Engineer Jobs

Full Time | Remote | Team 51-200 | H1B No Sponsor

DIGIT is seeking a System Administrator to manage and maintain printers throughout the enterprise for the General Services Administration (GSA). The individual will support the management of devices throughout their entire lifecycle, which includes administering, operating, maintaining, and supporting the enterprise device management platforms, tools, systems, and related integrations used to manage network-connected devices. More than 1,400 printers are deployed in the GSA environment across the country.

The employer, a leading provider of advanced information technology solutions and professional services to U.S. federal government agencies, is the prime contractor for an $807M task order in support of the GSA Office of Digital Infrastructure Technologies (IDT): the DIGIT (Digital Innovation for GSA Infrastructure Technologies) task order, which drives digital transformation and delivers continuous improvement and business value to its customers. The team comprises best-in-class technology partners that apply forward-leaning technologies and best practices to transform GSA’s IT capabilities and shift its offerings toward a more flexible service delivery model, completing the agency’s shift to a fully digital experience along with its adoption of advanced, emerging technologies such as intelligent automation, artificial intelligence, and machine learning.

All locations: District Of Columbia | Washington
Job Closed

Role Description

Performacentric is seeking a Machine Learning Engineer with hands-on experience developing and deploying AI applications using Llama 3 8B, Python, and FastAPI. This role will be responsible for building production-grade AI services, optimizing model performance, developing APIs, integrating business systems, and supporting the evolution of Performacentric's AI agent platform. The ideal candidate combines strong software engineering skills with practical machine learning experience and enjoys working in a fast-paced startup environment where they can directly influence product direction and technical architecture.

Responsibilities

AI Model Development & Optimization
- Deploy, configure, and optimize Llama 3 8B models for production use.
- Develop prompt engineering, retrieval, and agentic workflows.
- Fine-tune and evaluate LLM performance for business use cases.
- Implement Retrieval-Augmented Generation (RAG) architectures.
- Optimize inference performance, latency, and infrastructure utilization.
- Monitor model quality and continuously improve response accuracy.

Application Development
- Build scalable AI applications using Python and FastAPI.
- Design and maintain RESTful APIs for AI services.
- Develop backend services supporting AI agents and copilots.
- Integrate AI solutions with CRM, ERP, communication, and business systems.
- Implement authentication, authorization, and API security controls.
- Write clean, maintainable, and well-documented code.

Data & Infrastructure
- Build and maintain vector database integrations.
- Develop data ingestion and preprocessing pipelines.
- Support deployment of AI workloads in cloud and self-hosted environments.
- Collaborate on model serving, monitoring, logging, and observability.
- Assist with infrastructure automation and CI/CD processes.

Collaboration
- Work closely with product, engineering, and leadership teams.
- Participate in architecture discussions and technical planning.
- Contribute to AI solution design for client implementations.
- Mentor junior developers and share best practices.

Qualifications

- 3+ years of professional software engineering experience.
- Strong proficiency in Python.
- Experience building APIs with FastAPI.
- Experience deploying and working with Llama 3 8B or similar open-source LLMs.
- Understanding of prompt engineering and LLM optimization techniques.
- Experience consuming and developing REST APIs.
- Strong understanding of Git-based development workflows.
- Familiarity with Linux environments and command-line tools.
- Experience troubleshooting and optimizing production applications.

Requirements

- Understanding of machine learning fundamentals.
- Experience evaluating AI model performance.
- Familiarity with embeddings, vector search, and RAG architectures.
- Knowledge of model inference optimization techniques.
- Experience working with structured and unstructured datasets.

Preferred Qualifications

- Fine-tuning open-source LLMs.
- ML Engineering and MLOps practices.
- LangChain, LlamaIndex, Haystack, or similar frameworks.
- PostgreSQL database administration and optimization.
- Vector databases such as pgvector, Chroma, Pinecone, Weaviate, or Qdrant.
- Docker and containerized deployments.
- Kubernetes orchestration.
- Azure AI infrastructure and GPU environments.
- CI/CD pipelines and DevOps automation.
- Multi-agent AI architectures.
- Knowledge graph implementations.
- Business intelligence and analytics platforms.

Success Metrics

- Deploy and optimize production AI workloads.
- Improve AI response quality and accuracy.
- Reduce inference latency and infrastructure costs.
- Expand Performacentric's AI agent platform capabilities.
- Deliver reliable AI integrations for customer environments.
- Contribute to the development of new AI-powered products and services.

Benefits

- Opportunity to work on cutting-edge AI and agentic technologies.
- Direct influence on product architecture and technical strategy.
- Remote-first work environment.
- Competitive compensation based on experience.
- Professional growth opportunities in one of the fastest-growing areas of software development.
- Ability to help shape the future of AI-powered business transformation.

How to Apply

- Resume/CV
- Brief cover letter
- GitHub profile (if available)
- Portfolio of AI, machine learning, or software development projects
- Examples of LLM, FastAPI, or AI agent implementations (preferred)

USA Timezones


Senior Project Engineer

Qualus

Qualus is a leading pure-play power solutions firm and innovator at the forefront of power infrastructure transformation, with differentiated capabilities across grid modernization, resiliency, security, and sustainability. The firm partners with utilities, commercial, industrial, data center, and government clients, and renewable and energy storage developers, offering comprehensive solutions through boutique and integrated advisory, planning, engineering, digital solutions, program management, and specialized field services. Qualus also provides software and technology enabled services and develops breakthrough solutions for critical power industry challenges such as distributed and variable resource integration, emergency management, and secure data exchange. The firm has over 1,800 professionals, with offices throughout the U.S. and Canada.

EEO

We are an equal opportunity employer and value diversity. We are committed to providing an inclusive workplace and do not discriminate on any grounds protected by applicable human rights legislation across Canada and the US.

Engineer | 12 days ago
Full Time | Remote | Team 1,001-5,000 | H1B Sponsor

• Manage and coordinate all technical aspects of the project, including planning, design, development, testing, and deployment.
• Develop and execute system build plans accounting for software architecture, information security, and hardware deployment.
• Work with the Project Manager and provide technical input into the overall project schedule, budget, resource requirements, and risk management.
• Lead engineering workshops on topics such as system design, data conversion, business process reviews, test and validation planning, and change management.
• Install and configure OSI monarch advanced distribution management applications.
• Design and configure SCADA/GMS/EMS/ADMS/OMS systems to meet the specific needs of our clients.
• Deploy and integrate SCADA/GMS/EMS/ADMS/OMS systems with other equipment and software.
• Provide training and support to clients during and after the implementation process.
• Collaborate with project teams and other stakeholders to ensure successful implementation of the project.
• Perform system and acceptance testing with the end customers.
• Test and validate system configurations to ensure they meet client requirements and industry standards, including NERC CIP.
• Develop scripts and other solutions to integrate third-party systems.
• Identify and troubleshoot any issues that arise during the implementation process.
• Keep up to date with the latest developments in the field and ensure systems are implemented in accordance with industry best practices.
• Travel to customer sites and work directly with customers to successfully deliver, test, and integrate systems.
• Contribute to proposals, work plans, budgets, and schedules for services opportunities.

Texas