Datacenter Service Owner
Location
United Kingdom
Posted
18 days ago
Seniority
Mid Level
Job Description
Donnelley Financial Solutions
Role Description

At DFIN, employees are encouraged to grow their skills and advance their careers while contributing their expertise.

The Datacenter Service Owner plays a key role in maintaining and supporting the company’s datacenter, ensuring all infrastructure (servers, networking, power, and cooling systems) operates efficiently and reliably. This role involves hands-on technical support, troubleshooting issues, and maintaining an optimal environment for systems, while also identifying opportunities for improvement and driving enhancements across the datacenter. The position requires strong problem-solving, communication, and teamwork skills, as well as the ability to work under pressure and take ownership of issues from start to resolution.

Responsibilities

- Monitor and maintain the health of datacenter systems (servers, power, cooling, environment)
- Troubleshoot and resolve server, network, and infrastructure issues
- Install, configure, upgrade, and decommission datacenter hardware
- Manage and track all datacenter assets and inventory
- Perform maintenance, diagnostics, and hardware replacements
- Collaborate with IT teams, engineers, and vendors to support operations
- Ensure compliance through audits, documentation, and change management processes
- Support improvements, planning, and emergency/after-hours datacenter needs

Qualifications

- 3+ years of datacenter or IT infrastructure experience
- Bachelor’s degree in IT, Computer Science, or related field
- Strong knowledge of servers, datacenter environments, and networking equipment
- Experience installing, maintaining, and troubleshooting hardware
- Strong problem-solving, technical support, and documentation skills
- Good communication and teamwork abilities
- Ability to stay current with evolving technologies
- Ability to perform physical tasks:
  - Lift up to ~50 lbs occasionally
  - Stand frequently
  - Bend/reach as needed

Preferred

- Certifications (CompTIA, CCNA/CCNP, etc.)
- Experience with Dell hardware
- Understanding of LAN/WAN and networking protocols

Company Description

Join a dynamic team at the pulse of global markets, where we deliver innovative software and service solutions for essential financial reporting and capital markets transactions. At DFIN, we are a values-driven organization that empowers you to build a fulfilling career while bringing your authentic self to work every day. Our “Win as One” mentality ensures that our team’s success is directly linked to Client, Shareholder and Employee Satisfaction. Recognized as one of AMERICA'S MOST LOVED WORKPLACES® for five consecutive years and a Built In Best Places to Work for six years, we are committed to our employees’ total well-being. Enjoy competitive compensation, a flexible workplace, comprehensive benefits, and opportunities for professional growth. Bring your passion and talents to DFIN – because being YOU thrives here.



