Job Closed
This listing is no longer active.
On a mission to create financial freedom. No interest. No fees. Shariah-Compliant.
Senior ML/Data Ops Engineer II
Location
Serbia
Posted
136 days ago
Salary
0
Seniority
Senior
Job Description
Tabby
LLM Serving & Model Management:
- Deep expertise in high-throughput serving using vLLM, NVIDIA TensorRT-LLM, and sglang to minimize latency and maximize hardware efficiency.
- Hands-on experience deploying and optimizing large-scale open-weights models, specifically DeepSeek 3.1/3.2, Qwen, and GPT-OSS variants.
- Advanced optimization and security hardening of Docker specifically for GPU environments.
- Managing model weights and orchestration within Kubernetes (GKE) environments.

Real-Time Data Engineering & CDC:
- Designing and maintaining high-throughput CDC (Change Data Capture) pipelines using the Apache ecosystem (e.g., Debezium, Kafka) to sync data from Cloud PostgreSQL.
- Deploying and tuning ClickHouse for real-time analytics, ML feature storage, and high-speed logging.
- Orchestrating complex ML data workflows using Airflow (Google Cloud Composer) to ensure data reliability.

Core Infrastructure & Networking:
- Strong Linux systems expertise including internals, networking, and performance tuning for large-scale distributed systems.
- Experience with Istio service mesh to manage microservices communication and traffic.
- Provisioning and maintaining dedicated GPU nodes (A100/H100/H200/B200), including driver management and OS-level tuning using Ansible.
- Solid Kubernetes expertise: controllers, CRDs, CNI, and Ingress.

CI/CD & Tooling:
- Implementing pipelines as code within GitLab CI, managing runners, caching, and security scanning.
- Infrastructure as Code with Terraform and Terragrunt.
- Proficiency in Python/Bash for building custom automation and AI Agent tooling.

Load Testing & Observability:
- Conducting rigorous load testing for GenAI applications, focusing on metrics like TTFT, TPS, and RPS.
- Deploying and managing LiteLLM Gateway for unified API access, load balancing, and cost tracking.
- Experience with Datadog for monitoring GPU utilization, inference health, and log pipelines.

Soft Skills:
- Strong ownership mindset: balancing speed, reliability, and cost.
- Comfortable working cross-functionally with developers, security, and compliance.
- Excellent sense of responsibility and accountability.
- English B2 or higher.

Nice to Have:
- Experience with PCI-DSS, SOC2, or other regulatory compliance environments.

Our Tech Stack:
Linux, Docker, Kubernetes, GCP (GKE, Cloud PostgreSQL), Datadog, GitLab, Apache CDC, ClickHouse, Airflow, Istio, Terraform, Terragrunt, Ansible, vLLM, TensorRT-LLM, sglang, LiteLLM, DeepSeek, Qwen, Go, Python
Benefits
- Full-time B2B contract
- Fully remote setup, work from anywhere in Europe
- Up to 20% tax allowance
- 22 paid leave days annually
- Stock options (ESOP) in a fast-scaling, pre-IPO company
- Flexi benefits you can use for wellness, travel, or learning
- Work alongside a high-performing, international engineering team in a global fintech unicorn
- Relocation support is available to our hubs in Armenia, Georgia, Serbia, and Spain, including flights, temporary accommodation, and legal setup.




