Monterail

Delivering Innovative Software

LLM Engineer, Freelancer

LLM Engineer | Machine Learning Engineer | Part Time | Remote | Senior | Team 51-200 | Since 2011 | H1B No Sponsor

Location

Poland

Posted

10 days ago

Salary

0

Seniority

Senior

Job Description

  • Add AI functionality into existing Python, Node.js, and Ruby codebases
  • Build LLM-powered features: chat, summaries, classification, smart search, document Q&A
  • Design lightweight RAG pipelines using embeddings and vector search
  • Work with vector DBs (pgvector, Pinecone, Qdrant)
  • Implement safe, reliable LLM endpoints (OpenAI, Anthropic, Azure)
  • Work directly with clients to shape AI features and reduce manual effort
  • Advise clients when NOT to use AI and navigate trade-offs around latency, accuracy, and cost
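
For a sense of scale, the retrieval step of a lightweight RAG pipeline like the one mentioned above might look roughly like the sketch below. It is only an illustration under stated assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector DB such as pgvector, and the names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical, not part of any Monterail codebase.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (brute-force search)."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Assemble an LLM prompt from the retrieved context (the LLM call is omitted)."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "invoices are due in 30 days",
        "the office is closed on sundays",
        "refunds take 5 days",
    ]
    print(build_prompt("when are invoices due", docs, k=1))
```

In a production setting the brute-force `sorted` call would typically be replaced by a nearest-neighbor query against the vector store, and the prompt would be sent to an LLM endpoint with timeouts and retries.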

Job Requirements

  • Strong software engineering background - Python, Node.js, or Ruby
  • Hands-on experience integrating LLM APIs into production systems (OpenAI, Anthropic, or similar)
  • Ability to design pragmatic AI solutions within existing architectures
  • Experience building RAG pipelines, embeddings, and vector search
  • Understanding of cost, latency, and reliability constraints of AI systems
  • Ability to work independently with clients and set realistic expectations
  • Background in big data or data pipelines is a big plus
  • English B2/C1
  • Availability part-time or full-time (B2B contract)

Related Job Pages

More LLM Engineer Jobs

Full Time | Remote | Team 10,001+ | Since 1890 | H1B No Sponsor

  • Design, build, and maintain scalable ML infrastructure and pipelines supporting model training, deployment, monitoring, governance, and lifecycle management.
  • Develop and optimize CI/CD pipelines for machine learning and AI workloads across development, staging, and production environments.
  • Build reusable ML platform capabilities including feature stores, model registries, experimentation frameworks, artifact management, and deployment automation.
  • Implement scalable orchestration and workflow solutions for batch and real-time ML inference workloads.
  • Create robust monitoring systems to measure model performance, detect model drift, monitor data quality, and ensure production reliability.
  • Develop automation tools and self-service capabilities to improve the efficiency, scalability, and reliability of MLOps processes.
  • Collaborate with Data Scientists and Software Engineers to streamline the ML lifecycle from experimentation through enterprise production deployment.
  • Apply software engineering best practices to AI/ML systems including testing, observability, resiliency, security, versioning, and infrastructure-as-code.
  • Identify gaps and improvement opportunities within the organization’s ML platform ecosystem and architect scalable solutions to address them.
  • Support enterprise AI governance, compliance, auditability, and model risk management requirements.
  • Ensure platform scalability, reliability, security, and operational excellence across AI/ML systems.
  • Lead the architecture, design, and deployment of enterprise Generative AI solutions leveraging LLMs, foundation models, and agentic AI systems.
  • Design and implement Retrieval-Augmented Generation (RAG) pipelines using vector databases, embeddings, semantic search, reranking, and retrieval optimization strategies.
  • Build scalable LLM orchestration frameworks using technologies such as LangChain, LlamaIndex, Semantic Kernel, or equivalent frameworks.
  • Develop advanced prompt engineering strategies, prompt chaining, context management, and agent workflows to improve LLM accuracy and reliability.
  • Evaluate and implement fine-tuning, parameter-efficient tuning, and prompt-based optimization approaches for domain-specific use cases.
  • Build AI evaluation and benchmarking frameworks to measure hallucination rates, response quality, grounding accuracy, toxicity, bias, latency, and business performance metrics.
  • Implement AI safety guardrails, governance controls, content filtering, and responsible AI practices for enterprise healthcare environments.
  • Design scalable GenAI APIs and microservices supporting high-throughput enterprise AI applications.
  • Optimize GenAI systems for cost, latency, throughput, and inference performance across cloud and hybrid environments.
  • Integrate enterprise data sources, healthcare systems, and knowledge repositories into secure GenAI workflows.
  • Research and evaluate emerging GenAI technologies, open-source frameworks, and foundation models to drive innovation and continuous improvement.
  • Develop architecture diagrams, technical roadmaps, implementation strategies, and executive-level documentation for enterprise AI initiatives.
  • Collaborate with cybersecurity, compliance, and infrastructure teams to ensure secure and compliant deployment of GenAI solutions involving PHI and sensitive healthcare data.
  • Contribute to the development of AI platform standards, reusable GenAI accelerators, templates, and engineering best practices.

All locations: Alabama | Florida | Idaho | Kansas | Louisiana | Maine | Nebraska | Nevada | New Hampshire | North Dakota | Ohio | Oklahoma | Maryland | Minnesota | Pennsylvania | South Carolina | South Dakota | Tennessee | Texas | Utah | Virginia | Washington | West Virginia | Wisconsin | Wyoming
$91.4K - $152.4K / year
Job Closed
Internship | Remote | Team 10,001+ | Since 1991 | H1B Sponsor

  • Design, develop and maintain Generative AI applications, advanced AI models and conversational agents.
  • Use Gen AI techniques like Retrieval Augmented Generation (RAG), Prompt Engineering, Embeddings etc. to enhance model outcomes.
  • Perform evaluation on LLM output using appropriate metrics and iteratively improve results.
  • Stay updated with the latest industry trends and advancements in AI to implement cutting-edge solutions.
  • Communicate model results and solicit feedback from stakeholders at various levels.

California
$26 / hour
Job Closed

Role Description

Throughput. Latency. KV cache utilization. Move those three numbers in the right direction, and two things happen: customers get faster, cheaper inference, and our margins improve. That's the entire thesis of this role. Every kernel you tune, every quantization scheme you ship, every scheduler tweak you land shows up directly in a customer's p99 and on our P&L.

This is a high-impact seat. It is also a high-autonomy seat as you'll be given the room to lead the technical direction of inference optimization at Kimchi, not execute someone else's roadmap.

The problem: running LLMs in production is a moving target. The "right" model and serving configuration for a workload depend on:

  • Traffic shape
  • Sequence-length distribution
  • Batch dynamics
  • GPU SKU
  • Memory bandwidth
  • Quantization tolerance
  • A dozen other variables that shift week to week

Most teams pick a model once, over-provision GPUs, and absorb the cost. Kimchi is the system that makes that decision automatically - continuously matching workloads to the most cost-efficient, best-performing LLM and serving configuration on a customer's infrastructure. We're building the optimization layer between the model and the hardware, and we need engineers who understand both sides deeply.

Qualifications

  • 5+ years building real ML systems, with a portfolio that shows depth in inference or training infrastructure (not just model training notebooks).
  • Strong Python - production services, not scripts.
  • Hands-on experience with at least one of vLLM, SGLang, or TensorRT-LLM, and a working mental model of why an inference engine performs the way it does on a given GPU.
  • Fluency with quantization tradeoffs - you've measured quality regressions, not just compression ratios.
  • Comfort with distributed systems: collective communication, sharding strategies, and the practical failure modes of multi-GPU and multi-node setups.
  • A bias toward measurement. You instrument before you optimize, and you can tell the difference between a real win and a benchmark artifact.
  • Self-direction. This role comes with a wide mandate; you should be excited by that, not unsettled by it.

Requirements

  • Push throughput. Continuous batching, speculative decoding, chunked prefill, kernel-level tuning across vLLM, SGLang, and TensorRT-LLM. Find the ceiling on each GPU SKU, then raise it.
  • Cut latency. Attack TTFT and TPOT separately. Profile, identify the actual bottleneck (compute, memory bandwidth, scheduling, networking), and fix it - not the bottleneck you assumed.
  • Get more out of the KV cache. Paged attention, prefix caching, eviction policies, cache reuse across requests, quantized KV. This is where a lot of the unrealized throughput lives, and it's an area you'll own.
  • Quantize without regressing quality. INT8, INT4, FP8 across weights, activations, and KV. Empirical work: measure quality on real workloads, not just perplexity benchmarks.
  • Shrink cold starts and memory footprint. Faster init, smarter weight loading, tighter memory accounting - the difference between a model that scales and one that doesn't.
  • Scale across nodes. Distributed inference topologies, network-aware placement, checkpointing strategies that don't bottleneck on storage or interconnect.
  • Set the technical direction. Decide what we benchmark, what we adopt, and what we build ourselves. Bring the team along with strong writeups and reproducible experiments.

Benefits

  • Competitive salary (depending on the level of experience).
  • Enjoy a flexible, remote-first global environment.
  • Collaborate with a global team of cloud experts and innovators, passionate about pushing the boundaries of Kubernetes technology.
  • Equity options.
  • Get quick feedback with a fast-paced workflow. Most feature projects are completed in 1 to 4 weeks.
  • Spend 10% of your work time on personal projects or self-improvement.
  • Learning budget for professional and personal development - including access to international conferences and courses that elevate your skills.
  • Annual hackathon to spark new ideas and strengthen team bonds.
  • Team-building budget and company events to connect with your colleagues.
  • Equipment budget to ensure you have everything you need.
  • Extra days off to help maintain a healthy work-life balance.

Hiring process

  • Screening call with Recruiter
  • Hiring Manager interview
  • Technical interview (system design)
  • Live coding
  • Culture Check interview with an executive

As part of our standard hiring process, we would like to inform you that a background check may be conducted at the final stage of recruitment through our third-party provider, Checkr. Please note that Cast AI does not provide any form of visa sponsorship/work permit.

Worldwide
Job Closed

AS-ISG-TE-HLS GenAI / LLM

Zensar

At Zensar, we’re “experience-led everything”. We are committed to conceptualizing, designing, engineering, marketing, and managing digital solutions and experiences for over 130 leading enterprises. We are a company driven by a bold purpose: Together, we shape experiences for better futures. Whether for our clients, our people, or the world around us, this belief powers everything we do. At the heart of our culture is ONE with Client - a set of four core values that reflect who we are and how we work: One Zensar, Nurturing, Empowering, and Client Focus. Part of the $4.8 billion RPG Group, we’re a community of 10,000+ innovators across 30+ global locations, including Milpitas, Seattle, Princeton, Cape Town, London, Zurich, Singapore, and Mexico City. We believe the best work happens when individuality is celebrated, growth is encouraged, and well-being is prioritized. We are an equal employment opportunity (EEO) and affirmative action employer, committed to creating an inclusive workplace. All qualified applicants will be considered without regard to race, creed, color, ancestry, religion, sex, national origin, citizenship, age, sexual orientation, gender identity, disability, marital status, family medical leave status, or protected veteran status.

LLM Engineer | 17 days ago
Full Time | Remote | Team 10,001

Role Description

At Zensar, we are at the forefront of AI and web development, and we are looking for a talented individual to join our team and contribute to our innovative projects. The AS-ISG-TE-HLS GenAI/LLM role is a key position in which you will work with a diverse range of technologies and make a significant impact. You will develop and enhance our AI-powered solutions using the Azure OpenAI and LangChain frameworks.

  • Knowledge of frontend technologies like React or Next.js is valued.

Qualifications

  • Experience with AI and web development.
  • Familiarity with Azure OpenAI and LangChain frameworks.
  • Knowledge of frontend technologies like React or Next.js is a plus.

Requirements

  • Ability to work with a diverse range of technologies.
  • Strong problem-solving skills.
  • Ability to contribute to innovative projects.

Benefits

  • Inclusive workplace culture.
  • Opportunities for growth and development.
  • Commitment to well-being and individual celebration.

India
Job Closed