Location: Remote (U.S.-based; sponsorship is not available for this role)
Type: Full-time | Direct Hire
Compensation: $160K–$300K+ (flexible for the right candidate)
Our client is a forward-thinking AI company building next-gen products powered by Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG). They are actively looking for a Machine Learning Engineer with hands-on experience building LLM pipelines and implementing RAG architectures in production.
You’ll lead the development of scalable, intelligent systems that combine unstructured data with cutting-edge AI to deliver real-time insights and automation.
Architect, build, and deploy end-to-end systems using LLMs and RAG
Design intelligent document or data retrieval workflows using vector databases and embedding models
Own implementation of LangChain, LlamaIndex, or similar tools to orchestrate RAG flows
Integrate external APIs (OpenAI, Anthropic, Mistral, etc.) and optimize model selection, prompting, and performance
Collaborate with backend and data engineers to ship reliable, scalable ML features
Continuously improve retrieval precision and model relevance with feedback loops
Python, PyTorch, Hugging Face, LangChain, LlamaIndex
Vector DBs: Pinecone, Weaviate, FAISS, Qdrant
OpenAI, Anthropic, LLaMA, Mistral APIs
AWS / GCP / Azure ML environments
3+ years of ML Engineering experience (can include backend-heavy ML roles)
Proven experience working with LLMs and building RAG pipelines
Deep understanding of embeddings, semantic search, and vector databases
Ability to design and deploy production-level ML systems
Strong software engineering fundamentals
Candidates must be U.S. citizens.
Sponsorship is not available for this role; candidates must be authorized to work in the U.S. on a permanent, full-time basis without the need for future sponsorship.