Fabrion

www.fabrion.com

2 Jobs

3 Employees

About the Company

Listed Jobs

Company Name
Fabrion
Job Title
ML/AI Research Engineer — Agentic AI Lab (Founding Team)
Job Description
**Job Title:** ML/AI Research Engineer – Agentic AI Lab (Founding Team)

**Role Summary:**
Lead end-to-end development of agent-native AI models for enterprise infrastructure. Design, fine-tune, evaluate, and deploy large language models (LLMs), retrieval-augmented generation (RAG) systems, knowledge graphs, and reinforcement learning agents that operate on structured and unstructured enterprise data.

**Expectations:**
- Full-cycle ML ownership, from data curation to production deployment.
- Production-grade performance with cost-efficient inference.
- Strong focus on alignment, explainability, safety, and continuous monitoring.
- Startup mindset: resourceful, fast-moving, comfortable with ambiguity, and collaborative with product/UX teams.

**Key Responsibilities:**
- Fine-tune open-source LLMs (LLaMA, Mistral, Falcon, Mixtral) using HuggingFace, DeepSpeed, vLLM, FSDP, and LoRA/QLoRA.
- Build and optimize RAG pipelines with LangChain, LangGraph, LlamaIndex, and vector DBs (Weaviate, Qdrant, FAISS).
- Train and deploy agent architectures (ReAct, AutoGPT, BabyAGI, OpenAgents) on enterprise task data.
- Design embedding-based memory and retrieval chains with token-efficient chunking.
- Create RL pipelines (RLHF, DPO, PPO) to refine agent behaviors.
- Develop scalable evaluation harnesses: synthetic tests, trace capture, explainability tools.
- Implement model observability, drift detection, error classification, and alignment checks.
- Optimize inference latency and GPU utilization across cloud/on-prem environments.

**Required Skills:**
- **LLM Training:** HuggingFace Transformers, DeepSpeed, vLLM, FSDP, LoRA/QLoRA; SFT, RLHF, and DPO pipelines; dataset curation, filter creation, eval split design.
- **RAG & Knowledge Graphs:** LangChain, LangGraph, LlamaIndex, Weaviate, Qdrant, FAISS; grounding with SQL, graph, and metadata sources; optional Neo4j, RDF/OWL.
- **Agent Intelligence:** ReAct, OpenAgents, BabyAGI; multi-step reasoning, memory recall, tool use, self-correction, multi-agent communication.
- **Optimization:** Token cost management, chunking strategies, reranking (Cohere, Jina), compression, retrieval latency tuning; inference under quantization (int4/int8); multi-GPU throughput (vLLM, TGI).
- **Tech Stack Proficiency:** Python (core); optional Rust (inference) or JavaScript (UX prototyping).
- **Soft Skills:** Startup resilience, curiosity about agent architectures, end-to-end performance ownership, strong safety and explainability instincts, collaborative design mindset.

**Required Education & Certifications:**
- Bachelor's or Master's degree in Computer Science, Machine Learning, Data Science, or a related technical field.
- No mandatory certifications; proven practical experience in ML engineering and model deployment is essential.
San Francisco, United States
On site
07-09-2025
Company Name
Fabrion
Job Title
ML Ops Engineer — Agentic AI Lab (Founding Team)
Job Description
**Job Title:** ML Ops Engineer

**Role Summary:**
Bridge ML research and production systems by building automation pipelines for model training, deployment, versioning, and observability. Focus on scalable infrastructure for LLMs, RAG, and agent-native pipelines with governance, security, and hybrid compute orchestration.

**Expectations:**
- 4+ years in MLOps, ML platform engineering, or infrastructure-focused ML roles.
- Deep expertise in model lifecycle management tools (MLflow, DVC, HuggingFace).
- Proven experience in LLM deployment (open-source models preferred), model tuning libraries, and infrastructure automation.
- Proficiency in hybrid cloud/on-prem infrastructure (Kubernetes, Terraform), containerization (Docker), and CI/CD tools.

**Key Responsibilities:**
- Design pipelines for LLM fine-tuning (SFT, LoRA, RLHF), RAG embedding updates, model quantization, and inference deployment.
- Orchestrate hybrid infrastructure (GPU clusters, cloud/on-prem) using Kubernetes, Ray, and Terraform.
- Containerize models/agents with Docker and CI/CD tooling (GitHub Actions, ArgoCD).
- Implement model governance (versioning, lineage, reproducibility, evaluation frameworks).
- Integrate security/access control (OPA, Keycloak) and implement observability monitoring (latency, drift, error tracing).
- Deploy agentic systems with LangChain, LangGraph, and inference backends (vLLM, TGI).

**Required Skills:**
- Model infrastructure: MLflow, DVC, HuggingFace, DeepSpeed, FSDP, QLoRA.
- Automation/Infra: Kubernetes, Terraform, Helm, GitHub Actions, ArgoCD.
- Inference: vLLM, Triton, Ray Serve, TGI.
- Pipelines: Airflow, Prefect.
- Monitoring: Prometheus, Grafana, LangSmith.
- Security: OPA (Rego), Keycloak.
- Languages: Python (required), Bash; Rust/Go optional.

**Required Education & Certifications:**
Not specified in the job description.
San Francisco, United States
On site
Mid level
07-09-2025