Skills

Communication, Leadership, Python, CI/CD, Docker, Kubernetes, Monitoring, Research, Architecture, Machine Learning, PyTorch, TensorFlow, AWS, SDLC, GCP, Redis, FastAPI, Spark, OpenAI, Kafka, Terraform, Prometheus, Grafana, gRPC, NLP

Job Specifications

Director of Machine Learning (Healthcare AI)

Location: New York City -- On-site (hybrid considered)

Employment: Full-time | Team: Founding/Engineering Leadership

Compensation: Competitive base + meaningful equity

Why this role

We're building the machine intelligence that powers a safe, scalable AI doctor. As Director of Machine Learning, you'll own the ML roadmap end to end, shipping production systems that deliver diagnostic reasoning, chronic disease management, and HIPAA-grade data processing at scale. You'll set the technical bar, hire and mentor the team, and harden the safety rails that make medical AI trustworthy.

What you'll lead (zero → one → scale)

ML Strategy & Org Building

Define the ML/AI vision, architecture standards, and research-to-production pipeline.
Build and lead a high-performing team of ML engineers, applied scientists, and MLOps.
Establish the SDLC for models: evaluation, safety, monitoring, rollback, and post-incident learning.

Clinical Reasoning & LLM Systems

Own multi-agent reasoning (debate/consensus) and tool-use policies for clinical tasks.
Scale retrieval-augmented generation (RAG) over thousands of guidelines, with provenance and audit trails.
Drive prompt/program synthesis, fine-tuning, and distillation for low-latency inference.

Data & Evaluation

Stand up HIPAA-compliant data pipelines: de-identification, labeling, weak supervision, and active learning.
Define gold-standard evals with clinicians: accuracy, safety, fairness, and explanation quality.
Build offline/online experiment frameworks (A/B, counterfactual, shadow deploys).

Safety & Compliance

Implement guardrails: contraindication checks, uncertainty calibration, human-in-the-loop escalation.
Detect bias and drift across demographics and care pathways; maintain model cards and documentation.
Partner with compliance on FDA/IEC considerations and real-world performance tracking.

Platform & Scale

Collaborate with Platform/Backend to deliver sub-second inference at peak load.
Architect multi-region, fault-tolerant model serving with deterministic backstops.
Align with product on domain workflows, clinician tooling, and cost/performance tradeoffs.

Responsibilities

Translate ambiguous clinical problems into prioritized ML roadmaps with clear success metrics.
Ship production models (LLMs + classical ML) for triage, summarization, document generation, and decision support.
Lead design reviews; author high-signal design docs and experiment plans.
Partner with Backend to expose safe, well-versioned inference APIs and feature stores.
Create an evidence pipeline: offline eval → shadow → gated release → continuous monitoring.
Recruit, coach, and level up the team; establish hiring rubrics and technical standards.
Communicate progress and risk to execs, clinicians, and cross-functional partners.

Qualifications (must-have)

8-12+ years in ML/AI with 3-5+ years leading teams; startup or zero-to-one experience.
Deep hands-on experience with LLMs/gen-AI (OpenAI, Anthropic, LLaMA, etc.), RAG, fine-tuning, and optimization.
Strong applied track record shipping real-time production ML systems at scale.
Proficiency in Python and PyTorch/TensorFlow; solid software engineering fundamentals.
Comfortable with MLOps: feature stores, model registries, CI/CD for models, observability, canary/rollback.
Demonstrated work in safety-critical contexts: uncertainty, bias/fairness, post-deployment monitoring.
Excellent written/spoken communication; proven ability to partner with product, clinical, and platform teams.

Nice to have

Healthcare/med-tech experience; familiarity with HL7, FHIR, EHR integrations, and clinical workflows.
Publications/patents in NLP, clinical ML, safety, or multi-agent systems.
Experience with FDA SaMD guidance, IEC 62304/82304, or similar frameworks.
Background in multilingual NLP, speech (ASR/TTS), or predictive risk modeling.
Prior seed/Series A leadership; hiring from scratch and scaling teams.

What success looks like

90 days:

ML strategy, safety gates, and evaluation plan in place. First high-impact model shipped behind a feature flag with shadow testing and clinician review.

180 days:

Multi-agent reasoning + RAG stack serving production traffic with measurable lifts in accuracy and time-to-answer.
End-to-end monitoring (quality, bias, cost) live; on-call and incident playbooks operational.

12 months:

ML org of 6-12 high performers; roadmap delivering quarterly value.
Documented safety and performance portfolio suitable for payer/provider and regulatory conversations.

Our stack (indicative)

Modeling: PyTorch, Transformers, vLLM/ONNX/TensorRT, LoRA/QLoRA, vector DBs (FAISS/pgvector)
Data/MLOps: Python, Airflow/Prefect, Spark/Ray, Feast, MLflow/W&B, Kubernetes, Argo, Docker
Serving: FastAPI/gRPC, Kafka/Redpanda, Redis, OpenTelemetry, Prometheus/Grafana
Cloud: AWS/GCP, Terraform, multi-region deployments
Security/Compliance: PHI de-identification, KMS/HSM, policy engines, audit logging

What we offer

Fo

About the Company

Nxt Level redefines recruitment, transforming it into a strategic, client-focused partnership. We're not just recruiters; we're dedicated allies in your talent acquisition, committed to delivering results through a blend of speed, precision, and an understanding of your unique needs.

Our Services:
* Contract
* Contract-to-Hire
* Direct Hire
* Executive Search

Key Highlights:
* High Acceptance Rate: An impressive 89% rate, thanks to our targeted strategies.
* Efficient Hiring: Averaging 4 interviews per offer, saving time and...