Skills

Python, CI/CD, Docker, Kubernetes, Monitoring, Version Control, Test, Regression Testing, Training, Machine Learning, PyTorch, Scikit-Learn, TensorFlow, Regression, Azure, AWS, Software Development, NumPy, Pandas, GCP, Data Science, LangChain, Large Language Models, Prometheus, NLP

Job Specifications

Position Name: Machine Learning Engineer - Model Verification & Testing

Location: Remote

Employment Type: Full-time

Notes: Direct Hire. No third-party assistance is approved for this role.

About The Job

At Jaxon, we're focused on making AI trustworthy for mission-critical environments. Our core technology is built to enable the safe deployment of AI in high-stakes settings, particularly across the Department of Defense. While our company supports both commercial and government applications, this role is dedicated to the defense side. As a Machine Learning Engineer at Jaxon, you'll be embedded on a sensitive project designed to help solidify the U.S. government's ability to use AI reliably. The work centers on advancing machine learning capabilities to meet the rigor and reliability required for national security operations.

A primary responsibility for this role is designing and running rigorous verification, validation, and test pipelines for ML/LLM systems -- including unit/integration tests for data pipelines and models, automated evaluation suites, adversarial and regression testing, and acceptance criteria used to certify models for high-assurance/defense deployments.

The Basics Of This Role Require The Ability To

Collaborate Across Teams: Work with cross-functional, geographically distributed teams to integrate ML models into our existing systems and workflows, enhancing product capabilities.
Optimize Data Processing and Model Performance: Conduct comprehensive data management including preprocessing, feature engineering, and model evaluation to improve accuracy and efficiency.
Technical Proficiency With an NLP Focus: Demonstrate solid machine learning engineering experience, particularly with NLP applications and unstructured data in a cloud environment. Fluency in Python is essential.
Understand Large Language Models: Be familiar with the model deployment process, the optimization of LLM parameters for specific behaviors, and LLM functionality and use cases in general. Previous experience with AI/LLM governance and guardrail development is a big plus.
Model Evaluation: Understand model metrics and measurements well enough to assess performance, and apply rigorous evaluation practices to ensure reliability and effectiveness.

Experience With a Majority Of The Following Desired

Languages & Frameworks: Python (core and advanced), enterprise software development practices (modular design, testing frameworks such as pytest/unittest, CI/CD integration).
Python Data Science Stack: NumPy, Pandas, Scikit-Learn, PyTorch, TensorFlow, Hugging Face Transformers; strong experience in data manipulation, feature engineering, and ML pipeline construction.
LLM/Agentic Frameworks: LangChain, LangGraph, LlamaIndex, and related libraries for building AI-assisted applications.
Systems & Infrastructure: Docker, Docker Compose, Kubernetes, container orchestration, cloud deployment (AWS, GCP, or Azure), monitoring/logging frameworks (Prometheus, ELK, etc.).
MLOps Practices: Building reproducible ML pipelines, model packaging/deployment, version control for ML artifacts (e.g., MLflow, DVC), integration with CI/CD workflows.
Modeling Expertise: Hands-on experience running local LLMs using Ollama and Hugging Face Transformers; building, training, and fine-tuning models for enterprise use cases.
NLP Competency: Strong foundation in both fundamentals (tokenization, embeddings, sequence modeling) and applied approaches (question answering, summarization, RAG, fine-tuning).
Software Delivery: Delivery of ML or software systems in Defense or other highly regulated domains, including direct interaction with government customers (e.g., service labs, joint commands).
Testing & V&V tooling: Practical experience building test harnesses and CI for ML (automated model evaluation suites, reproducible test datasets, A/B/canary testing, fuzzing/adversarial test generation, and metric-based acceptance gates for deployment).

About Jaxon: Jaxon is on a mission to make AI trustworthy. Developed through R&D with the U.S. Department of Defense, Jaxon converts complex rule sets--whether from policy manuals, onboarding playbooks, compliance protocols, or SME conversations--into executable logic. It extracts the key 'questions' hidden in unstructured docs, builds rule-driven evaluators, and tracks what's being asked and why. The result: consistent, auditable, and provable reasoning--bringing structure, oversight, and trust to LLM-driven workflows.

What Makes Jaxon Different?

Human First: Our leaders get it -- life happens. Whether it's flexible working hours or understanding the need for a mental health day, we prioritize your well-being. Our CEO recognizes that happy people do the best work and he backs that up with an empathetically led culture - and I'm not just saying that because he's going to read this post.
Laid-Back Culture: We know that the best work comes from a place of comfort and authenticity. At Jaxon, you're encourage

About the Company

Jaxon is making AI trustworthy. AI has a propensity to fabricate responses - Jaxon keeps them honest. Based on R&D with the US Department of Defense, Jaxon's proprietary DSAIL (Domain-Specific AI Language) technology applies formal constraints and assertions to verify responses. Now AI can be relied upon for mission-critical applications.