Job Specifications
Role: DevOps Engineer with LLM and GPU experience
Pay rate range: $70 - $80
Job Description
Required Skills
Deep experience building services on distributed systems in modern cloud environments, e.g., containerization (Kubernetes, Docker), infrastructure as code, CI/CD pipelines, APIs, authentication and authorization, data storage, deployment, logging, monitoring, and alerting.
Experience working with Large Language Models (LLMs), particularly hosting them to run inference
Strong verbal and written communication skills. Your job will involve communicating with local and remote colleagues about technical subjects and writing detailed documentation.
Preferred Skills
Experience building or using benchmarking tools for evaluating LLM inference across various model, engine, and GPU combinations.
Familiarity with LLM performance metrics such as prefill throughput, decode throughput, time per output token (TPOT), and time to first token (TTFT); see the measurement sketch after this list.
Experience with one or more inference engines, e.g., vLLM, SGLang, or Modular Max.
Familiarity with one or more distributed inference serving frameworks, e.g., llm-d, NVIDIA Dynamo, or Ray Serve.
Experience with AMD and NVIDIA GPUs, using software such as CUDA, ROCm, AITER, NCCL, and RCCL.
Knowledge of distributed inference optimization techniques such as tensor/data parallelism, KV cache optimization, and smart routing.
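To make the metrics above concrete, here is a minimal, hypothetical sketch of how TTFT and TPOT might be measured against an OpenAI-compatible streaming endpoint, such as one served by vLLM or SGLang. The endpoint URL, model name, and prompt are illustrative assumptions, not details from this posting.

```python
# Hypothetical sketch: measure TTFT and TPOT against an OpenAI-compatible
# streaming /v1/completions endpoint. URL, model, and prompt are placeholders.
import json
import time
import urllib.request

ENDPOINT = "http://localhost:8000/v1/completions"  # assumed local vLLM server
PAYLOAD = {
    "model": "my-model",  # placeholder model name
    "prompt": "Explain KV caching in one sentence.",
    "max_tokens": 128,
    "stream": True,
}

req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(PAYLOAD).encode(),
    headers={"Content-Type": "application/json"},
)

start = time.perf_counter()
first_token_at = None
token_count = 0

with urllib.request.urlopen(req) as resp:
    for raw in resp:  # server-sent events: one "data: {...}" line per chunk
        line = raw.decode().strip()
        if not line.startswith("data:") or line == "data: [DONE]":
            continue
        chunk = json.loads(line[len("data:"):])
        if chunk["choices"][0].get("text"):
            # Each chunk approximates one token; exact counts would come
            # from the server's usage stats.
            token_count += 1
            if first_token_at is None:
                first_token_at = time.perf_counter()

end = time.perf_counter()
if first_token_at is None:
    raise RuntimeError("no tokens were streamed")

ttft = first_token_at - start                            # time to first token
tpot = (end - first_token_at) / max(token_count - 1, 1)  # time per output token
print(f"TTFT: {ttft * 1000:.1f} ms, TPOT: {tpot * 1000:.1f} ms/token, "
      f"decode throughput: {1 / tpot:.1f} tok/s")
```

Decode throughput is the inverse of TPOT; estimating prefill throughput would additionally require the prompt token count reported by the server.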
What You'll Be Working On
Develop and maintain an inference platform for serving large language models, optimized for the various GPU platforms on which they run.
Work on complex AI and cloud engineering projects through the entire product development lifecycle (PDLC): ideation, product definition, experimentation, prototyping, development, testing, release, and operations.
Build tooling and observability to monitor system health, and build auto-tuning capabilities.
Build benchmarking frameworks to test model serving performance and guide system and infrastructure tuning efforts (see the sketch after this list).
Build native cross-platform inference support across NVIDIA and AMD GPUs for a variety of model architectures.
Contribute to open-source inference engines to make them perform better on DigitalOcean's cloud.
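As one illustration of the benchmarking work described above, the following hypothetical sketch measures end-to-end generation throughput with vLLM's offline Python API, sharding the model across two GPUs via tensor parallelism. The model name, batch size, and parallelism degree are placeholder assumptions, and running it requires a machine with the corresponding GPUs.

```python
# Hypothetical sketch: offline throughput benchmark with vLLM, using tensor
# parallelism. Model, batch size, and tensor_parallel_size are placeholders.
import time
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    tensor_parallel_size=2,                    # shard weights across 2 GPUs
)
params = SamplingParams(max_tokens=256, temperature=0.0)
prompts = ["Summarize the benefits of KV cache reuse."] * 64  # synthetic batch

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"end-to-end generation throughput: {generated / elapsed:.1f} tokens/s "
      f"across {len(prompts)} requests in {elapsed:.1f}s")
```

In a fuller benchmarking framework, a run like this would be swept across model, engine, GPU type, and tensor_parallel_size combinations, with results recorded alongside the per-request TTFT/TPOT figures sketched earlier.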
HCLTech is not the employer for this role. This work is contracted through a third party whose employees provide services to HCLTech and/or its clients.
Candidates interested in applying for this contract opportunity must have valid authorization to work in the United States. We do not accept agency resumes and are not responsible for any fees related to unsolicited resumes. Candidates who are currently employed by a client of HCLTech may not be eligible for consideration, as decided on an individualized basis depending on business considerations.
The expected pay range for this contract assignment is shown above in the job details. The exact pay rate will vary based on skills, experience, and location and will be determined by the third-party employer.
HCLTech is an equal opportunity employer, committed to providing equal employment opportunities to all applicants and employees regardless of race, religion, sex, color, age, national origin, pregnancy, sexual orientation, physical disability or genetic information, military or veteran status, or any other protected classification, in accordance with federal, state, and/or local law. Should any applicant have concerns about discrimination in the hiring process, they should provide a detailed report of those concerns to secure@hcltech.com for investigation.
About the Company
HCLTech is a global technology company, home to more than 220,000 people across 60 countries, delivering industry-leading capabilities centered around digital, engineering, cloud and AI, powered by a broad portfolio of technology services and products. We work with clients across all major verticals, providing industry solutions for Financial Services, Manufacturing, Life Sciences and Healthcare, Technology and Services, Telecom and Media, Retail and CPG, and Public Services.