Company Name: Tranzeal Incorporated
Job Title: Principal AI Platform Engineer
Role Summary:
Lead the design, development, and operation of highly scalable AI platform services, focusing on large language model (LLM) and autonomous agent (ReAct) capabilities. Drive end‑to‑end platform architecture, ensuring robust APIs, observability, and data management for production workloads.
Expectations:
- 8+ years of experience in AI/ML platform engineering or equivalent.
- Proven track record delivering production‑grade LLM/agent services with high availability.
- Strong ownership of scalability, reliability, and performance for distributed systems.
- Demonstrated leadership in shaping architecture and technical standards.
Key Responsibilities:
- Architect and implement LLM/agent workloads using ReAct patterns and prompt engineering, optimizing token usage and inference throughput.
- Design and develop high‑performance REST and gRPC APIs with contract‑first definitions (OpenAPI/Protobuf) and versioning, plus event‑driven integration through streaming (SSE and gRPC streaming).
- Build and maintain distributed streaming solutions (MCP client‑server, SSE) and event pipelines.
- Build observability into services using OpenTelemetry, distributed tracing, Prometheus metrics, and alerting.
- Manage vector search and PostgreSQL storage for embeddings, ensuring high query performance.
- Oversee deployment and CI/CD pipelines on GCP (BigQuery, GCS) and Kubernetes, including cloud cost management.
- Mentor and guide engineering teams on best practices in async programming, microservices, and model serving.
Required Skills:
- Advanced Python (async/await, FastAPI, DSPy).
- LLM/Agent expertise: ReAct patterns, prompt engineering, token optimization.
- Distributed systems design (MCP, event‑driven architecture, streaming).
- API design: REST, gRPC, contract‑first, versioning.
- Observability: OpenTelemetry, tracing, Prometheus.
- Data: PostgreSQL, vector search/embeddings.
- Cloud & infra: GCP services, Kubernetes, container orchestration, cost‑management practices.
Education & Certifications:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.
- Certifications in cloud (e.g., GCP Professional Cloud Architect) and distributed systems are a plus.