Together AI is a research-driven AI cloud infrastructure provider offering a purpose-built GPU platform for training and running advanced AI models. Serving leading SaaS companies and pioneering startups, Together AI champions open source AI and decentralized computing, advocating for transparency to drive innovation and societal benefits.
Together AI is building infrastructure to enable efficient, scalable inference for large language models (LLMs). The company is seeking an Inference Frameworks and Optimization Engineer to design, develop, and optimize distributed inference engines for high-performance serving. The role focuses on low-latency, high-throughput inference; GPU/accelerator optimizations; and software-hardware co-design.