Bright Vision Technologies is seeking a skilled GPU Systems Engineer (CUDA) to join its dynamic team and contribute to its mission of transforming business processes through technology.
Responsibilities
- Design and implement high-performance CUDA kernels for compute-intensive workloads across AI and HPC use cases.
- Profile and optimize GPU code using tools such as Nsight Systems, Nsight Compute, and CUDA profilers.
- Tune memory access patterns, occupancy, register usage, and shared memory utilization for peak performance.
- Develop highly optimized libraries for linear algebra, attention, and other ML primitives.
- Optimize multi-GPU and multi-node training using NCCL, RDMA, and high-performance networking.
- Implement custom operators and fused kernels in PyTorch, JAX, or Triton.
- Collaborate with ML engineers to identify performance bottlenecks in training and inference pipelines.
- Develop benchmarks and regression tests to safeguard performance over time.
- Evaluate new GPU architectures and feature sets, and advise on adoption strategy.
- Contribute to compiler-level optimizations for tensor programs where appropriate, working at the boundary between ML frameworks and accelerator code generation to unlock performance that framework-level tuning alone cannot reach.
- Optimize memory hierarchy usage across HBM, L2, shared memory, and registers.
- Implement mixed-precision and quantized compute paths that maximize accelerator throughput while keeping numerical error within bounds acceptable for the target workloads.
- Document performance characteristics, design decisions, and tuning playbooks for internal teams.
- Stay current with GPU architecture, CUDA evolution, and emerging accelerator technologies.
Benefits
- Competitive base salary commensurate with experience, plus benefits.