Bright Vision Technologies is seeking a skilled GPU Systems Engineer (CUDA) to join its dynamic team and contribute to its mission of transforming business processes through technology. The ideal candidate will have deep expertise in CUDA programming, GPU architecture, and high-performance computing to design and optimize compute-intensive workloads on modern accelerator hardware.
Responsibilities
- Design and implement high-performance CUDA kernels for compute-intensive workloads across AI and HPC use cases
- Profile and optimize GPU code using Nsight Systems, Nsight Compute, and other CUDA profiling tools
- Tune memory access patterns, occupancy, register usage, and shared memory utilization for peak performance
- Develop highly optimized libraries for linear algebra, attention, and other ML primitives
- Optimize multi-GPU and multi-node training using NCCL, RDMA, and high-performance networking
- Implement custom operators and fused kernels in PyTorch, JAX, or Triton
- Collaborate with ML engineers to identify performance bottlenecks in training and inference pipelines
- Develop benchmarks and regression tests to safeguard performance over time
- Evaluate new GPU architectures and feature sets, and advise on adoption strategy
- Contribute to compiler-level optimizations for tensor programs where appropriate
- Optimize memory hierarchy usage across HBM, L2, shared memory, and registers
- Implement mixed-precision and quantized compute paths that maximize accelerator throughput while preserving numerical fidelity
- Document performance characteristics, design decisions, and tuning playbooks for internal teams
- Stay current with new GPU architectures, CUDA releases, and emerging accelerator technologies
Benefits
- Competitive base salary commensurate with experience, plus benefits