Quadric has created an innovative general-purpose neural processing unit (GPNPU) architecture. The AI Inference Engineer will port AI models to the Quadric platform, optimize model deployment for efficient inference, and profile and benchmark model performance.
Responsibilities
- Quantize, prune, and convert models for deployment
- Port models to the Quadric platform using the Quadric toolchain
- Optimize inference deployment for latency and speed
- Benchmark and profile model performance and accuracy
- Collaborate across related areas of the AI inference stack to support team and business priorities
- Develop tools to scale and speed up model deployment
- Improve the SDK and runtime
- Provide technical support and documentation to customers and the developer community
Benefits
- Competitive salary and meaningful equity
- Medical, dental, and vision plans starting on day one
- 401(k) retirement plan
- Flexible paid time off (unlimited, non-accrual) to support work-life balance