The Senior Data Scientist role involves data science research, software application development, and engineering duties related to AI datacenter technology and autonomous platforms. The role requires collaboration with other engineers to build the next generation of autonomous datacenter networks, leveraging big data and predictive models. The Senior Data Scientist will use the data collected from the network to power the inference engine of our Mist platform and systems, including the Mist virtual assistant chatbot. The role also involves developing and implementing scalable algorithms that process large volumes of streaming data to detect anomalies, predict problems, provide Root Cause Analysis (RCA), and classify issues in real time.
Requirements
- Design and implement machine learning solutions that process terabytes of streaming data to detect anomalies in our customers' datacenter (DC) networks, predict problems and future trends, and provide Root Cause Analysis (60%)
- Solid statistics and math background, with good knowledge of machine learning methods such as k-Nearest Neighbors, Naive Bayes, SVM, and Decision Forests.
- Excellent communication skills, including the use of data visualization tools, to articulate observations and use cases to PMs and network domain experts who are not experienced in AI/ML.
- Experience with time series data analysis, forecasting, and correlation is preferable.
- Experience applying the latest AI/ML techniques, such as neural networks and Transformers, to time series data, or an interest in exploring these techniques for time series data.
- Analyze feature requirements from product managers and collaborate with engineers and data scientists to design solutions.
- Good understanding of datacenter networking topologies and protocols.
- Troubleshoot production environments and customer-reported issues (20%)
- Knowledge of multi-cloud production environments
- Ability to troubleshoot open-source data processing engines, such as Apache Spark, Apache Storm, and Apache Flink
- Use analytical and programming skills and open-source systems, such as Hadoop, Hive, Spark, Elasticsearch, and Redis, to develop data processing pipelines that meet the required efficacy and latency (20%)
- Good knowledge of and experience with big data tool sets and the techniques of distributed storage and computation engines
- Experience developing reusable and highly scalable data processing components
- Good knowledge of and experience working with cloud-based CI/CD tools and cloud DevOps teams to collect stats and create monitors for our data processing pipelines
- Good understanding of MCPs and agentic frameworks.
Benefits
- Health & Wellbeing
- Personal & Professional Development
- Unconditional Inclusion