SAUMYA SHAH
*************@*****.*** • 517-***-**** • https://www.linkedin.com/in/5hah5aumya/ • New York City, NY
Enabling AI transformation that optimizes businesses and supply chains and manages risk by delivering impactful data-driven solutions. Applying in-demand data mining, analytics, visualization, modeling, and MLOps skills throughout the product life cycle.
WORK EXPERIENCE
Data Scientist Meijer Inc., East Lansing, MI Jan 2024 – May 2024
Enhanced service levels of 4 distribution facilities (serving 30 stores) by optimizing safety stocks, on-shelf availability, warehousing, and ordering costs, preventing stock-out losses worth $200K in May 2024 (referenced paper: King et al.)
Set up a PySpark ETL pipeline to ingest and transform a real-time SAP HANA data stream, and ran 6 statistical tests and diagnostics (F-test, Chi-Square, ADF, ACF, PACF, Granger causality) using Spark MLlib and RDDs
Forecasted safety stocks for the top 18 revenue-driving, stock-out-vulnerable product-vendor groups by building an ensemble of a Bayesian model (demand uncertainty), a Poisson model (short-term sales peaks), and Facebook Prophet (baseline)
Applied SAS Viya to generate insights and study interaction-term effects (weather, fuel price, convenience) on model forecasts
Data Analyst VERN.ai, East Lansing, MI May 2023 – Aug 2023
Integrated a GPT-2 transformer into a PySpark cluster for user profiling and for analyzing customer queries and purchase patterns
Monitored chatbot performance in Tableau and deployed a feedback mechanism that achieved a 76% utterance-intent-context match
Reduced query response time for a 30+ entity database from 2 min to 15 sec by implementing PL/SQL triggers and cursors
Research Assistant Michigan State University, East Lansing, MI Sep 2022 – Dec 2022
Collaborated in the Imaging and Deep Learning Lab, a team of 20+ researchers working on disease detection and structural analysis
Migrated Raman spectral simulation scripts from MATLAB to Python and implemented clustering (GMM) on 1,133 compounds
Trained a CNN-based autoencoder (Keras) to denoise spectral images, achieving an F1-score of 87%
Associate Analyst XcelTec Interactive Private Limited, Ahmedabad, India Apr 2019 – Aug 2022
Generated product recommendations for an e-commerce app using collaborative filtering, association rule mining, and customer review sentiment analysis, scaling to 5,000+ products across 80+ categories and increasing revenue by $60K
Automated ML pipelines that populate supply chain KPIs using AWS SageMaker, Step Functions, and Lambda triggers
Developed and maintained the merchant (admin) app with HTML, CSS, JavaScript, Django, MongoDB, Nginx, Docker, and AWS EC2
Facilitated communication among stakeholders, sales, and engineering teams using MS Office (Excel, PowerPoint, Word) and Jira
PROJECTS
Property investment management Jun 2024
Participated in a 7-day hackathon organized by Jain Alert group (US); selected for the in-person global round in New York City
Scraped property, financial, and market data (climate risk, crime stats, insurance, transit, capital spending, amenities, etc.) for 1,500+ households in real time, asynchronously, using Selenium scripts and a Kafka data stream
Staged raw data in a MongoDB instance, then cleaned, transformed, and stored it in a MySQL database accessed by the React UI engine. This two-layer data abstraction kept raw data out of the UI's reach, adding a layer of protection against injection and backdoor attacks
Ranked neighborhoods based on derived metrics and user data (credit score, job designation, down payment flexibility, age, preferences, liabilities), generating custom investment portfolios/reports (financial underwriting) for listings in NJ (070**-*****)
PowerTree Apr 2022
Engaged with PDEU and the India Meteorological Department in research on efficient forecasting of solar irradiance
Pitched a 5-module PoC to support informed decisions, optimize investment strategies, and ensure regulatory compliance
Leveraged ArcGIS and satellite imagery to simplify identification of land potential and scale up power generation (0.5 MW)
IEEE TRIBES conference publication, DOI: 10.1109/TRIBES52498.2021.9751626
TECHNICAL SKILLS
Programming languages: Python, SQL, R, JavaScript, C++, MATLAB
Frameworks and packages: PySpark, TensorFlow, Hugging Face, Django, FastAPI, MongoDB, Selenium, Kafka
Tools: Prompt engineering (LLMs), AWS, Docker, Hadoop, Tableau, SAP, Git, Jenkins, SAS, LaTeX
Certifications: IBM Data Science, NPTEL Deep Learning, 100 Days of ML, Faculty recommendations
EDUCATION
Master of Science in Data Science, Michigan State University
Coursework: Statistical modeling, Supply chain management, Data mining, Deep learning in finance, NLP, Big data analytics, Machine learning
Bachelor of Science in Computer Engineering, Pandit Deendayal Energy University
Coursework: Data structures and algorithms, DBMS, Web development, Insurance theory, Derivative markets, Hedging