Sangam Man Buddhacharya
Corvallis, OR, USA
541-***-**** ******************@*****.*** linkedin.com/in/sangambuddhacharya sanbuddhacharyas.github.io
Data scientist with 4+ years of experience who has helped 8+ companies grow their business using data.
WORK EXPERIENCE
Ramsey Lab - Graduate Research Assistant (Data Scientist); Corvallis, USA September 2023 - Present
Drug Concentration Prediction in Saliva
• Reduced percent error from 35% to 16% by performing signal processing on the voltammogram data, extracting 13 key features, and engineering features with scikit-learn in Python, which led to securing 3 years of funding from the National Institutes of Health (NIH).
• Improved voltammogram dataset quality through exploratory data analysis (EDA), detecting negative peaks and outliers with box plot visualization and the Isolation Forest (IForest) algorithm; raised the model's R² score from 0.87 to 0.89.
• Verified the statistical superiority of the SVM model over Gaussian process regression, achieving a 7% lower error rate with statistical significance (p 0.003) using paired permutation testing.
Artlabs - Data Scientist; New York, USA April 2022 - January 2023
Vehicle Transport Price Forecasting System
• Refined the dataset using SQL, engineered critical features, and leveraged AutoML to fine-tune an XGBoost model for predicting vehicle transportation costs across U.S. states; improved MAPE from 25% to 22.5% by incorporating the population density of source and destination locations, which led to the model's approval for deployment on Google Cloud.
Music Recommendation
• Created a music recommendation system for TrakTrain by integrating an end-to-end pipeline with Recombee, including data collection, preprocessing, feature engineering, and time series analysis with TSFEL, Pandas, and scikit-learn; raised top-10 similarity accuracy from 40% to 63%, boosting membership by 5% within 3 months.
• Applied an autoencoder to compress audio features by 50%, lowering cloud storage costs by $2,500 per month.
Stock Market Twitter Feeds Mining
• Led the project end to end, including data cleaning, designing a text classifier for NER, and deployment, to build a Twitter feed mining system that provides customized news on selected stocks using Pandas, Spark SQL, and spaCy.
• Boosted F1-score from 0.75 to 0.83 by replacing the spaCy NER model with a BERT model fine-tuned on Twitter feeds, increasing user engagement from 1,500 to 2,300 within the first month of launch.
Selcouth Technology - Data Scientist; Chitwan, Nepal September 2021 - April 2022
Video to Online Shopping (Clothes Ads Recommendation)
• Built and deployed a deep learning pipeline in TensorFlow on AWS EC2 to recommend Flipkart online clothing ads by matching similar clothes from short reels, boosting sales of t-shirts by 9% and jeans by 6%.
• Led a 7-member team, improved image retrieval accuracy by 7%, raised the approval rate for the ShareChat (Moj) client from 51% to 83%, and helped raise $100,000 in funding within 6 months.
• Streamlined video processing by implementing an automatic frame selection and main character tracking algorithm with OpenCV and Keras, decreasing processing time by 40% and GPU operating costs by 15%.
Deerwalk - Associate Data Engineer; Kathmandu, Nepal May 2021 - August 2021
Analysis of Medical Data from U.S. Hospitals
• Automated data loading from the MySQL server and the data transformation process with a Python script using Pandas and RegEx, reducing manual effort and improving data cleaning efficiency by 20%.
• Collaborated with a team of 8 to analyze healthcare data using SQL, performed KNN clustering in Python, and visualized the data in Tableau.
PROJECTS
Washington House Sales Dashboard, Personal (Visualization) December 2024 - January 2025
• Designed an interactive Tableau dashboard to analyze the impact of pricing trends in Washington, showcasing insights on key features such as house age, location, and number of bedrooms, enabling data-driven decision-making for real estate strategies.
Sports Tennis Game Classification Using Deep Learning, ProTracker (Freelancing) October 2022 - February 2023
• Analyzed tennis games by calculating player positions, tracking the ball, identifying bounce points, and predicting backhand, forehand, and serve strokes using deep learning models; reduced manual analysis, observation, and data entry time in the ProTracker app, resulting in a 30% increase in efficiency.
SKILLS
Knowledge: GAN, Transformer, CNN, CV algorithms, Data Analysis, IMU, Machine Learning, Deep Learning, LLM, Statistical Testing
Programming Languages: SQL, Python, C, C++
Tools & Database: Cloud Computing, Apache Spark, MySQL, Tableau, AWS, Docker
Libraries: Pandas, NumPy, scikit-learn, Matplotlib, Seaborn, SciPy
Frameworks: TensorFlow, Keras, PyTorch, Flask, Streamlit
Soft Skills: Fluent Communication, Analytical Thinking, Project Management, Problem Solving, Teamwork
EDUCATION
Oregon State University, Corvallis, Oregon, USA September 2023 - June 2025
Dual Major in MS in Computer Science and Artificial Intelligence GPA: 3.97 / 4.0
Tribhuvan University, Institute of Engineering, Pulchowk Campus, Lalitpur, Nepal November 2016 - April 2021
Bachelor in Electronics and Communication Engineering GPA: 3.97 / 4.0
MACHINE LEARNING PUBLICATIONS
Buddhacharya et al., “Evaluation of multi-feature machine-learning models for analyzing electrochemical signals for drug monitoring,” ACM, 2024.
Buddhacharya et al., “Fashion Image Retrieval based on Parallel Branched Attention Network,” IJACSA, 2022.
Buddhacharya et al., “CNN-Based Continuous Authentication of Smartphones Using Mobile Sensors (IMU),” IJIRAE, 2022.