Data Scientist

Location: Los Angeles, CA

Salary: 100000

Posted: April 13, 2025


Resume:

Data Analyst


Soma Meghana Prathipati

Los Angeles, CA | 213-***-**** | *************@*****.*** | linkedin.com/in/soma-meghana-p-/

EXPERIENCE

University of Southern California | Los Angeles, CA | May 2024 – Aug 2024

Research Assistant (ML Engineer) | Python, OpenCV, TensorFlow, PyTorch, XGBoost, Apache Spark

• Engineered a semi-supervised void detection model by integrating K-Means clustering with local XGBoost

classifiers, improving defect classification accuracy by 32% and reducing false positives

• Optimized deep learning segmentation for COSB micro-CT scans, implementing FCN, U-Net, and SegNet

architectures, achieving 28% higher IoU and 35% better boundary precision over traditional methods

• Developed an automated image preprocessing pipeline using bilateral filtering, Sobel edge detection, and grayscale

thresholding, improving void feature clarity by 25% and reducing manual labeling effort by 40%

Accenture | Hyderabad, India | Dec 2019 – Dec 2022

Data Analyst | Tableau, SQL, Python, Spark, AWS Glue, XGBoost

• Created interactive dashboards in Tableau to track sales performance across 15 regions, increasing regional sales

by 8%

• Conducted A/B testing on email marketing campaigns using SQL and Python, analyzing 300K+ customer interactions

to boost CTR by 15% and conversion rates by 10%

• Built and optimized ETL pipelines using Apache Spark and AWS Glue to process 5M+ daily customer records,

reducing processing time and enabling faster insights for stakeholders

• Developed a predictive maintenance model using XGBoost and time series analysis on product performance data,

achieving 85% accuracy and reducing unplanned downtime by 20%

SKILLS

Programming: Python (Pandas, NumPy, SciPy, Scikit-learn, PyTorch, Keras, TensorFlow), PostgreSQL, MySQL, C, Java

Data Science: A/B Testing, Statistical Modeling, Data Wrangling, Hypothesis Testing, Time Series Analysis, Predictive Analytics, ETL, Data Warehousing, Database Optimization, Automation, Airflow, Regression, Classification, Clustering, Anomaly Detection, Feature Engineering, Reinforcement Learning, Recommendation Systems, Generative AI

Data Visualization: Tableau, Power BI, Looker, Excel, Google Data Studio, DAX, Matplotlib, Seaborn, Plotly

Big Data and Cloud: Apache Spark, Hadoop, AWS (Redshift, S3, Lambda), Azure, GCP, Kafka, MongoDB

EDUCATION

University of Southern California | Los Angeles, CA | Jan 2023 – Dec 2024

Master of Science in Applied Data Science, GPA: 3.7/4

Jawaharlal Nehru Technological University | Hyderabad, India | Aug 2016 – Aug 2020

Bachelor of Technology in Electronics and Communication Engineering, GPA: 9.6/10

PROJECTS

Virtual Teaching Assistant | Python, LangChain, Google Gemini, OpenAI, FAISS, RAG, NLP | Apr 2024 – May 2024

• Developed a Retrieval-Augmented Generation (RAG) pipeline using FAISS, vector embeddings, and LLM-based

contextual retrieval, improving answer accuracy by 32% for academic queries

• Integrated Google Gemini and OpenAI GPT-4 with LangChain, enabling multi-turn, LLM-powered conversational AI

with memory persistence, reducing student query resolution time by 40%

• Deployed a web-based AI assistant using Streamlit, integrating LLM-driven text, voice, and document interaction,

increasing student engagement by 50% and reducing unanswered queries

Hybrid Recommendation System | Python, PySpark, XGBoost, Scikit-learn, Pandas | Nov 2023 – Dec 2023

• Designed a hybrid recommendation system combining item-based collaborative filtering and XGBoost, leveraging

PySpark RDDs for efficient processing of 10M+ user interactions, achieving a test RMSE of 0.92 and improving

recommendation accuracy by 20%

• Optimized model performance using GridSearch and 5-fold cross-validation, reducing processing time by 30% and

achieving a 15% higher user engagement rate compared to baseline models

Twitter Sentiment Analysis | Python, PyTorch, TensorFlow, Hugging Face Transformers, NLTK | Jun 2023 – Jul 2023

• Developed a sentiment analysis pipeline using BERT (PyTorch) and LSTM (TensorFlow), achieving 95% accuracy on

500K+ tweets through pre-trained embeddings and transfer learning

• Fine-tuned BERT on domain-specific Twitter data, improving F1-score by 12% compared to traditional models

• Improved data quality by removing 35% of noise through custom text preprocessing, resulting in a 10% increase in

model precision for negative sentiment detection

CERTIFICATIONS

AWS Certified Machine Learning Specialty, Hugging Face NLP Certification, Microsoft Certified: Azure Data Engineer

Associate (DP-203)
