Data Analyst
Resume:
Soma Meghana Prathipati
Los Angeles, CA | 213-***-**** | *************@*****.*** | linkedin.com/in/soma-meghana-p-/
EXPERIENCE
University of Southern California | May 2024 – Aug 2024
Research Assistant (ML Engineer) | Python, OpenCV, TensorFlow, PyTorch, XGBoost, Apache Spark | Los Angeles, CA
• Engineered a semi-supervised void detection model by integrating K-Means clustering with local XGBoost
classifiers, improving defect classification accuracy by 32% and reducing false positives
• Optimized deep learning segmentation for COSB micro-CT scans, implementing FCN, U-Net, and SegNet
architectures, achieving 28% higher IoU and 35% better boundary precision over traditional methods
• Developed an automated image preprocessing pipeline using bilateral filtering, Sobel edge detection, and grayscale
thresholding, improving void feature clarity by 25% and reducing manual labeling effort by 40%
Accenture | Dec 2019 – Dec 2022
Data Analyst | Tableau, SQL, Python, Spark, AWS Glue, XGBoost | Hyderabad, India
• Created interactive dashboards in Tableau to track sales performance across 15 regions, increasing regional sales
by 8%
• Conducted A/B testing on email marketing campaigns using SQL and Python, analyzing 300K+ customer interactions
to boost CTR by 15% and conversion rates by 10%
• Built and optimized ETL pipelines using Apache Spark and AWS Glue to process 5M+ daily customer records,
reducing processing time and enabling faster insights for stakeholders
• Developed a predictive maintenance model using XGBoost and time series analysis on product performance data,
achieving 85% accuracy and reducing unplanned downtime by 20%
SKILLS
Programming: Python (Pandas, NumPy, SciPy, Scikit-learn, PyTorch, Keras, TensorFlow), PostgreSQL, MySQL, C, Java
Data Science: A/B Testing, Statistical Modeling, Data Wrangling, Hypothesis Testing, Time Series Analysis, Predictive
Analytics, ETL, Data Warehousing, Database Optimization, Automation, Airflow, Regression, Classification, Clustering,
Anomaly Detection, Feature Engineering, Reinforcement Learning, Recommendation Systems, Generative AI
Data Visualization: Tableau, Power BI, Looker, Excel, Google Data Studio, DAX, Matplotlib, Seaborn, Plotly
Big Data and Cloud: Apache Spark, Hadoop, AWS (Redshift, S3, Lambda), Azure, GCP, Kafka, MongoDB
EDUCATION
University of Southern California | Jan 2023 – Dec 2024
Master of Science in Applied Data Science, GPA: 3.7/4 | Los Angeles, CA
Jawaharlal Nehru Technological University | Aug 2016 – Aug 2020
Bachelor of Technology in Electronics and Communication Engineering, GPA: 9.6/10 | Hyderabad, India
PROJECTS
Virtual Teaching Assistant | Python, LangChain, Google Gemini, OpenAI, FAISS, RAG, NLP | Apr 2024 – May 2024
• Developed a Retrieval-Augmented Generation (RAG) pipeline using FAISS, vector embeddings, and LLM-based
contextual retrieval, improving answer accuracy by 32% for academic queries
• Integrated Google Gemini and OpenAI GPT-4 with LangChain, enabling multi-turn, LLM-powered conversational AI
with memory persistence, reducing student query resolution time by 40%
• Deployed a web-based AI assistant using Streamlit, integrating LLM-driven text, voice, and document interaction,
increasing student engagement by 50% and reducing unanswered queries
Hybrid Recommendation System | Python, PySpark, XGBoost, Scikit-learn, Pandas | Nov 2023 – Dec 2023
• Designed a hybrid recommendation system combining item-based collaborative filtering and XGBoost, leveraging
PySpark RDDs for efficient processing of 10M+ user interactions, achieving a test RMSE of 0.92 and improving
recommendation accuracy by 20%
• Optimized model performance using GridSearch and 5-fold cross-validation, reducing processing time by 30% and
achieving a 15% higher user engagement rate compared to baseline models
Twitter Sentiment Analysis | Python, PyTorch, TensorFlow, Hugging Face Transformers, NLTK | Jun 2023 – Jul 2023
• Developed a sentiment analysis pipeline using BERT (PyTorch) and LSTM (TensorFlow), achieving 95% accuracy on
500K+ tweets through pre-trained embeddings and transfer learning
• Fine-tuned BERT on domain-specific Twitter data, improving F1-score by 12% compared to traditional models
• Improved data quality by removing 35% of noise through custom text preprocessing, resulting in a 10% increase in
model precision for negative sentiment detection
CERTIFICATIONS
AWS Certified Machine Learning Specialty, Hugging Face NLP Certification, Microsoft Certified: Azure Data Engineer
Associate (DP-203)