
Data Scientist Machine Learning

Location:
Naperville, IL, 60540
Posted:
June 18, 2024



Sai Ram Reddy

Senior Data Scientist / Machine Learning Engineer

Email: ad6jmv@r.postjobfree.com

Contact: +1-980-***-****

LinkedIn: www.linkedin.com/in/sai-ram-r

PROFESSIONAL SUMMARY:

Data Scientist with 10+ years of experience transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data, with expertise across a variety of industries including Finance, E-Commerce, and Healthcare.

Expert in Data Science process life cycle: Data Acquisition, Data Preparation, Modeling (Feature Engineering, Model Evaluation) and Deployment.

Experienced in applying statistical techniques including hypothesis testing, Principal Component Analysis (PCA), ANOVA, sampling distributions, chi-square tests, time-series analysis, discriminant analysis, Bayesian inference, and multivariate analysis.

Efficient in pre-processing data, including data cleaning, correlation analysis, imputation, visualization, feature scaling, and dimensionality reduction, using Python data science packages (Scikit-Learn, Pandas, NumPy).

Manipulated data from *.csv, pipe-delimited, and .xls files, importing it into SQL tables via internal SaaS import tools.

Applied text pre-processing and normalization techniques such as tokenization, POS tagging, and parsing. Expertise using NLP techniques (BOW, TF-IDF, Word2Vec) and toolkits such as NLTK, Gensim, and SpaCy.

Experienced in tuning models using Grid Search, Randomized Grid Search, and K-Fold Cross-Validation.
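
For illustration, a minimal sketch of this kind of tuning with scikit-learn; the dataset, estimator, and parameter grid are hypothetical placeholders, not any specific project's setup:

# Minimal sketch: grid search and randomized search with K-fold CV.
# Dataset, estimator, and parameter grid are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold, RandomizedSearchCV

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
cv = KFold(n_splits=5, shuffle=True, random_state=42)
param_grid = {"n_estimators": [100, 300], "max_depth": [5, 10, None]}

grid = GridSearchCV(RandomForestClassifier(random_state=42),
                    param_grid, cv=cv, scoring="roc_auc")
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)

# Randomized search samples a fixed number of candidates instead.
rand = RandomizedSearchCV(RandomForestClassifier(random_state=42),
                          param_grid, n_iter=4, cv=cv, random_state=42)
rand.fit(X, y)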

Strong understanding of artificial neural networks, convolutional neural networks, and deep learning.

Skilled in using statistical methods including exploratory data analysis, regression analysis, regularized linear models, time-series analysis, cluster analysis, goodness of fit, Monte Carlo simulation, sampling, cross-validation, ANOVA, and A/B testing.

Expertise in building various machine learning models using algorithms such as Linear Regression, Logistic Regression, Naive Bayes, Support Vector Machines (SVM), Decision trees, KNN, K-means Clustering, Ensemble methods (Bagging, Gradient Boosting).

Experience in Text mining, Topic modeling, Natural Language Processing (NLP), Content Classification, Sentiment analysis, Market Basket Analysis, Recommendation systems, Entity recognition etc.

Working experience in Natural Language Processing (NLP) and a deep understanding of statistics, linear algebra, calculus, and optimization algorithms such as gradient descent.

Familiar with key data science concepts (statistics, data visualization, machine learning, etc.). Experienced in Python, MATLAB, SAS, and PySpark programming for statistical and quantitative analysis.

Exposure to AI and deep learning platforms such as TensorFlow, Keras, and AWS ML.

Experience working with Big Data tools such as Hadoop – HDFS and MapReduce, Hive QL, Sqoop, Pig Latin and Apache Spark (PySpark).

Extensive experience working with RDBMS such as SQL Server, MySQL, and NoSQL databases such as MongoDB, HBase.

Knowledge of time series analysis using AR, MA, ARIMA, GARCH, and ARCH models.

Experience building production-quality, large-scale deployments of applications related to natural language processing and machine learning algorithms.

Experience with high-performance computing (cluster computing on AWS with Spark/Hadoop) and building real-time analysis with Kafka and Spark Streaming. Knowledge of Qlik, Tableau, and Power BI.

Generated data visualizations using tools such as Tableau, Python Matplotlib, Python Seaborn, R.

Knowledge and experience working in Agile environments, including the Scrum process, using project management tools such as ProjectLibre and Jira and version control tools such as Git/GitHub.

TECHNICAL SKILLS:

Data Sources: AWS Snowflake, PostgreSQL, MS SQL Server, MongoDB, MySQL, HBase, Amazon Redshift, Databricks, Teradata.

Statistical Methods: Hypothesis Testing, ANOVA, Principal Component Analysis (PCA), Time Series, Correlation (chi-square test, covariance), Multivariate Analysis, Bayes' Law.

Machine Learning: Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, Random Forest, Support Vector Machines (SVM), K-Means Clustering, K-Nearest Neighbors (KNN), Gradient Boosting Trees, AdaBoost, PCA, LDA, Natural Language Processing.

Deep Learning: Artificial Neural Networks, Convolutional Neural Networks, RNNs, Deep Learning on AWS, Keras API.

Hadoop Ecosystem: Hadoop, Spark, MapReduce, Hive QL, HDFS, Sqoop, Pig Latin.

Data Visualization: Tableau, Python (Matplotlib, Seaborn), R (ggplot2), Power BI, QlikView, D3.js.

Languages: Python (NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn), R, SQL, MATLAB, Spark, Java, C#.

Operating Systems: UNIX shell scripting (via PuTTY client), Linux, Windows, macOS.

Other Tools and Technologies: TensorFlow, Keras, AWS ML, NLTK, SpaCy, Gensim, MS Office Suite, Google Analytics, GitHub, AWS (EC2/S3/Redshift/EMR/Lambda/Snowflake).

Certifications: Deep Learning with Python (DataCamp).

PROFESSIONAL EXPERIENCE:

Client: Capital One, Richmond, VA Jan 2023 - Present

Role: Senior Data Scientist / Machine Learning Engineer

Roles & Responsibilities:

Worked on the end-to-end machine learning workflow: wrote Python code for gathering data from AWS Snowflake, data pre-processing, feature extraction, feature engineering, modeling, model evaluation, and deployment. Wrote Python code for exploratory data analysis using Scikit-learn and related Python packages: NumPy, Pandas, Matplotlib, Seaborn, statsmodels, and pandas-profiling.

Worked on producing monthly and daily reports in Power BI.

Planned and deployed Power BI in the organization.

Deployed Azure IaaS virtual machines (VMs) and Cloud services (PaaS role instances) into secure Azure Virtual Networks and subnets.

Developed and maintained tables, views, and queries in SQL Server as data sources for Power BI reports.

Trained a Random Forest algorithm on customer web activity data from media applications to predict potential customers. Worked with Google TensorFlow and the Keras API, building convolutional neural networks for classification problems.

Wrote code for feature engineering, Principal Component Analysis (PCA), and hyperparameter tuning to improve model accuracy.

Designed VNets and subscriptions to conform to Azure network limits.

Utilized Databricks for data processing and machine learning tasks, leveraging its collaborative environment and optimized Apache Spark performance.

Worked with various machine learning algorithms such as linear regression, logistic regression, decision trees, random forests, K-means clustering, support vector machines, and XGBoost, based on client requirements.

Worked on natural language processing for document classification and text processing using NLTK, SpaCy, and TextBlob to find sensitive information in electronically stored files and to perform text summarization.

Developed a Python automation script for consuming data subject requests from AWS Snowflake tables and posting the data to the Adobe Analytics Privacy API.

Implemented machine learning and predictive analytics using Databricks and Spark on AWS, optimizing the performance and scalability of SaaS applications.

Leveraged AWS cloud services, including EC2, S3, Lambda, and Redshift, to build scalable, cloud-based machine learning models and data processing pipelines, ensuring efficient handling of SaaS environments.

Experience working with the deep learning frameworks TensorFlow, Keras, and PyTorch.

Proficient in R (e.g., ggplot2, cluster, dplyr, caret), Python (e.g., pandas, Keras, PyTorch, NumPy, scikit-learn, bokeh, NLTK), Spark MLlib, H2O, and other statistical tools.

Developed a Python script to automate data cataloging in the Alation data catalog tool. Tagged all Personally Identifiable Information (PII) in the Alation enterprise data catalog to identify sensitive consumer information.

Developed machine learning models using recurrent neural networks (LSTM) for time series and predictive analytics.

Built a robust time series forecasting model to predict the sales of Altice for the next 6 months using Long Short-Term Memory (LSTM) networks.
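
As an illustration only, a minimal sketch of an LSTM forecaster of this kind in Keras; the synthetic series, window size, and layer sizes are assumptions, not the production model:

# Minimal sketch: LSTM for univariate time series forecasting.
# The synthetic series, window size, and layer sizes are assumptions.
import numpy as np
from tensorflow import keras

series = np.sin(np.linspace(0, 50, 600))   # stand-in for a sales series
window = 12                                # look back 12 time steps

X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X[..., np.newaxis]                     # (samples, timesteps, features)

model = keras.Sequential([
    keras.layers.Input(shape=(window, 1)),
    keras.layers.LSTM(32),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, batch_size=32, verbose=0)

# One-step-ahead forecast from the most recent window.
forecast = model.predict(series[-window:].reshape(1, window, 1))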

Developed machine learning models using the Google TensorFlow/Keras API (convolutional neural networks) for classification problems; fine-tuned model performance by adjusting the epochs, batch size, and Adam optimizer.

Collaborated with cross-functional teams to integrate Databricks workflows into the overall data science pipeline, ensuring seamless data flow and analysis.

Consumed the Adobe Analytics web API and wrote a Python script to load Adobe consumer information for digital marketing into Snowflake. Worked on Adobe Analytics ETL jobs.

Wrote stored procedures in AWS Snowflake to find sensitive information across all data sources and hash the sensitive data with a salt value, anonymizing it to comply with the CCPA.

Containerized all ticketing-related applications (Spring Boot Java and Node.js) using Docker.

Created a continuous delivery process that supports building Docker images and publishing them to a private repository (Nexus v3).

Worked with the AWS boto3 API to make calls to AWS services such as S3, AWS Secrets Manager, and AWS SQS.
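
For illustration, a minimal sketch of these boto3 calls; the bucket, key, secret, and queue names are hypothetical:

# Minimal sketch: boto3 calls to S3, Secrets Manager, and SQS.
# Bucket, key, secret, and queue names are hypothetical.
import boto3

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="example-bucket", Key="privacy/request.json")
payload = obj["Body"].read()

secrets = boto3.client("secretsmanager")
secret = secrets.get_secret_value(SecretId="example/api-credentials")

sqs = boto3.client("sqs")
messages = sqs.receive_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/example-queue",
    MaxNumberOfMessages=10,
)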

Created an integration to consume HBO consumer subscription information posted to AWS SQS (Simple Queue Service), loaded it into Snowflake tables for data processing, and stored the metadata in Postgres tables.

Generated reports providing WarnerMedia brands' consumer information to data subjects through Python automation jobs.

Migrated a SQL Server database to Windows Azure SQL Database and updated connection strings.

Implemented AWS Lambda functions in Python that pull privacy files from AWS S3 buckets and post them to the Malibu data privacy endpoints.

Involved in different phases of data acquisition, data collection, data cleaning, model development, model validation, and visualization to deliver solutions.

Worked with Python's NumPy, SciPy, Pandas, Matplotlib, and stats packages to perform dataset manipulation, data mapping, data cleansing, and feature engineering. Built and analyzed datasets using R and Python.

Extracted the data required for building models from the AWS Snowflake database. Performed data cleaning, including transforming variables and handling missing values, and ensured data quality, consistency, and integrity using Pandas and NumPy.

Tackled a highly imbalanced fraud dataset using sampling techniques such as under-sampling and over-sampling with SMOTE, using Python and Scikit-learn.
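
For illustration, a minimal sketch of this kind of resampling; note that SMOTE itself is provided by the imbalanced-learn package, which complements Scikit-learn, and the data here is synthetic:

# Minimal sketch: rebalancing an imbalanced dataset with SMOTE
# over-sampling and random under-sampling. Requires imbalanced-learn;
# the synthetic data stands in for the real fraud dataset.
from collections import Counter
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=10_000, weights=[0.99, 0.01],
                           random_state=42)
print(Counter(y))                               # heavily imbalanced

X_over, y_over = SMOTE(random_state=42).fit_resample(X, y)
X_under, y_under = RandomUnderSampler(random_state=42).fit_resample(X, y)
print(Counter(y_over), Counter(y_under))        # rebalanced class counts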

Utilized PCA and other feature engineering techniques to reduce high-dimensional data, applied feature scaling, and handled categorical attributes using the one-hot encoder of the Scikit-learn library.
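
A minimal sketch of that preprocessing flow with Scikit-learn; the column names and toy data are hypothetical:

# Minimal sketch: scale numeric columns, one-hot encode categorical
# columns, then reduce dimensionality with PCA. Columns are hypothetical.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({"amount": [10.0, 250.0, 32.5],
                   "age": [23, 54, 31],
                   "channel": ["web", "store", "web"]})

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["amount", "age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["channel"]),
], sparse_threshold=0)                  # dense output so PCA can follow

pipeline = Pipeline([("prep", preprocess), ("pca", PCA(n_components=2))])
X_reduced = pipeline.fit_transform(df)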

Developed various machine learning models such as Logistic regression, KNN, and Gradient Boosting with Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn in Python.

Elucidated continuous improvement opportunities for current predictive modeling algorithms. Proactively collaborated with business partners to determine identified population segments and develop actionable plans enabling the identification of patterns related to quality, use, cost, and other variables.

Technology Stack: Python, Postgres, AWS Snowflake, Alation data catalog tool, SnowSQL, AWS EC2, S3, AWS Lambda, AWS Secrets Manager, AWS SQS, Adobe Analytics, Linux, Scikit-learn, SciPy, NumPy, Pandas, Matplotlib, Seaborn, JIRA, GitHub, Agile/SCRUM.

Client: Amazon, Seattle, WA Jun 2020 – Dec 2022

Role: Senior Data Scientist

Roles & Responsibilities:

Extensively involved in all phases of data acquisition, data collection, data cleaning, model development, model validation and visualization to deliver data science solutions.

Planned and deployed Power BI in the organization.

Configured and maintained the Power BI service (app.powerbi.com) to share content with business users.

Published reports into the Power BI service by creating and managing app workspaces and apps.

Built machine learning models to identify whether a user is legitimate using real-time data analysis, and prevented fraudulent transactions by applying supervised learning to the history of customer transactions.

Extracted data from a SQL Server database, copied it into the HDFS file system, and used Hadoop tools such as Hive and Pig Latin to retrieve the data required for building models.

Leveraged Databricks platform for scalable data processing and machine learning tasks, including data exploration, feature engineering, and model training on large datasets.

Collaborated with cross-functional teams to integrate Databricks workflows into the overall data science pipeline, enhancing seamless data flow and analysis within SaaS platforms.

Used AWS SageMaker to quickly build, train, and deploy machine learning models.

Performed data cleaning, including transforming variables and handling missing values, and ensured data quality, consistency, and integrity using Pandas and NumPy.

Used Jenkins pipelines to drive all microservice builds out to the Docker registry and then deploy to Kubernetes; created and managed pods using Kubernetes.

Built and maintained Docker container clusters managed by Kubernetes on GCP (Google Cloud Platform) using Linux, Bash, Git, and Docker. Utilized Kubernetes and Docker as the runtime environment of the CI/CD system to build, test, and deploy.

Developed Python automation scripts for consuming data subjects' requests from AWS Snowflake tables and posting data to Adobe Analytics Privacy API, streamlining the integration process within SaaS platforms.

Tackled a highly imbalanced fraud dataset using sampling techniques such as under-sampling and over-sampling with SMOTE (Synthetic Minority Over-sampling Technique), using Python and Scikit-learn.

Utilized PCA, t-SNE, and other feature engineering techniques to reduce high-dimensional data, applied feature scaling, and handled categorical attributes using the one-hot encoder of the scikit-learn library.

Developed various machine learning models such as Logistic regression, KNN, and Gradient Boosting with Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn in Python.

Responsible for setting up the continuous mobile delivery on Ionic Enterprise with Appflow.

Worked on Amazon Web Services (AWS) cloud services to do machine learning on big data.

Developed Spark Python modules for machine learning & predictive analytics in Hadoop.

Implemented a Python-based distributed random forest via PySpark and MLlib.
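
For illustration, a minimal sketch of a distributed random forest with PySpark's DataFrame-based ML API; the toy data and column names are assumptions:

# Minimal sketch: distributed random forest with PySpark ML.
# The toy data and column names are illustrative.
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rf-sketch").getOrCreate()
df = spark.createDataFrame(
    [(0.0, 1.2, 3.4), (1.0, 0.1, 0.5), (0.0, 2.2, 1.1), (1.0, 0.3, 0.9)],
    ["label", "f1", "f2"],
)

# Assemble feature columns into the single vector column MLlib expects.
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
train = assembler.transform(df)

rf = RandomForestClassifier(labelCol="label", featuresCol="features",
                            numTrees=50)
model = rf.fit(train)
predictions = model.transform(train)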

Used cross-validation to test the model with different batches of data and find the best parameters for the model, which eventually boosted performance.

Developed and implemented Spark python modules for machine learning & predictive analytics in Databricks.

Architected and implemented very large-scale data intelligence solutions around the Snowflake Data Warehouse.

Responsible for migration of key systems from on-premises hosting to Azure Cloud Services. Wrote SnowSQL queries against Snowflake.

Designed transformers for renewable energy (solar/wind farm) grid-tie, grounding, step-up, and TWACS applications, including distribution-class transformer designs and stacked-core and wound-core designs with RGO and amorphous material.

Created and maintained Tableau reports to display the status and performance of deployed models and algorithms.

Used GitHub and Jenkins for CI/CD (DevOps operations). Also familiar with TortoiseSVN, Bitbucket, JIRA, and Confluence.

Technology Stack: Machine Learning, AWS, Databricks, Python (Scikit-learn, SciPy, NumPy, Pandas, Matplotlib, Seaborn), SQL Server, Hadoop, HDFS, Hive, Pig Latin, Apache Spark/PySpark/MLlib, GitHub, Linux, Tableau.

Client: WellCare, Tampa, FL Sep 2018 – May 2020

Role: Data Scientist/ Machine learning Engineer

Roles & Responsibilities:

Collaborated with data engineers and the operations team to implement the ETL process; wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.

Performed data analysis by retrieving the data from the Hadoop cluster.

Performed univariate and multivariate analysis on the data to identify any underlying pattern in the data and associations between the variables.

Under the PMO organization, led the company-wide Agile transformation initiative in conjunction with the Agile sponsor team, represented by areas of Business, Technology, Marketing, HR, Compliance, and IT.

As the overall Agile transformation lead, developed, implemented, and promoted Agile best practices and standards across the enterprise and Agile teams. Drove the organization-wide Agile adoption strategy and rollout plans. Provided solutions for scaling Agile across projects, programs, and portfolios to improve application delivery.

Explored and analyzed customer-specific features using Matplotlib in Python and ggplot2 in R.

Performed data imputation using Scikit-learn package in Python.

Participated in feature engineering, including feature generation, PCA, feature normalization, and label encoding with Scikit-learn preprocessing.

Used Python (NumPy, SciPy, pandas, Scikit-learn, seaborn) and R to develop a variety of models and algorithms for analytic purposes.

Worked on Natural Language Processing with the NLTK module of Python and developed NLP models for sentiment analysis.
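
For illustration, a minimal sketch of sentiment scoring with NLTK's VADER analyzer, one common NLTK-based approach (not necessarily the exact models built here):

# Minimal sketch: rule-based sentiment scoring with NLTK's VADER
# analyzer; one common approach, not necessarily the models used here.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)   # one-time lexicon download

sia = SentimentIntensityAnalyzer()
scores = sia.polarity_scores("The claims process was quick and painless.")
print(scores)   # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}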

Experimented and built predictive models including ensemble models using machine learning algorithms such as Logistic regression, Random Forests, and KNN to predict customer churn.

Conducted analysis of customer behaviors and discovered customer value with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering, Gaussian Mixture Models, and Hierarchical Clustering.

Used F-score, AUC/ROC, confusion matrix, precision, and recall to evaluate different models' performance.
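
For illustration, a minimal sketch of computing those metrics with Scikit-learn; the labels and scores are toy values:

# Minimal sketch: common classification metrics with scikit-learn.
# The labels and predicted scores below are toy values.
from sklearn.metrics import (confusion_matrix, f1_score, precision_score,
                             recall_score, roc_auc_score)

y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
y_score = [0.1, 0.6, 0.8, 0.9, 0.4, 0.2, 0.7, 0.3]  # predicted probabilities

print(confusion_matrix(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
print("ROC AUC:", roc_auc_score(y_true, y_score))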

Designed and implemented a recommendation system which leveraged Google Analytics data and the machine learning models and utilized Collaborative filtering techniques to recommend courses for different customers.

Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.

Technology Stack: Hadoop, HDFS, Python, R, Tableau, Machine Learning (Logistic regression/ Random Forests/ KNN/ K-Means Clustering/ Hierarchical Clustering/ Ensemble methods/ Collaborative filtering), JIRA, GitHub, Agile/ SCRUM, GCP

Client: Bluecoat Systems, Sunnyvale, CA Dec 2016 – Aug 2018

Role: Machine Learning Engineer

Roles & Responsibilities:

Communicated and coordinated with end client for collecting data and performed ETL to define the uniform standard format. Queried and retrieved data from Oracle database servers to get the dataset.

In the pre-processing phase, used Pandas to remove or replace missing data and balanced the dataset by over-sampling the minority label class and under-sampling the majority label class.

Used PCA and other feature engineering, feature scaling, and Scikit-learn pre-processing techniques to reduce high-dimensional data drawn from entire patient visit histories, proprietary comorbidity flags, and comorbidity scoring from over 12 million EMR and claims records.

Experimented with predictive models including Logistic Regression, Support Vector Machine (SVM), Gradient Boosting and Random Forest using Python Scikit-learn to predict whether a patient might be readmitted.

Designed and implemented cross-validation and statistical tests, including ANOVA and chi-square tests, to verify the models' significance.

Implemented, tuned and tested the model on AWS EC2 with the best performing algorithm and parameters.

Set up a data preprocessing pipeline to guarantee consistency between the training data and incoming data.

Deployed the model on AWS Lambda. Collected the feedback after deployment, retrained the model and tweaked the parameters to improve the performance.

Designed, developed and maintained daily and monthly summary, trending and benchmark reports in Tableau Desktop.

Used Agile methodology and the Scrum process for project development.

Technology Stack: AWS (EC2, S3, Lambda), Oracle DB, Linux, Python (Scikit-Learn/NumPy/Pandas/Matplotlib), Machine Learning (Logistic Regression/Support Vector Machine/Gradient Boosting/Random Forest), Tableau.

Client: Resonous Technologies, India Nov 2013 – Sep 2016

Role: Programmer Analyst

Roles & Responsibilities:

Effectively communicated with stakeholders to gather requirements for different projects.

Used the MySQLdb package and Python MySQL Connector to write and execute MySQL database queries from Python.

In the data exploration stage, used correlation analysis and graphical techniques in Matplotlib and Seaborn to gain insights into the patient admission and discharge data.

Implemented client/server applications using C#, JSP, and SQL.

Performed data imputation using Scikit-learn package in Python.

Created functions, triggers, views, and stored procedures using MySQL.

Worked closely with back-end developer to find ways to push the limits of existing Web technology.

Involved in the code review meetings.

Environment: Python, MySQL, C#, machine learning software packages, recommendation systems.

EDUCATION:

Bachelor's in Computer Science, GITAM University, 2013


