Professional Summary
Ruma is a hands-on technologist with extensive expertise in data engineering, analytics, and machine learning. With more than two decades of experience, she designs and builds scalable data systems, including data warehouses, cloud platforms, and analytics solutions. She has successfully led complex projects, delivering production-ready transactional systems, analytics tools, and machine learning applications.
A collaborative leader, Ruma works effectively with cross-functional and global teams to drive impactful, data-driven projects across various industries. Her focus is on delivering practical, innovative solutions that solve real-world problems.
Education and Continuing Learning
Bachelor's and Master's in Mathematics and Computer Science: IIT Kharagpur (1997–2002)
Business Analytics and Business Intelligence: Great Lakes (2017)
AWS Certified Machine Learning – Specialty (2021), AWS Certified Data Analytics – Specialty (2022), AWS Certified Solutions Architect – Associate (2022), AWS Certified Developer – Associate (2022)
Google Cloud Professional Data Engineer (2021), Google Cloud Professional Machine Learning Engineer (2022), Google Cloud Professional DevOps Engineer (2022), Fundamentals of Accelerated Data Science with RAPIDS (2023)
Kaggle Notebook Expert. Sample notebooks: text classification, wine quality classification, time-series forecasting, deep-learning models for heart data, sentiment analysis of disaster-related tweets using advanced NLP, and TensorFlow-based classification of forest cover types
Google Cloud Skills Boost profile: https://partner.cloudskillsboost.google/public_profiles/ec4272ad-abec-44ac-a739-730287a50bf5
The Complete Guide to Becoming a Software Architect
AI Strategy and Governance (Coursera)
Algorithms, Technologies, and Platforms
Algorithms: Predictive modeling, logistic regression, XGBoost, ARIMA, LSTM, CNN, RNN, exponentially weighted moving averages, Transformers, BERT, computer vision, SVM, Bayesian hyperparameter optimization, OpenCV, face detection
Advanced Analytics and ML technologies: TensorFlow, Keras, R, Python, Spark, PySpark, AWS/GCP cloud platforms, BigQuery ML, Anaconda, AWS SageMaker, JupyterLab, Google Colab, Vertex AI, TFX, Kubeflow, scikit-learn, Cloud ML APIs, Explainable AI, OpenCV, face recognition
Data, ELT/ETL, and Visualization Technologies: Oracle ERP, ETL pipeline design, query optimization, Oracle Reports, PL/SQL, SQL, BigQuery, Bigtable, DynamoDB, Dataflow, Data Fusion, Cloud Functions, Git, GitHub, Looker, LookML, Bitbucket, Agile/Jira, Confluence, ETL with Glue, DataBrew, Athena, Hive, AWS HealthLake, FHIR, Snowflake, Databricks
Domains worked in: Supply Chain, Oracle Financials, Purchasing, Order Management, AR/AP/PO, General Ledger
Work Experience
Deloitte March 2022 – Present GCP Data and ML Architect
Designed and implemented end-to-end automated data pipelines using Google Cloud Storage, Cloud Functions, and Dataflow standard templates, ensuring seamless data ingestion, transformation, and delivery.
Leveraged Databricks SQL and PySpark in Databricks notebooks for comprehensive data analysis, enabling data-driven decision-making across teams.
Built and deployed an end-to-end Computer Vision model using MLflow on Databricks, streamlining model tracking, experimentation, and deployment workflows.
Developed a Streamlit application integrated with Databricks to visualize and display model predictions, enhancing accessibility and usability for stakeholders.
Designed and implemented a full-stack machine learning pipeline with a focus on MLOps best practices for fraud analytics, leveraging Google Cloud's Vertex AI Pipeline.
Built and fine-tuned Hugging Face BERT models using transfer learning and few-shot learning techniques, deploying them on AWS SageMaker's Bring Your Own Container (BYOC) infrastructure for scalable and efficient processing.
Spearheaded a 10-member team of cloud data analytics and machine learning professionals to design and deliver a cloud-native analytics and insights platform for one of the world’s largest commodity exchanges.
Acted as the lead data modernization and analytics architect, successfully migrating on-premise reference and transactional data warehouses to Google BigQuery on GCP.
Designed and implemented a scalable ELT migration pipeline, transferring data from Cloud SQL to Google Cloud Storage (GCS) and into BigQuery, supporting diverse file formats such as CSV and AVRO using customizable GCP workflows.
Built advanced analytics products leveraging BigQuery, including complex JSON, array, and struct data structures to deliver actionable insights.
Collaborated with cross-functional Agile teams to design, develop, test, deploy, and maintain technical solutions using DevOps best practices. Ensured adherence to industry-leading standards in data security, backup, disaster recovery, business continuity, and data archiving.
Led the design and development of an advanced analytics platform leveraging AWS Healthcare Analytics, Athena, QuickSight, and SageMaker to deliver actionable insights and scalable solutions.
Drove a Proof of Value (PoV) initiative focused on leveraging Generative AI (GenAI) for cross-sell strategies in the insurance domain. Designed end-to-end solutions using AWS and OpenAI to automate the first level of insurance cross-sell cold calls.
Gained extensive experience in the retail sector, designing and training machine learning models for customer segmentation and coupon recommendation systems. Utilized GenAI to create dynamic and personalized marketing content.
Conducted in-depth analysis of the end-to-end data journey, from diverse source systems to Google Cloud as the central data platform, with Oracle EPM as the final destination for data consumption. Mapped and evaluated the system architecture and its components to ensure seamless data flow and integration.
Designed and managed end-to-end data workflows, integrating data from diverse sources such as system-of-record files, databases (Oracle, MySQL), and API-based systems. Followed the Medallion architecture to ensure seamless data ingestion, transformation, and storage in Google BigQuery for scalable and efficient analytics, building robust data pipelines that deliver reliable data integration and accessibility for downstream consumption.
Executed a Proof of Concept (PoC) for a retail client, focusing on customer segmentation. This involved data and feature selection, cluster analysis, and presentation of insights. Additionally, developed a PoC for generating dynamic, personalized marketing content using Google’s GenAI Text Generation (Text Bison), Python, and LangChain.
Expert in crafting advanced and highly optimized SQL queries in BigQuery, with deep proficiency in window functions, complex data structures (e.g., structs, arrays, JSON), and performance tuning for large-scale datasets.
Designed and architected scalable data products and ETL/ELT pipelines, ensuring robust data integration, transformation, and delivery across enterprise systems.
Extensive experience in data modeling, including multi-dimensional modeling (e.g., star schema, snowflake schema) for data warehousing and analytics platforms, enabling efficient data storage and retrieval.
Proven track record as a seasoned Data Engineer, delivering end-to-end data solutions that align with business objectives, from data ingestion and transformation to visualization and consumption.
Skills used: Python, SQL, Big Data, BigQuery, Vertex AI, SageMaker, Athena, Glue, GenAI, LangChain, Prompt Engineering
Sabbatical to explore new frontiers of technology Feb 2020 – Feb 2022
Focused on cloud, data, and AI as COVID accelerated their adoption.
Focused on key cloud platforms like AWS and GCP
Worked on various use cases, datasets, and cloud-native technologies to extend capabilities in modern technology.
Earned the AWS Machine Learning Specialty and GCP Data Engineer certifications
Worked on different Kaggle kernels and competitions.
GetKitch.In Sept 2019 – Feb 2020 Data Science and Machine Learning
A startup bringing social innovation to traditional cookware. Worked as a data analyst and machine learning volunteer for this social initiative.
Analyzed data to identify potential revenue loss and new customer acquisition opportunities
Developed an approach to analyze social media data to enhance customer reach and market expansion
Startup Incubation May 2019 – Sept 2019 Data Science and Machine Learning Architect
Smart Global Regulatory Compliance:
Changing regulations create a dynamic problem for all industries. The platform was conceptualized to leverage the power of machine learning to ease compliance costs for enterprises.
NLP pipeline design and implementation for classification and tracking of version changes across regulations in the financial and insurance industries
System design for regulatory storage, analytics, ML, microservices, real-time integration, and dashboarding
Pfizer March 2019 – April 2019 Machine Learning Analyst
Developed proof of concept for application of AI in industrial IoT
HVAC AHU Energy Optimization: Analyzed sensor readings from three AHUs for the POC. Built visualizations in Python with Seaborn and Matplotlib to view simultaneous heating and cooling across months, and used an ANN model to predict the SAT in the AHU from predictors such as OAT, OArh, and cooling inlet temperature.
Predictive Maintenance with Sensor/Tag Readings: Built classification models on historical sensor readings and failure work orders to predict whether a failure will occur within n days. Worked with RF, SVM, LSTM, and ANN models using Python and Keras
Cisco Systems Jan 2011 – Feb 2018 Data Science and ML Analyst
Supply Chain Operations/Transformations
Performed data acquisition and statistical and exploratory analysis to support Data Science and Advanced Analytics initiatives that improve supply chain-related products, processes, and services
Performed a POC mapping a relational database into NoSQL (MongoDB) using Python to evaluate performance and scalability for analytics reporting and web applications.
Performed time-series analysis with ARIMA and segmentation analysis using K-means unsupervised clustering to determine how different Cisco products fit into the Runner/Repeater/Rogue categories, supporting quarterly inventory shelf management. Conducted extensive exploratory data analysis in R, enabling efficient quarter-by-quarter inventory management for Cisco.
Used descriptive analytics in R to find hidden patterns and trends in order cancellations across two years of order data. Applied text analytics to expedite and escalation orders to root-cause the reasons behind expedited orders, escalations, and customer satisfaction issues
Performed Data Extraction and Exploratory Data Analysis as well as Statistical Analysis with ANOVA to analyze and improve services across different contract manufacturers in terms of supply and demand.
Led and guided development team from vendor partner to ensure consistent quality and timely delivery.
Led the Large Scale Service Bundle 1 (Supply Chain) program from initiation to production. Received the Chairman's award for exceptional business impact and leadership
Mentored and guided new joiners in relevant technical and functional skills to shorten the skill gap and time to productivity
Led the Informatica Data Lifecycle Management project from initiation to production
PricewaterhouseCoopers Oct 2006 – Dec 2010 Data Analyst and Architect
Client: McDonald's
Core technical member of a multi-year, multi-million-dollar Oracle Financials implementation engagement
Led architecture and development of critical application components
Managed onsite-offshore technical collaboration, new-member ramp-up, and design reviews
Spearheaded usage of complex Oracle Apps features, leading to a high-performance solution
Played a crucial role in onsite client engagement, managing customer functional analysts and application acceptance testing through go-live
Oracle India Jun 2005 – Sep 2006 Data Analyst Consultant
Member of the solution analysis team identifying sources of functional and technical issues in the integrated Oracle product suite, with iStore as the primary focus
Owned the complete bug cycle, from bug analysis through solution identification and implementation
Worked with Oracle's global product suite development team on launching the next version and intermediate patches
Aztec Jan 2005 – Jun 2005 Data Analyst Engineer
Software development center supporting sales force automation for major pharmaceutical companies such as Glaxo, Procter & Gamble, and Sanofi
Developed DataStage jobs for extracting, transforming, and loading data.
Designed scripts for creating tables and indexes, and wrote stored procedures, functions, packages, and triggers for high-performance synchronization
Finch Software Jun 2004 – Dec 2004 Senior Software Engineer
Offshore development center enhancing and supporting the live MultiFond financial application for Net Asset Value derivation
Problem simulation, enhancement of the existing application and business logic, pre-release testing, and live support
Established configuration management practices and onsite-offshore communication plan
Mastek Ltd Aug 2002 – Jun 2004 Software Engineer
Developed simple to complex forms, reports, packages, procedures, functions and triggers
Transformed functional requirements into technical designs and unit-test specifications
Awards and Achievements
Disruptor award for implementing a machine learning program at Cisco 2017
You inspire award for leading Informatica life cycle management project in 2015
IT Champion award for Risk Taking 2013
Star performer award 2010
National Talent Search Exam scholarship 1995