
Machine Learning Data Engineer

Location:
Lebanon, NJ
Posted:
March 04, 2025

Contact this candidate

Resume:

Professional Summary

Ruma is a hands-on technologist with extensive expertise in data engineering, analytics, and machine learning. With years of experience, she designs and builds scalable data systems, including data warehouses, cloud platforms, and analytics solutions. She has successfully led complex projects, delivering production-ready transactional systems, analytics tools, and machine learning applications.

A collaborative leader, Ruma works effectively with cross-functional and global teams to drive impactful, data-driven projects across various industries. Her focus is on delivering practical, innovative solutions that solve real-world problems.

Education and Continuing Learning

Bachelor's and Master's in Mathematics and Computer Science: IIT Kharagpur (1997-2002)

Business Analytics and Business Intelligence: Great Lakes (2017)

AWS Certified Machine Learning Specialty (2021), AWS Certified Data Analytics Specialty (2022), AWS Solutions Architect Associate (2022), AWS Developer Associate (2022)

Google Cloud Professional Data Engineer (2021), Google Cloud Professional Machine Learning Engineer (2022), Google Cloud Professional DevOps Engineer (2022), Fundamentals of Accelerated Data Science with RAPIDS (2023)

Kaggle Notebook Expert. Sample notebooks worked on: text classification, wine quality classification, time series forecasting, deep learning based modeling for heart data, sentiment analysis of disaster-related tweets using advanced NLP, TensorFlow-based classification for forest cover types

https://partner.cloudskillsboost.google/public_profiles/ec4272ad-abec-44ac-a739-730287a50bf5

The Complete Guide to Becoming a Software Architect

AI Strategy and Governance (Coursera)

Algorithms, Technologies, and Platforms

Algorithms: Predictive modeling, logistic regression, XGBoost, ARIMA, LSTM, CNN, RNN, exponentially weighted moving average, Transformer, BERT, computer vision, SVM, Bayesian hyperparameter optimization, OpenCV, face detection
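As a small illustration of one of the listed techniques, an exponentially weighted moving average can be sketched in a few lines of Python (the function name and sample data here are illustrative, not taken from any project above):

```python
def ewma(values, alpha):
    """Exponentially weighted moving average: each output blends the new
    observation with the running average, weighted by smoothing factor alpha."""
    if not values:
        return []
    smoothed = [values[0]]  # seed with the first observation
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

print(ewma([0.0, 1.0, 1.0], 0.5))  # → [0.0, 0.5, 0.75]
```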

Advanced Analytics and ML technologies: TensorFlow, Keras, R, Python, Spark, PySpark, cloud technology (AWS/GCP), BigQuery ML, Anaconda, AWS SageMaker, JupyterLab, Google Colab, Vertex AI, TFX, Kubeflow, scikit-learn, Cloud ML APIs, Explainable AI, OpenCV, face recognition

Data, ELT/ETL, and Visualization Technologies: Oracle ERP, ETL pipeline design, query optimization, Oracle Reports, PL/SQL, SQL, BigQuery, Bigtable, DynamoDB, Dataflow, Data Fusion, Cloud Functions, Git, GitHub, Looker, LookML, Bitbucket, Agile/Jira, Confluence, ETL with Glue, DataBrew, Athena, Hive, AWS HealthLake, FHIR, Snowflake, Databricks

Domains worked: Supply Chain, Oracle Financials, Purchasing, Order Management, AR/AP/PO, General Ledger

Work Experience

Deloitte March 2022 – Present GCP Data and ML Architect

Designed and implemented end-to-end automated data pipelines using Google Cloud Storage, Cloud Functions, and Dataflow standard templates, ensuring seamless data ingestion, transformation, and delivery.

Leveraged Databricks SQL and PySpark in Databricks notebooks for comprehensive data analysis, enabling data-driven decision-making across teams.

Built and deployed an end-to-end Computer Vision model using MLflow on Databricks, streamlining model tracking, experimentation, and deployment workflows.

Developed a Streamlit application integrated with Databricks to visualize and display model predictions, enhancing accessibility and usability for stakeholders.

Designed and implemented a full-stack machine learning pipeline with a focus on MLOps best practices for fraud analytics, leveraging Google Cloud's Vertex AI Pipeline.

Built and fine-tuned Hugging Face BERT models using transfer learning and few-shot learning techniques, deploying them on AWS SageMaker's Bring Your Own Container (BYOC) infrastructure for scalable and efficient processing.

Spearheaded a 10-member team of cloud data analytics and machine learning professionals to design and deliver a cloud-native analytics and insights platform for one of the world’s largest commodity exchanges.

Acted as the lead data modernization and analytics architect, successfully migrating on-premise reference and transactional data warehouses to Google BigQuery on GCP.

Designed and implemented a scalable ELT migration pipeline, transferring data from Cloud SQL to Google Cloud Storage (GCS) and into BigQuery, supporting diverse file formats such as CSV and AVRO using customizable GCP workflows.
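The migration pipeline above ran on GCP workflows; as a minimal, self-contained illustration of one transformation step, CSV rows can be converted to newline-delimited JSON, a row format BigQuery load jobs accept alongside CSV and AVRO (the sample column names below are hypothetical):

```python
import csv
import io
import json

def csv_to_ndjson(csv_text):
    """Convert CSV text to newline-delimited JSON (NDJSON), one JSON
    object per row, keyed by the CSV header fields."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(json.dumps(row) for row in reader)

# Hypothetical reference-data extract headed for a BigQuery load job.
sample = "trade_id,price\nT1,101.5\nT2,99.2\n"
print(csv_to_ndjson(sample))
```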

Built advanced analytics products leveraging BigQuery, including complex JSON, array, and struct data structures to deliver actionable insights.

Collaborated with cross-functional Agile teams to design, develop, test, deploy, and maintain technical solutions using DevOps best practices. Ensured adherence to industry-leading standards in data security, backup, disaster recovery, business continuity, and data archiving.

Led the design and development of an advanced analytics platform leveraging AWS Healthcare Analytics, Athena, QuickSight, and SageMaker to deliver actionable insights and scalable solutions.

Drove a Proof of Value (PoV) initiative focused on leveraging Generative AI (GenAI) for cross-sell strategies in the insurance domain. Designed end-to-end solutions using AWS and OpenAI to automate the first level of insurance cross-sell cold calls.

Gained extensive experience in the retail sector, designing and training machine learning models for customer segmentation and coupon recommendation systems. Utilized GenAI to create dynamic and personalized marketing content.

Conducted in-depth analysis of the end-to-end data journey, from diverse source systems to Google Cloud as the central data platform, with Oracle EPM as the final destination for data consumption. Mapped and evaluated the system architecture and its components to ensure seamless data flow and integration.

Excelled at designing and managing end-to-end data workflows, integrating data from diverse sources such as system-of-record files, databases (Oracle, MySQL), and API-based systems. Following the Medallion architecture, ensured seamless data ingestion, transformation, and storage in Google BigQuery for scalable and efficient analytics, building robust data pipelines that deliver reliable data integration and accessibility for downstream consumption.

Executed a Proof of Concept (PoC) for a retail client, focusing on customer segmentation. This involved data and feature selection, cluster analysis, and presentation of insights. Additionally, developed a PoC for generating dynamic, personalized marketing content using Google’s GenAI Text Generation (Text Bison), Python, and LangChain.

Expert in crafting advanced and highly optimized SQL queries in BigQuery, with deep proficiency in window functions, complex data structures (e.g., structs, arrays, JSON), and performance tuning for large-scale datasets.
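BigQuery itself can't be exercised locally, but the core window-function pattern behind many such queries is standard SQL; a hedged sketch using Python's built-in sqlite3 (which also supports window functions) on a hypothetical trades table:

```python
import sqlite3

# In-memory database with a hypothetical commodity-trades table; the same
# "latest row per group" window-function pattern applies in BigQuery.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trades (symbol TEXT, ts INTEGER, price REAL);
    INSERT INTO trades VALUES
        ('CU', 1, 100.0), ('CU', 2, 102.5),
        ('AL', 1, 50.0),  ('AL', 3, 49.0);
""")
rows = conn.execute("""
    SELECT symbol, price
    FROM (
        SELECT symbol, price,
               ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY ts DESC) AS rn
        FROM trades
    )
    WHERE rn = 1
    ORDER BY symbol
""").fetchall()
print(rows)  # latest price per symbol
```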

Designed and architected scalable data products and ETL/ELT pipelines, ensuring robust data integration, transformation, and delivery across enterprise systems.

Extensive experience in data modeling, including multi-dimensional modeling (e.g., star schema, snowflake schema) for data warehousing and analytics platforms, enabling efficient data storage and retrieval.

Proven track record as a seasoned Data Engineer, delivering end-to-end data solutions that align with business objectives, from data ingestion and transformation to visualization and consumption.

Skills used: Python, SQL, Big Data, BigQuery, Vertex AI, SageMaker, Athena, Glue, GenAI, LangChain, Prompt Engineering, etc.

Sabbatical to investigate new frontiers of technology Feb 2020 – Feb 2022

Focused on cloud, data, and AI as COVID accelerated their adoption.

Focused on key cloud platforms like AWS and GCP

Worked on various use cases, datasets, and cloud-native technologies to extend capabilities in modern technology.

Earned AWS Machine Learning and GCP Data Engineering certifications.

Worked on different Kaggle kernels and competitions.

GetKitch.In Sept 2019 – Feb 2020 Data Science and Machine Learning

A startup bringing social innovation to traditional cookware. Worked as a data analyst and machine learning volunteer for this social initiative.

Worked on analyzing data to identify potential revenue loss and new customer acquisition opportunities

Developed an approach to analyze social media data to enhance customer reach and market expansion

Startup Incubation May 2019 – Sept 2019 Data Science and Machine Learning Architect

Smart Global Regulatory Compliance:

Changing regulations create a dynamic problem for all industries. The platform was conceptualized to leverage the power of machine learning to ease compliance costs for enterprises.

NLP pipeline design and implementation for classification and version changes across regulations in the financial and insurance industries

System design for regulatory storage, analytics, ML, microservices, real-time integration, and dashboarding

Pfizer March 2019 – April 2019 Machine Learning Analyst

Developed proofs of concept for the application of AI in industrial IoT

HVAC AHU Energy Optimization: Worked on a POC analyzing sensor readings from 3 AHUs (air handling units). Built visualizations in Python with Seaborn and Matplotlib to view simultaneous heating and cooling in different months. Used an ANN model to predict the supply air temperature (SAT) in the AHU based on predictors such as OAT, OArh, and cooling inlet temperature.

Predictive Maintenance with Sensor/Tag Readings: Built classification models using historical sensor readings and failure work orders to predict whether a failure will occur within n days. Worked with RF, SVM, LSTM, and ANN models using Python and Keras.
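The actual models were RF/SVM/LSTM/ANN in Python and Keras; purely as a framing illustration (features in, fail-within-n-days label out), a toy nearest-neighbor classifier over hypothetical sensor readings can be sketched in plain Python:

```python
def knn_predict(train, query, k=3):
    """Toy k-nearest-neighbor classifier. train is a list of
    (feature_vector, label) pairs; label 1 = failed within n days."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0  # majority vote among k neighbors

# Hypothetical (vibration, temperature) readings -> failure label.
history = [((0.1, 60.0), 0), ((0.2, 62.0), 0), ((0.9, 85.0), 1),
           ((0.8, 88.0), 1), ((0.15, 58.0), 0), ((0.95, 90.0), 1)]
print(knn_predict(history, (0.85, 86.0)))  # high vibration/temp → 1
```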

Cisco Systems Jan 2011 – Feb 2018 Data Science and ML Analyst

Supply Chain Operations/Transformations

Performed data acquisition, statistical analysis, and exploratory analysis to support Data Science and Advanced Analytics initiatives that improve products, processes, and services related to the supply chain

Performed a POC for mapping a relational database into NoSQL (MongoDB) using Python to evaluate performance and scalability for analytics reporting and web applications.

Performed time-series analysis with ARIMA and segmentation analysis using K-means unsupervised clustering to determine how different Cisco products fit into the three categories of Runner/Repeater/Rogue, supporting efficient inventory shelf management quarter by quarter. Did extensive exploratory data analysis using R.
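The segmentation analysis itself was done in R; as a minimal sketch of the K-means idea behind the Runner/Repeater/Rogue split, a 1-D K-means over hypothetical weekly demand counts can be written in a few lines of Python (seed centroids and data are illustrative):

```python
def kmeans_1d(points, centroids, iters=10):
    """Minimal 1-D K-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical weekly order counts; three seed centroids for the
# Runner / Repeater / Rogue (high / medium / low mover) split.
demand = [98, 102, 100, 45, 50, 48, 2, 1, 3]
centroids, clusters = kmeans_1d(demand, [1.0, 50.0, 100.0])
print(sorted(len(c) for c in clusters))  # → [3, 3, 3]
```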

Used descriptive analytics with R to find hidden patterns and trends in order cancellations using 2 years of order data. Applied text analytics to expedite and escalation orders to root-cause the reasons behind expedited orders, escalations, and customer satisfaction issues.

Performed Data Extraction and Exploratory Data Analysis as well as Statistical Analysis with ANOVA to analyze and improve services across different contract manufacturers in terms of supply and demand.

Led and guided the development team from a vendor partner to ensure consistent quality and timely delivery.

Led the Large Scale Service Bundle 1 (Supply Chain) program from initiation to production. Received the Chairman's Award for exceptional business impact and leadership.

Mentored and guided new joiners in relevant technical and functional skills to shorten skill gaps and time to productivity

Led the Informatica Data Lifecycle Management project from initiation to production

PricewaterhouseCoopers Oct 2006 – Dec 2010 Data Analyst and Architect

Client: McDonald's

A core technical member of a multi-year, multi-million-dollar Oracle Financials implementation engagement

Led architecture and development of critical application components

Managed onsite-offshore technical collaboration, new member ramp-up and design review

Spearheaded the usage of complex Oracle Apps features, leading to a high-performance solution

Played a crucial role in onsite client engagement, managing customer functional analysts and application acceptance testing until go-live

Oracle India Jun 2005 – Sep 2006 Data Analyst Consultant

Solution analysis team member identifying sources of functional and technical issues in the integrated Oracle product suite, with iStore as the primary focus

Owned the complete bug cycle, from bug analysis through solution identification and implementation

Worked with the Oracle global product suite development team on launching the next version and intermediate patches

Aztec Jan 2005 – Jun 2005 Data Analyst Engineer

Software development center supporting sales force automation for major pharmaceutical companies such as Glaxo, Procter & Gamble, and Sanofi

Development of DataStage jobs for extraction, transformation, and loading of data.

Designed scripts for creating tables and indexes and wrote stored procedures, functions, packages and triggers for high performance synchronization

Finch Software Jun 2004 – Dec 2004 Senior Software Engineer

Offshore development center to enhance and support the live MultiFond financial application for Net Asset Value derivation

Problem simulation, enhancement of the existing application and business logic, pre-release testing, and live support

Established configuration management practices and onsite-offshore communication plan

Mastek Ltd Aug 2002 – Jun 2004 Software Engineer

Developed simple to complex forms, reports, packages, procedures, functions and triggers

Transformed functional requirement to technical design and unit test specifications

Awards and Achievements

Disruptor award for implementing a machine learning program at Cisco, 2017

You Inspire award for leading the Informatica lifecycle management project, 2015

IT Champion award for Risk Taking 2013

Star performer award 2010

National Talent Search Exam scholarship 1995
