Ravi Divya Sree
New Haven, CT | +1-203-***-**** | **********@*****.*** | www.linkedin.com/in/rdivya21
SUMMARY
● Over 5 years of experience architecting and deploying scalable data solutions across AWS, Azure, and GCP, leveraging Apache Spark, Hadoop, Kafka, and Databricks to drive enterprise-scale analytics and real-time processing.
● Expert in developing ETL/ELT pipelines using Apache Airflow, AWS Glue, and Azure Data Factory, integrating relational, NoSQL, and real-time streaming data to fuel predictive modeling, regulatory reporting, and business intelligence.
● Proficient in implementing Infrastructure as Code (IaC) with Terraform, AWS CloudFormation, and ARM templates, applying CI/CD best practices with Jenkins and container orchestration via Docker and Kubernetes to ensure rapid, repeatable, and secure cloud deployments.
● Specialized in data warehouse modeling, designing Star and Snowflake schemas for OLAP/OLTP environments and optimizing complex SQL queries to support risk management, financial reporting, and strategic decision-making.
● Deep expertise in MLOps and AI integration, building machine learning pipelines with Python, MLflow, and XGBoost to enhance fraud detection, risk assessment, and operational efficiency across financial and healthcare sectors.
● Collaborative leader with expertise in Agile, Scrum, and Waterfall methodologies, working cross-functionally with data scientists and business stakeholders to deliver data-driven solutions that cut costs, improve performance, and drive business value.
WORK EXPERIENCE
Webster Financial Corporation Stamford, CT
Sr. Data Engineer Sep 2023 - Present
● Developed and managed data pipelines integrating the InterLINK Deposit Management Platform with Webster’s infrastructure using Azure Data Factory, Databricks, and Snowflake, ensuring seamless data flow for deposit management.
● Engineered data migration strategies for the Sterling Bancorp merger, synchronizing disparate data sources across Azure Synapse and Snowflake, maintaining data integrity and improving operational efficiency.
● Designed scalable ETL pipelines using Azure Data Factory, Databricks, and SQL, automating data transformation from Oracle, SQL, and Cosmos DB, reducing processing latency by 40%.
● Developed AI-driven financial models in collaboration with AI engineers, integrating Kafka, Snowflake, and MLflow, supporting predictive analytics for deposit trends, fraud detection, and risk management.
● Implemented real-time streaming analytics using Kafka, Azure Event Hubs, and Stream Analytics, enabling low-latency fraud detection and improving risk assessment capabilities.
● Optimized ETL workflows in Oracle within Azure, leveraging PL/SQL tuning, partitioning, and indexing, ensuring seamless integration with Snowflake and Synapse Analytics.
● Developed Databricks ETL pipelines using PySpark, Spark DataFrames, and SQL, improving batch processing efficiency by 35% for loan and deposit risk assessment models.
● Orchestrated CI/CD workflows using Jenkins, OneOps Cloud, and Terraform, automating Azure-based data pipeline deployments, reducing manual intervention by 45%.
● Integrated Salesforce CRM with Snowflake using Airbyte and Kafka, enabling real-time data ingestion and transformation, improving customer insights and financial forecasting accuracy.
● Developed interactive dashboards in Power BI and Tableau, visualizing AI model outputs for deposit trends, fraud patterns, and transaction anomalies, enhancing data-driven decision-making.
● Enhanced risk management frameworks by implementing real-time data analytics pipelines, ensuring accurate identification and mitigation of financial risks and improving regulatory compliance and fraud prevention strategies.
Hartford Financial Services Group Inc Hartford, CT
AWS Data Engineer Jan 2023 - Aug 2023
● Developed scalable data infrastructure using AWS Glue, Redshift, and S3, optimizing long-duration insurance contract (LDTI) compliance workflows, ensuring accurate financial reporting and reserve calculations.
● Built automated ETL pipelines in AWS Glue and Lambda, integrating actuarial data for risk assessment models, improving data accessibility and transformation efficiency.
● Implemented real-time data streaming with AWS Kinesis and AWS Lambda, enabling fraud detection and claims processing analytics, reducing data ingestion latency.
● Designed Spark-based data transformations in AWS EMR and PySpark, streamlining actuarial data processing for premium forecasting and risk modeling, improving data processing speed.
● Integrated Hive and Kafka streaming to process high-volume claims and underwriting data, ensuring low-latency decision-making for risk management teams.
● Migrated legacy databases from Oracle BDW and Teradata to Hadoop HDFS using Sqoop, improving data warehousing scalability.
● Provisioned AWS infrastructure using Terraform and CloudFormation, optimizing cost and efficiency for cloud-based actuarial models.
● Implemented CI/CD pipelines with AWS CodePipeline, automating data deployment workflows and reducing manual intervention in risk analytics reporting.
● Integrated Kubernetes with EKS, enhancing scalability for ML models in fraud detection and underwriting automation.
● Developed Kibana dashboards for real-time claims monitoring, integrating Logstash and Elasticsearch, ensuring end-to-end transaction visibility.
● Optimized AWS EC2 and Redshift clusters, reducing operational costs while maintaining high availability for actuarial data processing pipelines.
Biocon (Capgemini) Bangalore, India
Data Engineer May 2021 – Jul 2022
● Designed scalable data architectures on GCP for pharmaceutical manufacturing analytics, integrating BigQuery, Dataproc, and Cloud Storage, optimizing biologics production monitoring and ESG reporting workflows.
● Developed ETL pipelines in Apache Airflow on GCP, automating clinical trial data processing, enabling real-time tracking of drug efficacy and regulatory compliance reporting.
● Migrated on-premises Hadoop and Oracle databases to Google Cloud (BigQuery, Dataproc, Dataflow), reducing query latency by 40% and improving analytics performance for biopharmaceutical R&D.
● Built real-time data streaming solutions using Kafka, GCP Pub/Sub, and AWS Kinesis, improving real-time supply chain monitoring and reducing inventory forecasting errors by 35%.
● Optimized batch and streaming data workflows using PySpark, Hive, and GCP Dataproc clusters, increasing data processing speed by 50% for drug development and quality assurance analytics.
● Developed Power BI dashboards connected to BigQuery and Snowflake, providing real-time insights into biologics production efficiency and ESG impact.
● Leveraged GCP Cloud Functions, Google Cloud SDKs, and Google Container Builder, automating machine learning pipelines for predictive healthcare analytics and risk assessment models.
● Implemented data lake solutions using AWS S3, Glacier, and GCP Cloud Storage, ensuring secure, scalable data storage for regulatory audits and patient safety tracking.
● Enhanced cloud infrastructure using Terraform and Kubernetes, enabling data pipelines for real-time pharma analytics.
GE India Industrial Pvt. Ltd Bangalore, India
Jr. Data Engineer Jun 2019 – Mar 2021
● Developed JSON scripts to deploy Azure Data Factory (ADF) pipelines, processing renewable energy and aviation R&D data using SQL Activity and UNIX shell scripts, enabling parallel execution for real-time analytics.
● Designed and deployed data pipelines using Azure Data Lake, Databricks, and Apache Airflow, optimizing sustainable aviation fuel R&D data processing, reducing ETL execution time by 40%.
● Implemented Kafka streaming to process real-time wind energy sensor data, integrating Spark Streaming for analytics, reducing data processing latency by 35% for predictive maintenance models.
● Orchestrated end-to-end pipelines using Control-M and AWS Simple Workflow, automating data ingestion and transformation for industrial manufacturing analytics, ensuring seamless data movement across cloud infrastructure.
● Deployed Kubernetes clusters with OpenShift and Docker, optimizing containerized Spark applications, improving biopharmaceutical manufacturing data processing by 50%.
● Developed and optimized Slowly Changing Dimension (SCD) Type 2 tables, creating complex stored procedures, triggers, and SQL joins, ensuring accurate tracking of renewable energy asset changes over time.
● Created Databricks Spark jobs using PySpark and Spark SQL, executing table-to-table transformations on energy consumption datasets, improving query performance for sustainability impact analysis.
● Implemented Infrastructure as Code (IaC) using Terraform and AWS CloudFormation, automating cloud resource provisioning and reducing manual infrastructure management by 60%.
SKILLS
Programming: Python, PySpark, Scala, SQL, Shell Scripting (UNIX, Bash), JSON
Big Data Technologies: Apache Spark, Hadoop (HDFS, Hive, HBase, MapReduce), Kafka, Databricks, Apache Airflow
ETL & Data Engineering: AWS Glue, Azure Data Factory, Snowflake, Redshift, Synapse, SSIS, Informatica, Talend
Cloud Platforms: AWS (S3, Lambda, Kinesis, Redshift, EMR, EKS), Azure (Data Lake, Synapse, Event Hubs, Kubernetes), GCP (BigQuery, Dataflow, Dataproc, Pub/Sub, Cloud Storage)
DevOps: Terraform, CloudFormation, Jenkins, Git, Bitbucket, Kubernetes (AWS EKS, GCP GKE, OpenShift), Docker
Streaming & Real-time Processing: Kafka, Kinesis, Pub/Sub, Azure Event Hubs, Spark Streaming, Stream Analytics
Data Warehousing: Snowflake, Synapse, Oracle, SQL Server, MySQL, Teradata, Star & Snowflake Schema, OLAP/OLTP
BI & Visualization: Power BI, Tableau, Kibana, AWS QuickSight
Machine Learning & MLOps: MLflow, XGBoost, TensorFlow, scikit-learn, Fraud Detection, Risk Assessment
EDUCATION
University of New Haven West Haven, CT
MS in Computer Science Aug 2022 – Dec 2023
Velagapudi Rama Krishna Siddhartha Engineering Vijayawada, India
BS in Computer Science Jun 2016 – May 2020