Data Engineer Processing

Location:

Kansas City, MO

Salary:

70000

Posted:

July 07, 2024


Resume:

SREESANTHREDDY SAMALA

linkedin.com/in/sreesanth-s-9a65b42a4 +1-816-***-**** ad61qu@r.postjobfree.com Kansas City, MO 64111

Professional Summary

Data Engineer with 3+ years of experience optimizing data storage and contributing across Data Engineering domains, including Data Pipeline Design, ETL Orchestration, Profiling, Advanced Analysis, Integration, and Governance. Proficient in data modeling, schema design, and performance optimization, leveraging cloud platforms (AWS, Azure) to deliver scalable and resilient data solutions. Demonstrated expertise in SQL, including joins, aggregation, and window functions, and in relational databases (Oracle, MySQL, PostgreSQL) for modeling, querying, and administration. Proficient in Python and Spark for automating data processing workflows and improving the efficiency of data engineering processes. Strong analytical, problem-solving, and communication skills that support successful project execution and effective collaboration in cross-functional teams.

Skills

Big Data Tools: Hadoop, MapReduce, Spark, Airflow, HBase, Hive, Pig, Kafka.

Programming Languages: Python, R, Java, SQL, and Scala.

Methodologies: System Development Life Cycle (SDLC), Agile.

Cloud Platforms: AWS, Azure, GCP, Snowflake.

Data Visualization Tools: Power BI, Tableau, MS Excel.

Packages: PySpark, NumPy, Pandas, Matplotlib, Seaborn, SciPy, and Git.

Databases: MySQL, SQL Server, Azure SQL DB, DynamoDB, Oracle.

ETL/Data Warehouse Tools: Informatica PowerCenter, SSIS, Apache NiFi, Talend.

Technical Competencies: Data Governance, Data Security, API development, Problem-Solving and Performance Tuning.

Work History

Data Engineer Jan 2024 - Present

Molina Healthcare, Missouri

Spearheaded the strategic migration of healthcare data and applications to AWS cloud platforms, achieving enhanced performance, substantial cost savings, and robust disaster recovery and high availability solutions.

Optimized data processing by constructing efficient ETL pipelines (AWS Glue and Azure Data Factory) and streamlining database design.

Orchestrated the design and implementation of a hybrid cloud system on AWS and Azure, improving data processing efficiency by 35% and ensuring reliable data flow between AWS Glue and Azure Synapse Analytics.

Wrote PySpark and Spark SQL code to transform data in Apache Spark on Amazon EMR, and implemented AWS IAM policies and security best practices to prevent unauthorized access.

Developed and deployed custom Kafka Connect connectors, integrating with 10+ data sources and third-party APIs, resulting in a 50% reduction in data integration time and enabling real-time analytics beyond traditional RDBMS.

Wrote customized Spark transformations and actions in Python, augmenting Spark's capabilities to manage intricate data processing tasks in Databricks, resulting in a 50% increase in data throughput.

Crafted advanced SQL stored procedures for OLTP database solutions on Microsoft SQL Server, improving data processing and operational performance by 45%.

Set up CloudWatch and Airflow monitoring to detect and address performance issues, contributing to a 20% increase in resource uptime.

Utilized Jira and Confluence for project management, leading to a 30% improvement in project delivery time and facilitating effective collaboration with a 95% completion rate.

Awarded Employee of the Month for exemplary performance in optimizing data workflows, leading to a 40% reduction in processing times and a 20% increase in overall team productivity.

Data Analyst Apr 2020 - Jul 2022

Trigent Software, India

Engineered the implementation of a scalable data pipeline that reduced data processing time by 50% and improved data accuracy by 30%, resulting in more efficient data-driven decision-making for the organization.

Extensively leveraged Azure Databricks and Data Factory as ETL platforms, ensuring effective data processing and transformation and saving an average of 15 hours per week.

Shaped impactful data visualizations using Tableau and Power BI, translating complex data into clear and actionable insights, increasing stakeholder decision-making efficiency by 30% and reducing report generation time by 25%.

Integrated Azure Synapse Analytics with Azure Data Factory, Azure Databricks, and Azure Machine Learning to develop end-to-end data processing and analysis workflows, enhancing decision-making processes through accurate data analysis and visualization techniques.

Automated data extraction jobs from multiple data sources, including CSV, JSON, Excel, Oracle, and SQL Server, pushing result sets to cloud storage such as Azure Blob Storage and to target systems, reducing manual effort by 60% and data delivery time by 50%.

Improved data-driven decision-making by 35% by developing interactive Power BI reports and dashboards to visualize KPIs and trends, and improved report generation speed by 25% by optimizing those reports for performance.

Wrote complex SQL queries, stored procedures, functions, and triggers across multiple database platforms to manipulate and analyze large datasets efficiently, improving performance by 95% through cleaning, validation, and transformation tasks.

Implemented robust data encryption and access controls to safeguard sensitive risk management data, leveraging Azure Key Vault and Snowflake's advanced security features, enhancing data security by 50% and compliance by 20%.

Integrated SSIS with SQL Server ecosystem components, such as SQL Server Database Engine, SQL Server Analysis Services (SSAS), and SQL Server Reporting Services (SSRS), creating end-to-end data solutions to support business intelligence and reporting needs.

Accelerated the deployment of a NoSQL database cluster, accommodating a 200% surge in data volume, including unstructured data, thereby enhancing the robustness and performance of the overall data infrastructure.

Advanced the development of robust data pipelines using Python, pandas, and Java, and built both batch and streaming workflows with the PySpark framework to ensure scalable data processing and analysis, increasing data processing efficiency by 40%.

Worked within Agile frameworks, actively engaging in sprint planning, daily stand-ups, and retrospectives to ensure timely project delivery, continuous improvement, and alignment with business objectives.

Received the Star Performer Award and the Excellence in Data Engineering Award for developing a data visualization dashboard that reduced reporting time by 30%, cut processing times by 40%, and increased overall team productivity by 20%.

Education

University of Missouri-Kansas City, Kansas City, Missouri

Master of Science: Computer Science with Data Science [GPA: 3.9/4] Aug 2022 - Dec 2023

Sri Venkateshwara University, India

Bachelor of Technology: Computer Science and Engineering [GPA: 3.5/4] Jul 2017 - Jun 2021

Certification

Microsoft Certified: Azure Data Engineer Associate, Certified in AWS, Azure, GCP Databricks Platform Architect, Databricks Lakehouse Fundamentals, Agile Project Management (by Atlassian, LinkedIn), Data Analysis (Microsoft/LinkedIn), GitHub Professional (by GitHub), Simplifying Data Pipelines with Apache Kafka (by IBM).
