Siva Narayana Nelluri
Data Engineer | Email: ***************@*****.*** | LinkedIn | Mobile: 469-***-****
SUMMARY:
• 6+ years of experience in designing, building, and maintaining data pipelines on AWS, Azure, and Snowflake, specializing in cloud-native solutions.
• Proficient in Azure Synapse, Databricks, Data Factory, and Data Lake, as well as AWS Redshift, Glue, and EMR, ensuring scalable ETL pipelines and data migration.
• Expertise in Python, Bash, and Apache Airflow for automating workflows, along with Kafka, Spark, Hive, and dbt for cloud data transformations.
• Strong experience in dimensional modeling (Star & Snowflake schemas), SQL, and Snowflake for efficient data warehousing and analytics.
• Skilled in CI/CD, DevOps (GitHub, GitLab), PowerShell scripting, and Databricks Medallion Architecture for seamless cloud integrations.
• Hands-on with data extraction from diverse sources (CSV, Parquet, REST API, Salesforce, DB2, Oracle) and optimizing big data workflows.
• Experience with Azure Databricks Medallion Architecture with Delta Live Tables and Spark SQL for efficient data processing.
• Developed and optimized SSIS packages, stored procedures, functions, triggers, and views for enterprise database management.
• Adept at technical documentation, project execution, stakeholder communication, and delivering insights with Google Data Studio.
• Strong analytical, problem-solving, and interpersonal skills, with the ability to collaborate effectively and drive innovative data solutions.
TECHNICAL SKILLS:
•Programming Languages: Python, SQL, Java, Scala
•Big Data Technologies: Apache Hadoop, Spark, Databricks, Hive scripting, Apache Kafka
•Databases: MySQL, MongoDB, Cassandra, HBase, NoSQL
•Data Warehousing: Amazon Redshift, Google BigQuery, Snowflake, Postgres, Synapse
•ETL/ELT Tools: Databricks, HDInsight, Azure Dataflow, Informatica, Snowflake, Kafka
•Data Modeling: Star Schema, Snowflake Schema
•Cloud Technologies: AWS (S3, EMR, RDS, Redshift), Azure (Azure Data Lake, Azure SQL Database)
•Data Visualization: Tableau, Power BI, Amazon QuickSight
•Containerization and Orchestration: Docker, Kubernetes
•Data Security: Encryption, Data Masking, Compliance with GDPR/CCPA
•Performance Optimization: Indexing, query optimization, caching
PROFESSIONAL EXPERIENCE:
American Airlines
Senior Data Analyst 02/2023 – Present
•Designed and implemented end-to-end data solutions (storage, integration, processing, visualization) in Azure. Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory (ADF), T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
•Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure SQL Data Warehouse) and processed the data in Azure Databricks.
•Developed ETL processes to transform data from different systems using Azure Data Factory (ADF) pipelines, and used Data Lake processing to perform data massaging and conversions.
•Developed Spark applications using Spark-SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats for analyzing & transforming the data to uncover insights into the customer usage patterns.
•Created Spark clusters and configured high-concurrency clusters in Azure Databricks to speed up the preparation of high-quality data. Moved data between the on-premises cluster and Azure using Azure Data Factory (ADF).
•Migrated SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Azure Databricks, and Azure SQL Data Warehouse. Led migration projects to move on-premises data warehouses to Snowflake on Azure.
•Worked with Hive data warehouse infrastructure: creating tables, implementing partitioning and bucketing for data distribution, and writing and optimizing HQL queries.
•Developed Python scripts to perform file validations in Databricks and automated the process using ADF; developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the Cosmos activity.
•Created reports and user-friendly visualizations using Power BI and Tableau to give business users a complete understanding of the business.
•Proficient in DAX and the M language for transforming data in Power Query Editor and building relationships between tables using a star schema.
CVS Health
Data Engineer (Offshore) 01/2020 – 12/2021
•Worked on developing, debugging, testing, deploying, and maintaining data pipelines, along with third-party JARs and packages, in all environments.
•Develop high-level and detailed solution architectures for data engineering projects, ensuring alignment with business objectives and scalability.
•Designed solutions using big data technologies such as PySpark on Amazon EMR, AWS EKS, AWS Glue, Pandas DataFrames on AWS Lambda, DynamoDB, and DMS. Built real-time data processing using services like Amazon Kinesis, Apache Kafka, and Apache Flink.
•Designed and optimized data warehouses using services like Amazon Redshift, and implemented strategies to optimize query performance for large datasets.
•Designed and implemented data models that meet business needs and ensure efficient data storage and retrieval.
•Set up monitoring and logging for data pipelines and warehouses; diagnosed and resolved issues in data pipelines and architectures.
•Effectively communicated technical decisions, challenges, and solutions to both technical and non-technical stakeholders.
•Led the design and implementation of complex ETL processes using SQL Server Integration Services (SSIS), optimizing data flow and transformation logic to enhance performance and reliability.
•Extensive hands-on experience leveraging PySpark, Spark with Scala and Java on Amazon EMR, and Hadoop with Java for large-scale data processing, ensuring efficient ETL workflows and optimal resource utilization. Performed data ingestion using Apache Sqoop, Python JDBC applications, Spark connectors, Fivetran, and TDD.
•Proficient in designing and implementing data pipelines on AWS, utilizing services such as Amazon S3 for scalable and cost-effective storage, AWS Step Functions for orchestrating workflows, and Amazon Redshift for high-performance data warehousing, along with Databricks, Elasticsearch, and Kibana.
Best Buy
Data Engineer (Offshore) 06/2018 – 01/2020
Technologies: Java, Kafka, Apache Flink, Google BigQuery, Talend, Dask, Kubernetes, Docker, Power BI, ELK Stack, Google Cloud Storage, Apache Airflow, Apache Beam, HL7, FHIR, Apache Spark, HIPAA, PostgreSQL
•Optimized data pipelines and processing by reducing runtime by 30% with Python script enhancements and leveraging Apache Spark for distributed in-memory computations, achieving a 40% improvement in data processing efficiency.
•Implemented real-time data streaming solutions using Amazon Kinesis and Kafka, enabling seamless data ingestion, high-velocity data analysis, and near-instant insights for analytics applications.
•Enhanced query and system integration performance by tuning Apache Hive data structures to reduce query execution time by 50% and streamlining Java-based system integration for seamless interaction with modern infrastructure.
•Enhanced data processing and analytics by executing ad-hoc queries on massive datasets with Google BigQuery, implementing real-time analytics workflows with Google Cloud Dataflow, and managing a Snowflake data warehouse for scalable storage and improved query performance.
•Automated and optimized ETL workflows for data validation and cleansing, applied Data Vault modeling for scalable enterprise data architecture, and used cloud-native services such as AWS S3, EMR, and RDS for efficient processing and storage.
•Improved data accessibility and scalability by managing Azure Data Lake and Azure SQL Database solutions, enabling seamless storage and retrieval for global business intelligence teams.
•Delivered actionable business insights by developing interactive Tableau dashboards and custom visualizations in Power BI, enabling real-time analytics and enhancing decision-making for stakeholders.
•Streamlined large-scale data and application workflows by orchestrating containerized applications with Kubernetes for scalability and resilience, and automating CI/CD pipelines with Jenkins and GitLab CI to reduce deployment errors.
•Enhanced database performance by optimizing queries and indexing, reducing query execution time by 35%.
EDUCATION: Master’s in Computer and Information Science, Southern Arkansas University (Arkansas), 2023