Senior Data Analyst

Location:
Kansas City, MO
Salary:
$70/hr
Posted:
September 19, 2024

Resume:

NEHA GATLA

SENIOR DATA ANALYST

PROFESSIONAL SUMMARY:

Over 9 years of strong experience as a Data Analyst, with expertise in Data Analysis, Data Migration, Data Validation, Data Cleansing, Transformation, Integration, Data Import, and Data Export.

Extensive experience in Informatica PowerCenter for designing and implementing ETL processes, including data extraction, transformation, and loading from various sources to target systems, enhancing data integration efficiency.

Solid working experience with SQL, including MySQL, MS SQL Server, Teradata, and Snowflake for complex querying and data warehousing.

Experienced in Hadoop ecosystem components like Hadoop MapReduce, HDFS, Hive, Sqoop, Pig, and Flume.

Expert in Python programming, utilizing libraries such as Pandas, NumPy, and Scikit-learn for data manipulation, analysis, machine learning, and problem-solving.

Expertise in integrating AWS services like AWS S3, AWS EMR, EC2, AWS RDS, and AWS Redshift to build comprehensive data pipelines for analytics and reporting.

Skilled in Azure Databricks, Azure Data Lake, Azure SQL, Azure Blob Storage, Azure Synapse Analytics, and Azure DevOps for advanced analytics and machine learning.

Adept in conducting qualitative analysis (QA), interpreting non-numeric data through thematic coding, pattern recognition, and narrative reporting using Python, SQL, and data visualization tools like Power BI and Tableau.

Proficient in Tableau for creating custom charts and interactive dashboards to meet specific business requirements.

Proficient in Power BI, configuring data refresh schedules, managing dataset updates, and creating insightful reports for business intelligence.

Experienced with SSRS for creating, deploying, and managing reports.

Adept in writing and optimizing SAS code for data manipulation, analysis, and reporting, leveraging SAS Base and SAS Macro languages.

Skilled in utilizing SAS Access and SAS Stat for managing, cleaning, and pre-processing large datasets to ensure high data quality.

Expert in data cleaning and pre-processing tasks using Python, handling missing values, outliers, and inconsistent data.
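
A minimal sketch of the kind of Pandas/NumPy cleaning routine described above; the file and column names are hypothetical:

import pandas as pd
import numpy as np

# Load a hypothetical raw extract
df = pd.read_csv("raw_extract.csv")

# Standardize inconsistently cased/padded categorical values
df["state"] = df["state"].str.strip().str.upper()

# Fill missing numeric values with the column median
df["revenue"] = df["revenue"].fillna(df["revenue"].median())

# Drop rows more than 3 standard deviations from the mean (simple outlier rule)
z = np.abs((df["revenue"] - df["revenue"].mean()) / df["revenue"].std())
df = df[z < 3]

# Remove exact duplicate records
df = df.drop_duplicates()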

Proficient in integrating data from diverse sources into QlikView applications for comprehensive analysis.

Experienced with SSIS for implementing and configuring ETL processes, enhancing data flow and transformation.

Competent in configuring event tracking in Google Analytics to measure user interactions with website elements and content.

Skilled in utilizing HTTP protocols within data integration and API communication.

Experienced in risk management practices, including identifying, assessing, and mitigating data-related risks to ensure data integrity and compliance.

Expertise in quality assurance practices, ensuring data accuracy, consistency, and completeness.

Competent in statistical analysis using software such as SAS, R, and Python for complex data analysis and quantitative modeling.

Proficient in using Jupyter Notebooks for data analysis, visualization, and interactive computing, enhancing collaboration and reproducibility of results.

Exhibit strong project management skills, leading cross-functional teams, managing timelines, and delivering data-driven solutions.

Demonstrate excellent communication skills, articulating complex data insights to stakeholders and facilitating actionable discussions.

Show proven problem-solving abilities, addressing data-related challenges through innovative solutions and strategic thinking.

TECHNICAL SKILLS:

Programming Languages: Python (Pandas, NumPy, Scikit-learn, Beautiful Soup, Scrapy), R (ggplot2, Shiny, arules), SQL.

Big Data & Hadoop: Hadoop MapReduce, HDFS, Hive, Sqoop, Pig, Flume, HBase, Apache NiFi

Databases: MySQL, MS SQL Server, PostgreSQL, AWS RDS, Azure SQL Database, Azure Cosmos DB, HBase, Cassandra, Snowflake, Teradata

Data Integration & ETL: Apache Sqoop, Informatica PowerCenter, AWS Glue, SSIS, Azure Data Factory, Databricks.

Cloud Platforms: AWS (EC2, S3, RDS, Redshift, EMR, Lambda, CloudWatch), Azure (Data Lake, Synapse, Blob Storage, Azure DevOps)

Data Visualization: Power BI, QlikView, Tableau

Analytics & Reporting: Google Analytics, SAS Stat, SAS Access, Jupyter Notebook, R Markdown, Shiny.

Data Modelling: SAS Base, SAS Macros, MySQL, Power BI, SQL (Stored Procedures, Functions)

Soft skills: Written and Verbal Communication Skills, Presentation, Problem Solving, Technical Writing, Self-Motivated, Project Management, Agile Methodology

PROFESSIONAL EXPERIENCE:

Client: Kansas Department of Education, Topeka, KS Jul 2022 – Present

Role: Sr. Data Analyst

Responsibilities:

Manipulated data and calculated key metrics using MySQL queries (window functions, subqueries) and MS Excel, optimizing Hadoop components like Edge nodes, HDFS directories, and Hive tables to support project-specific access requests and strategic planning.
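
An illustrative example of calculating metrics with MySQL window functions from Python, as described above; the connection string, table, and column names are assumptions:

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical MySQL connection
engine = create_engine("mysql+pymysql://user:password@host/education_db")

query = """
SELECT district_id,
       school_year,
       enrollment,
       SUM(enrollment) OVER (PARTITION BY district_id ORDER BY school_year) AS running_enrollment,
       RANK() OVER (PARTITION BY school_year ORDER BY enrollment DESC) AS district_rank
FROM district_enrollment
"""

metrics = pd.read_sql(query, engine)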

Developed and managed ETL processes and data pipelines for MySQL databases, integrating NoSQL databases (Azure Cosmos DB) with SQL databases (MySQL, Oracle RDBMS) and big data technologies (Hadoop, Azure Databricks), and implemented Informatica workflows for automated data migration and transformation, ensuring seamless data flow and integrity between systems.

Developed ETL processes to integrate PostgreSQL with Azure Data Lake and Databricks, optimizing data management and analytics across SQL and NoSQL databases.

Created detailed data models and logical/physical schemas for MySQL databases and integrated Oracle RDBMS with Azure Data Lake and Databricks for advanced analytics.

Developed scripts for data export from Hadoop to relational databases using Apache Sqoop and implemented data integration solutions within Azure Synapse Analytics and Azure Stream Analytics.

Utilized Python’s Scikit-learn to build and evaluate predictive models, enabling forecasting of business metrics, and employed Beautiful Soup and Scrapy for web scraping, enriching datasets and enhancing analysis.
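
A compact sketch of the Scikit-learn forecasting workflow referenced above; the dataset and feature names are hypothetical:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("business_metrics.csv")  # hypothetical dataset
X = df[["prior_quarter", "headcount", "region_code"]]
y = df["next_quarter_revenue"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))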

Performed statistical analysis on large datasets using R, SAS, and Python, uncovering trends, correlations, and patterns that drove data-driven decision-making.

Integrated data from various sources using SAS DATA step and PROC SQL and developed interactive dashboards in Tableau to visualize KPIs and trends, creating a data roadmap for strategic decision-making.

Implemented advanced Tableau features such as calculated fields, parameters, and dynamic dashboards, and developed Tableau workbooks that combined data from multiple sources (e.g., MySQL, Azure Data Lake).

Created Tableau storyboards to present complex data analyses in a narrative format, facilitating better understanding and communicating insights to stakeholders.

Designed and maintained data warehouse schemas for efficient storage and retrieval and utilized Azure Data Explorer for real-time analytics and querying of data stored in Azure Data Lake.

Facilitated workshops and meetings to discuss data insights, gather requirements, and align analytical objectives with business users, collaborating with cross-functional teams to manage projects from inception to completion.

Extracted, transformed, and analyzed data from SAP systems to support business intelligence and reporting needs, and developed recommendations to improve business processes and operational efficiency.

Extracted, transformed, and loaded (ETL) data from SAP HANA systems, integrating it with Azure Synapse Analytics and Hadoop ecosystems to streamline data processing, enhance reporting capabilities, and provide real-time insights for strategic decision-making.

Created and manipulated pivot tables in MS Excel for summarizing, analyzing, and visualizing large datasets, and integrated data from multiple worksheets and workbooks for comprehensive analysis.

Mapped and analyzed user journeys through websites and apps using Google Analytics, identifying friction points and optimizing the user experience.

Developed interactive data widgets in Jupyter Notebooks using ipywidgets and employed C programming to develop custom data analysis scripts and tools for automating tasks and improving efficiency.
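
A small ipywidgets example of the kind of interactive notebook widget mentioned above; the metric names are placeholders:

import ipywidgets as widgets
from IPython.display import display

# Hypothetical metric selector for a notebook dashboard
dropdown = widgets.Dropdown(
    options=["enrollment", "attendance", "graduation_rate"],
    description="Metric:",
)

def on_change(change):
    # Fires whenever the selected value changes
    print(f"Selected metric: {change['new']}")

dropdown.observe(on_change, names="value")
display(dropdown)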

Environment: Azure (Cosmos DB, Oracle RDBMS, Databricks, Oracle HCM Fusion, Synapse Analytics, Stream Analytics, Data Explorer), MySQL, MS Excel, Python, Scikit-learn, Hadoop, Apache Sqoop, SAS DATA step, SAS PROC SQL, Tableau, Beautiful Soup, Scrapy, R (arules, data.table), SAP, SAP HANA, Google Analytics, Jupyter Notebooks, SAS Graphics, SAS Visual Analytics.

Client: GoodRx, Santa Monica, CA Mar 2020 – Jun 2022

Role: Sr. Data Analyst

Responsibilities:

Designed and maintained MySQL databases, created pipelines using user-defined functions and stored procedures, and developed ETL processes to integrate data from various sources into the data warehouse.

Utilized Teradata to design and implement data warehousing solutions, optimizing complex queries and data integration processes within an AWS environment (EC2, CloudWatch, Lambda, Redshift, IAM, S3).

Integrated Salesforce data with MySQL and AWS data warehouses, utilizing Salesforce APIs for seamless data synchronization and enhancing CRM analytics and reporting capabilities.

Automated data validation, cleaning, and deduplication using Python scripts with Pandas, Apache Airflow, and NumPy, ensuring data consistency and integrity.
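
A minimal Airflow DAG sketch illustrating the automated validation and deduplication pattern above; the paths, DAG ID, and column names are assumptions:

from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator

def clean_and_dedupe():
    # Hypothetical staging extract
    df = pd.read_csv("/data/staging/claims.csv")
    df = df.dropna(subset=["claim_id"])            # basic validation
    df = df.drop_duplicates(subset=["claim_id"])   # deduplication
    df.to_csv("/data/clean/claims.csv", index=False)

with DAG(
    dag_id="daily_data_cleaning",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="clean_and_dedupe", python_callable=clean_and_dedupe)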

Queried and validated data from MySQL databases and utilized AWS Athena for ad-hoc querying and analysis of large datasets in S3, optimizing query performance.

Tested and validated APIs for data integration, ensuring seamless communication between different systems, and maintained comprehensive documentation to facilitate troubleshooting and future development.

Utilized Apache HBase for real-time data storage and retrieval in Hadoop and established automated Hadoop Integration testing systems with Oozie workflows.

Implemented Terraform for automated infrastructure management on AWS, including EC2, S3, and RDS, to ensure consistent and scalable data pipeline deployments.

Enabled logging and monitoring by creating CloudWatch log groups, granting AWS EC2 instance access via IAM roles, and configuring AWS S3 event notifications to trigger AWS Lambda functions.
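
An illustrative boto3 snippet for the CloudWatch log group and S3-to-Lambda notification setup described above; the bucket, ARN, and log group names are placeholders:

import boto3

logs = boto3.client("logs")
s3 = boto3.client("s3")

# Create a log group for pipeline logging
logs.create_log_group(logGroupName="/data-pipeline/ingestion")

# Configure a hypothetical bucket to invoke a Lambda function on new objects
s3.put_bucket_notification_configuration(
    Bucket="example-ingest-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:process_upload",
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)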

Designed and implemented interactive Power BI dashboards and reports that visualized complex data from MySQL databases, AWS S3, and Redshift, providing actionable insights and facilitating data-driven decision-making.

Leveraged Power BI's DAX for advanced calculations and Power Query for data transformation to enhance reporting accuracy and user engagement.

Utilized Alteryx to automate complex ETL workflows, integrating data from AWS S3, Redshift, and MySQL, enhancing data processing efficiency.

Implemented and managed SaaS-based data integration tools to streamline ETL processes and integrate data from MySQL, AWS S3, and Redshift into the data warehouse, enhancing scalability and real-time data synchronization.

Developed and maintained complex Excel formulas, PivotTables, Pivot Charts, and financial models for data analysis, financial forecasting, budgeting, and scenario analysis.

Conducted comprehensive data analysis using statistical tools such as R, Scikit-learn, and Statsmodels, including hypothesis testing, regression analysis, and predictive modeling.
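
A short Statsmodels regression sketch of the kind of analysis listed above; the dataset and variable names are hypothetical:

import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("prescriptions.csv")  # hypothetical dataset
X = sm.add_constant(df[["discount_rate", "ad_spend"]])
y = df["fill_count"]

model = sm.OLS(y, X).fit()
print(model.summary())  # coefficients, p-values, R-squared for hypothesis testing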

Configured and optimized SAP modules to enhance data accessibility, improve analytical capabilities, and support business intelligence and strategic decision-making.

Led Agile data analytics projects leveraging AWS (EC2, Lambda, Redshift, S3) and MySQL, integrating data pipelines with Python, Pandas, and Apache Airflow. Streamlined workflows using Hadoop, Oozie, and Apache HBase, and delivered predictive models with Scikit-learn and Statsmodels.

Collaborated with data engineers, data scientists, and business analysts to ensure data integrity, accuracy, and accessibility, emphasizing strong teamwork and communication.

Created and managed detailed reports using SAS procedures, SAS Enterprise Guide, and Jupyter Notebooks, presenting complex data findings clearly to stakeholders.

Analyzed behavior flow reports in Google Analytics and evaluated marketing campaign performance to guide strategic decisions and optimize site architecture.

Connected to MySQL databases using R package RMySQL and retrieved data for statistical analysis and visualization.

Environment: AWS (EC2, CloudWatch, Lambda, Redshift, IAM, S3), MySQL, Python, Pandas, Apache Airflow, NumPy, Hadoop, Oozie, Power BI, SAS Access, SAS Stat, SaaS, R, Excel, Apache HBase, S3, Scikit-learn, Statsmodels, Jupyter Notebooks, Google Analytics, SAS Enterprise Guide.

Client: TATA AIG General Insurance Company Limited, Mumbai, India Oct 2016 – Jan 2020

Role: Data Analyst

Responsibilities:

Led project and program management for large-scale data initiatives, including the integration of Azure Data Factory with various data sources, implementing data lifecycle management policies, and developing scalable Snowflake data models. Managed cross-functional teams, aligning information system requirements and using Jira for task management.

Optimized Hadoop components (Edge nodes, HDFS, Hive) and utilized PySpark for large-scale data processing, supporting strategic planning and advanced analytics.

Developed Python scripts for data pre-processing in predictive models, including missing value imputation, label encoding, and feature engineering, leveraging Pandas for data manipulation and outlier detection.
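
A brief Pandas/Scikit-learn sketch of the pre-processing steps described above; the file and column names are assumptions:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("policies.csv")  # hypothetical insurance extract

# Impute missing premiums with the median per product line
df["premium"] = df.groupby("product_line")["premium"].transform(lambda s: s.fillna(s.median()))

# Label-encode a categorical channel field for downstream models
df["channel_code"] = LabelEncoder().fit_transform(df["channel"].astype(str))

# Engineered feature: policy tenure in days
df["tenure_days"] = (pd.to_datetime(df["renewal_date"]) - pd.to_datetime(df["start_date"])).dt.days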

Integrated Azure Data Factory (ADF) with Azure Blob Storage, Azure SQL Database, and on-premises data sources, enabling comprehensive data workflows and seamless data transfer.

Implemented data lifecycle management policies within Azure Data Lake and Azure Databricks, using Visual Studio for coding and automation to ensure compliance with data governance policies.

Monitored Hadoop cluster job performance, performed capacity planning, and managed nodes, using SQL for querying and analysis on various source tables.

Collaborated with cross-functional teams to address ad-hoc requests, utilizing data from Azure Data Lake and Snowflake to provide comprehensive insights.

Designed and implemented scalable Snowflake data models, integrating with SQL Database and SQL Server to support advanced analytics and reporting.

Developed and maintained QlikView dashboards, utilizing advanced scripting and data modelling techniques to ensure seamless integration with SQL Server, Snowflake, and Azure SQL Database, delivering comprehensive and interactive visual analytics to stakeholders.

Generated enterprise reports using SSRS from SQL Server Database (OLTP) and SQL Server Analysis Services, ensuring accurate and reliable reporting.

Created advanced visualizations in R using ggplot2 and lattice to effectively communicate data insights and trends to stakeholders.

Automated repetitive data processing tasks and reporting using SAS macros and scheduling tools, enhancing efficiency and consistency in reporting processes.

Designed and implemented custom data ingestion frameworks using Apache NiFi, automating data flow into the Hadoop ecosystem from various sources.

Developed C# applications and scripts to automate data validation, reporting, and ETL processes, integrating with Azure services like Data Factory, SQL Database, and Data Lake to reduce manual effort.

Applied HTTP methodologies in Jupyter Notebooks to access and analyze real-time data from web APIs, developing interactive data storytelling experiences with Jupyter Notebook widgets and visualizations.
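
An illustrative notebook snippet for pulling web API data over HTTP as described above; the endpoint and payload shape are assumptions:

import requests
import pandas as pd

# Hypothetical REST endpoint returning JSON records
response = requests.get("https://api.example.com/v1/claims", params={"status": "open"}, timeout=30)
response.raise_for_status()

# Flatten the assumed "results" array into a DataFrame for analysis
df = pd.json_normalize(response.json()["results"])
df.head()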

Developed and applied advanced audience segments in Google Analytics using Azure DevOps, enhancing the efficiency of targeting specific user groups and improving marketing efforts.

Environment: Azure (Data Factory (ADF), Blob Storage, SQL Database, Data Lake, Databricks), Azure DevOps, QlikView, Python, Pandas, Hadoop, SQL, R, ggplot2, lattice, SAS Base, SAS Stat, SAS Access, SSRS, SQL Server, Snowflake, Apache NiFi, Excel, SAS Macros, VBA, Jupyter Notebook, Google Analytics.

Client: Indiabulls Housing Finance Ltd. (IBHFL), Mumbai, India Jul 2014 – Sep 2016

Role: Data Analyst

Responsibilities:

Created interactive Power BI dashboards and reports that provide actionable insights into business trends and key performance indicators (KPIs).

Worked on analyzing the Hadoop cluster and big data analytic tools such as HiveQL. Involved in performance tuning and monitoring of both T-SQL and PL/SQL blocks.

Built and implemented data pipelines and reports using Git branching techniques, combining data from relational databases (T-SQL, PL/SQL), Hadoop, and HiveQL with Power BI and SAS for sophisticated analytics.

Integrated Python with SQL databases to perform complex queries and data manipulations, ensuring seamless data retrieval and storage for analytical tasks.

Developed automated SAS reports and dashboards, utilizing SAS Report and SAS Graph procedures to visualize key metrics and support decision-making.

Worked on AWS Auto Scaling to provide high availability of applications and AWS EC2 instances based on application load, using CloudWatch in AWS.

Employed Excel data cleaning tools, such as Text-to-Columns, Remove Duplicates, and Power Query, to pre-process and transform raw data into structured formats for further analysis.

Developed and maintained complex SQL queries and stored procedures on AWS RDS to support reporting and analytical needs across various business units.

Automated Spark job executions using EC2 instances with Auto Scaling, ensuring scalable and cost-effective data processing that dynamically adjusts to workload demands.

Implemented advanced data visualization techniques in Power BI, including custom visuals and drill-through reports, to enhance data storytelling and user engagement.

Configured AWS EMR to integrate with AWS S3 for seamless data ingestion and output, ensuring smooth data flow across data processing pipelines.

Developed interactive and dynamic reports using R Markdown and Shiny to provide users with customizable and real-time data insights.

Utilized advanced Excel functions and formulas (VLOOKUP, INDEX/MATCH, and data validation) to clean and prepare large datasets for analysis, ensuring data accuracy and consistency.

Utilized SAS Extract, Transform, Load (ETL) capabilities to extract data from heterogeneous sources, perform transformations, and load it into SAS datasets for analysis.

Created PL/SQL procedures, functions, packages, and SQL queries of medium to high complexity based on system needs.

Worked on analyzing the Hadoop cluster and big data analytic tools including Pig, HBase, and Sqoop.

Created interactive data dashboards with Plotly and Dash in Python, enabling real-time data visualization and exploration for various business units.
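
A minimal Plotly Dash sketch of the kind of interactive dashboard described above; the dataset and field names are placeholders:

import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html

df = pd.read_csv("loan_metrics.csv")  # hypothetical dataset

app = Dash(__name__)
app.layout = html.Div([
    html.H3("Loan Disbursement Trends"),
    dcc.Graph(figure=px.line(df, x="month", y="disbursed_amount", color="branch")),
])

if __name__ == "__main__":
    app.run(debug=True)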

Environment: AWS (Auto Scaling, EC2, CloudWatch, RDS, EMR, S3), Power BI, Hadoop, HiveQL, T-SQL, PL/SQL, Python, SQL, SAS Report, SAS Graph, Excel, R Markdown, Shiny, Plotly, Dash, Pig, Sqoop, HBase.

EDUCATION: Jawaharlal Nehru Technological University, Hyderabad, TS, India

BTech in Computer Science and Engineering, June 2010 - May 2014

EMAIL: ad8ukj@r.postjobfree.com

PHONE: 913-***-****

LinkedIn: www.linkedin.com/in/neha-gatla-


