
Data Engineer

Location:
Hilliard, OH, 43026
Posted:
December 05, 2024


Resume:

AISHWARYA CHIRANJEEVI

***************@*****.*** 540-***-**** Irving, TX LINKEDIN

SUMMARY

6+ years of experience in Data Engineering, Data Pipeline Design, Development, and Implementation as a Sr. Data Engineer/Data Developer and Data Modeler.

Strong experience in writing scripts with the Python, PySpark, and Spark APIs for analyzing data.

Extensive experience with Python libraries including PySpark, pytest, PyMongo, cx_Oracle, pyexcel, Boto3, psycopg, embedPy, NumPy, and Beautiful Soup.

Experience with Google Cloud components, Google Container Builder, GCP client libraries, and Cloud SDKs.

Hands-on use of the Spark and Scala APIs to compare the performance of Spark with Hive and SQL, and of Spark SQL to manipulate DataFrames in Scala.

Expertise in Python and Scala, including writing user-defined functions (UDFs) for Hive and Pig in Python (see the sketch below).
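
A minimal sketch of the kind of Python UDF used with Hive's TRANSFORM clause: a streaming script that reads tab-separated rows from stdin and writes transformed rows to stdout. The column layout and the normalization applied are illustrative assumptions, not details from the projects above.

    #!/usr/bin/env python
    # Hive streaming "UDF" sketch: Hive's TRANSFORM clause pipes rows to this
    # script as tab-separated text on stdin; transformed rows go back on stdout.
    # The two-column layout (customer_id, email) is a hypothetical example.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        customer_id, raw_email = fields[0], fields[1]
        # Example transformation: trim and lower-case the email address.
        print("\t".join([customer_id, raw_email.strip().lower()]))

Invoked from Hive with something like: SELECT TRANSFORM(customer_id, email) USING 'python normalize_email.py' AS (customer_id, email) FROM customers;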

Experience in developing MapReduce programs using Apache Hadoop for analyzing big data per requirements.

Experience in working with Flume and NiFi for loading log files into Hadoop. Experience in working with NoSQL databases like HBase and Cassandra.

Strong knowledge and hands-on experience with various GCP services and components, ensuring seamless cloud operations and optimizations.

Experience in implementing and orchestrating data pipelines using Oozie and Airflow (see the sketch below).
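
A minimal Airflow DAG sketch showing how such a pipeline can be orchestrated: three jobs with explicit dependencies on a daily schedule. The task names, script paths, and schedule are assumptions for illustration only.

    # Minimal Airflow DAG sketch: ingest -> transform -> load on a daily schedule.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="daily_sales_pipeline",          # hypothetical pipeline name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        ingest = BashOperator(task_id="ingest",
                              bash_command="python /opt/jobs/ingest.py")
        transform = BashOperator(task_id="transform",
                                 bash_command="spark-submit /opt/jobs/transform.py")
        load = BashOperator(task_id="load",
                            bash_command="python /opt/jobs/load.py")

        # Declare the dependencies between the jobs.
        ingest >> transform >> load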

Worked with Cloudera and Hortonworks distributions.

Expert in developing SSIS/DTS packages to extract, transform, and load (ETL) data into data warehouses/data marts from heterogeneous sources.

Good working knowledge of the Amazon Web Services (AWS) cloud platform, including EC2, S3, VPC, ELB, IAM, DynamoDB, CloudFront, CloudWatch, Route 53, Elastic Beanstalk, Auto Scaling, Security Groups, EC2 Container Service (ECS), CodeCommit, CodePipeline, CodeBuild, CodeDeploy, Redshift, CloudFormation, CloudTrail, OpsWorks, Kinesis, SQS, SNS, and SES.

Experience in Data Analysis, Data Profiling, Data Integration, Migration, Data Governance and Metadata Management, Master Data Management, and Configuration Management. Experience in developing customized UDFs in Python to extend Hive and Pig Latin functionality.

Expertise in designing complex mappings, performance tuning, and slowly changing dimension and fact tables.

Extensive experience with Teradata utilities such as FastExport and MultiLoad to export and load data to and from different source systems, including flat files.

Experienced in building automated regression scripts in Python to validate ETL processes across multiple databases such as Oracle, SQL Server, Hive, and MongoDB (see the sketch below).
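
A minimal sketch of such a validation script, assuming DB-API connections to the source and target systems and hypothetical table/column names; real checks would cover more metrics than row counts and key sums.

    # Compare row counts and a key-column checksum between source and target.
    def fetch_one(conn, sql):
        cur = conn.cursor()
        cur.execute(sql)
        value = cur.fetchone()[0]
        cur.close()
        return value

    def validate_table(src_conn, tgt_conn, table, key_col):
        checks = {
            "row_count": (fetch_one(src_conn, f"SELECT COUNT(*) FROM {table}"),
                          fetch_one(tgt_conn, f"SELECT COUNT(*) FROM {table}")),
            "key_sum":   (fetch_one(src_conn, f"SELECT SUM({key_col}) FROM {table}"),
                          fetch_one(tgt_conn, f"SELECT SUM({key_col}) FROM {table}")),
        }
        return {name: {"source": s, "target": t, "match": s == t}
                for name, (s, t) in checks.items()}

    # Example (connections are assumptions):
    # report = validate_table(cx_Oracle.connect(...), pyodbc.connect(...), "ORDERS", "ORDER_ID")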

Proficiency in SQL across several dialects, including MySQL, PostgreSQL, Redshift, SQL Server, and Oracle.

Experience in designing star and snowflake schemas for data warehouse and ODS architectures.

Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design and implementing specific features.

TECHNICAL SKILLS:

Programming Languages: Python, Java, Scala - proficiency in these languages for data manipulation, processing, and automation.

Scripting Languages: Bash, Perl, PowerShell - skills in scripting for automation and managing data pipelines.

DBMS: SQL, MySQL, PostgreSQL, Oracle - expertise in relational database management systems for data storage and retrieval.

Big Data: Hadoop, Spark, Kafka - experience with frameworks and tools for handling and processing large datasets.

NoSQL: MongoDB, Cassandra, Redis - knowledge of non-relational databases for handling unstructured or semi-structured data.

ETL Tools: Apache NiFi, Talend, Informatica - familiarity with tools used for Extract, Transform, and Load processes to move data between systems.

Version Control: Git, SVN - experience with version control systems to manage code changes and collaboration.

Agile: Scrum, Kanban - understanding of Agile methodologies for project management and iterative development.

Cloud: AWS, Google Cloud, Azure - proficiency in cloud platforms for scalable data solutions and infrastructure management.

WORK EXPERIENCE

Sr. Data Engineer
T-Mobile, Atlanta, GA

May 2023 - Present

Implemented partitioning, dynamic partitions, and bucketing in Hive (see the sketch below). Worked extensively with Hive, SQL, Python, Scala, Spark, and shell scripting. Involved in creating ETL pipelines to ingest data into Azure Data Lake. Experienced in designing data-driven solutions.
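
A minimal sketch of Hive partitioning and bucketing driven from PySpark; the database objects, bucket count, and file format are illustrative assumptions.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive_partitioning")
             .enableHiveSupport()
             .getOrCreate())

    # Partitioned, bucketed Hive table (hypothetical schema).
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales_partitioned (
            order_id BIGINT,
            amount   DOUBLE
        )
        PARTITIONED BY (order_date STRING)
        CLUSTERED BY (order_id) INTO 32 BUCKETS
        STORED AS ORC
    """)

    # Dynamic partition insert: Hive derives order_date partitions from the data.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT OVERWRITE TABLE sales_partitioned PARTITION (order_date)
        SELECT order_id, amount, order_date FROM staging_sales
    """)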

Involved in building new data sets and products that help support business initiatives.

Built production-quality ingestion pipelines with automated quality checks to enable the business to access all data sets in one place.

Automated the pipelines using Airflow by creating dependencies between jobs and scheduling them on a daily, weekly, or monthly basis. Performed quality checks based on consistency, accuracy, completeness, orderliness, uniqueness, and timeliness, and used them to determine ETL job status (see the sketch below).
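
A minimal sketch of the kind of quality check described above, written against a PySpark DataFrame; the column names and the 99% completeness threshold are assumptions.

    from pyspark.sql import DataFrame
    from pyspark.sql import functions as F

    def quality_report(df: DataFrame, key_col: str, not_null_cols: list) -> dict:
        total = df.count()
        report = {"row_count": total}
        # Completeness: fraction of non-null values per required column.
        for col in not_null_cols:
            non_null = df.filter(F.col(col).isNotNull()).count()
            report[f"{col}_completeness"] = non_null / total if total else 0.0
        # Uniqueness: the business key should have no duplicates.
        report["key_is_unique"] = df.select(key_col).distinct().count() == total
        return report

    # Example gate before publishing the data set:
    # report = quality_report(df, "order_id", ["order_id", "order_date", "amount"])
    # assert report["order_date_completeness"] >= 0.99 and report["key_is_unique"]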

Developed Spark Core and Spark SQL scripts in Python for faster data processing, and transformed the data using Spark applications for analytics consumption.

Worked on importing data from SQL Server and Oracle into Azure Data Lake using Sqoop.

Created incremental Spark jobs to move data from Azure to Snowflake (see the sketch below).
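
A minimal sketch of an incremental load of this shape using the Spark Snowflake connector; the storage path, watermark column, and all connection options are placeholders, not values from the project.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("adls_to_snowflake").getOrCreate()

    last_loaded_ts = "2024-01-01 00:00:00"   # normally read from a control table

    incremental = (spark.read.format("delta")
                   .load("abfss://curated@mystorage.dfs.core.windows.net/orders")
                   .filter(F.col("updated_at") > F.lit(last_loaded_ts)))

    sf_options = {                            # placeholder credentials
        "sfURL": "myaccount.snowflakecomputing.com",
        "sfDatabase": "ANALYTICS",
        "sfSchema": "PUBLIC",
        "sfWarehouse": "LOAD_WH",
        "sfUser": "etl_user",
        "sfPassword": "********",
    }

    (incremental.write
        .format("net.snowflake.spark.snowflake")
        .options(**sf_options)
        .option("dbtable", "ORDERS")
        .mode("append")
        .save())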

Experience working with Delta tables in Azure for loading incremental data.

Experience using Azure Blob Storage for storing various data files. Developed and managed file transfer jobs for data exchange to and from third-party vendors and Optum.

Developed Scala scripts for ingesting flat files, CSV, and JSON into the data lake. Provided production support for developed applications to resolve Critical/Priority-1 issues.

Used Kafka connectors to read CDC feeds from Cosmos DB and MongoDB into Kafka in real time.

Worked on multiple streaming jobs to read data from Kafka, transform it, and write it into Azure Data Lake (see the sketch below).
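
A minimal Structured Streaming sketch of a Kafka-to-data-lake job; the topic, message schema, broker address, and storage paths are assumptions.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StringType, DoubleType

    spark = SparkSession.builder.appName("kafka_to_adls").getOrCreate()

    schema = (StructType()
              .add("order_id", StringType())
              .add("amount", DoubleType())
              .add("event_time", StringType()))

    events = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "orders_cdc")
              .load()
              .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
              .select("e.*")
              .withColumn("event_time", F.to_timestamp("event_time")))

    query = (events.writeStream.format("parquet")
             .option("path", "abfss://raw@mystorage.dfs.core.windows.net/orders")
             .option("checkpointLocation",
                     "abfss://raw@mystorage.dfs.core.windows.net/_checkpoints/orders")
             .outputMode("append")
             .start())
    query.awaitTermination()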

Supported data scientists by enhancing their modeling jobs to scale across the entire data set.

Experience in productionizing and optimizing data science models.

Experience with the big data ecosystem in Azure using Spark, Kubernetes, Airflow, and Databricks. Coordinated among cross-functional teams to enhance the business and fix issues to improve ROI.

Track production tickets using ticket monitoring tools, perform root cause analysis (RCA) and resolve the tickets.

Contributing to self-organizing teams with minimal supervision, working within the Agile/Scrum project methodology.

Environment: Python, SQL, PL/SQL, Java, R, Unix Shell scripting, Oracle, DB2, Teradata, SQL Server, PostgreSQL, Hadoop, HDFS, Hive, Spark, PySpark, Sqoop, Kafka, MongoDB, Amazon DynamoDB, HBase, AWS Glue, Azure Data Factory, GCP, Airflow, Flume, Apache Kafka, Spark Streaming, BitBucket, Git, GitHub, Jira, Rally, AWS EC2, S3, Lambda, EMR, Azure Data Lake, Azure Blob Storage, Snowflake, Kubernetes, Databricks

Data Engineer

Cognizant, Pune, India

Mar 2020 - May 2022

Worked on an Apache Spark data processing project to process data from RDBMS and several streaming sources, and developed Spark applications using Python on AWS EMR.

Designed and deployed multi-tier applications leveraging AWS services like EC2, Route 53, S3, RDS, and DynamoDB, focusing on high availability, fault tolerance, and auto-scaling, using AWS CloudFormation.

Configured and launched AWS EC2 instances to execute Spark jobs on AWS Elastic MapReduce (EMR).

Automated data storage from streaming sources into AWS data stores such as S3, Redshift, and RDS by configuring AWS Kinesis Data Firehose (see the sketch below).
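
A minimal boto3 sketch of feeding records into an existing Kinesis Data Firehose delivery stream, which then lands the data in S3/Redshift; the stream name, region, and payload are assumptions.

    import json
    import boto3

    firehose = boto3.client("firehose", region_name="us-east-1")

    def send_event(event: dict) -> None:
        # Firehose buffers records and delivers them to the configured destination.
        firehose.put_record(
            DeliveryStreamName="orders-to-s3",        # hypothetical stream name
            Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
        )

    send_event({"order_id": 1001, "amount": 42.50, "source": "web"})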

Performed analytics using real-time integration capabilities of AWS Kinesis (Data Streams) on streamed data.

Created Sqoop incremental imports, landed the data in Parquet format in HDFS, and transformed it to ORC format using PySpark (see the sketch below).
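
A minimal PySpark sketch of the Parquet-to-ORC conversion step; the HDFS paths are placeholders (the upstream Sqoop incremental import would land the Parquet files in the source directory).

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("parquet_to_orc").getOrCreate()

    # Read the Sqoop-landed Parquet data and rewrite it as ORC.
    (spark.read.parquet("hdfs:///data/landing/orders_parquet")
          .write.mode("overwrite")
          .orc("hdfs:///data/curated/orders_orc"))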

Used dynamic SQL, cursors, and cursor attributes while developing PL/SQL objects. Created reusable utilities and programs in Python to perform repetitive tasks such as sending emails and comparing data (see the sketch below).
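
A minimal sketch of a reusable email utility of the kind described, built on the Python standard library; the SMTP host and addresses are placeholders.

    import smtplib
    from email.message import EmailMessage

    def send_notification(subject: str, body: str, recipients: list,
                          sender: str = "etl-alerts@example.com",
                          smtp_host: str = "smtp.example.com") -> None:
        msg = EmailMessage()
        msg["Subject"] = subject
        msg["From"] = sender
        msg["To"] = ", ".join(recipients)
        msg.set_content(body)
        with smtplib.SMTP(smtp_host) as server:
            server.send_message(msg)

    # Example: send_notification("ETL load complete", "ORDERS: 1.2M rows loaded",
    #                            ["data-team@example.com"])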

Created and maintained PL/SQL procedures to load data sent in XML files into Oracle tables.

Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS) and later analyzed the imported data using HIVE. Created UNIX shell scripts to load data from flat files into Oracle tables.

Created Hive tables to store the processed results in a tabular format.

Developed Sqoop scripts to ingest data from Oracle, Teradata, and DB2 into HDFS and Hive.

Developed Python and Hive scripts for creating reports from Hive data.

Environment: AWS (Lambda, S3, EC2, Redshift, EMR), Redshift, Teradata 15, Python 3.7, PyCharm, Jupyter Notebooks, Big Data, PySpark, Hadoop, Hive, HDFS, Kafka, Airflow, Snowflake, MongoDB, PostgreSQL, SQL, Tableau, Agile/Scrum, XML, Jira, Slack, Confluence, Docker, GitHub, Git, Oracle 12c, Toad, Unix.

Software Developer

Indegene, Hyderabad, India

Oct 2017 - Feb 2020

Support all phases of the Software development life cycle (SDLC), quality management systems, and project life cycle processes.

Followed HTTP and WSDL standards to design REST/SOAP-based Web APIs using XML, JSON, HTML, and DOM technologies.

Involved in the installation and configuration of Tomcat, Spring Source Tool Suite, Eclipse, and unit testing.

Back-end, server-side coding and development using Java collections (Set, List, Map), exception handling, Vaadin, Spring with dependency injection, the Struts framework, Hibernate, Servlets, Actions, ActionForms, JavaBeans, etc.

Developed Restful APIs to serve several user actions and events, such as generating up-to-date card transaction statements, card usage breakdown reporting, real-time card eligibility, and validation with vendor systems.

Developed ETL processes with change data capture and fed the results into the data warehouse.

Implemented Web API to use OAuth2.0 with JWT (JSON Web Tokens) to secure the Web API Service Layer.
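
The service layer above was secured in the Java/Spring stack; purely for illustration, a minimal Python sketch of the underlying idea (validating a bearer JWT before serving a request) using PyJWT, with a placeholder secret and issuer.

    import jwt  # PyJWT

    SECRET = "shared-signing-secret"          # placeholder signing key

    def authorize(auth_header: str) -> dict:
        """Return the token claims if the bearer JWT is valid, else raise."""
        if not auth_header.startswith("Bearer "):
            raise PermissionError("missing bearer token")
        token = auth_header[len("Bearer "):]
        try:
            return jwt.decode(token, SECRET, algorithms=["HS256"],
                              issuer="card-services")
        except jwt.InvalidTokenError as exc:
            raise PermissionError(f"invalid token: {exc}")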

Implemented the application using established design patterns and object-oriented practices, with a view to future requirements of the payments domain.


Front-end development used HTML5, CSS3, and JavaScript, leveraging the Bootstrap framework with a Java backend.

Used JAXB to convert Java objects into XML and XML content into a Java object.

Web services were built using Spring and CXF, which operate within MuleESB and offer both REST and SOAP interfaces.

Environment: Java J2EE, JSP, JavaScript, Ajax, Swing, Spring 3.2, Eclipse 4.2, TDD, Hibernate 4.1, XML, Tomcat, Oracle 10g, JUnit, JMS, Log4j, Maven, Agile, Git, JDBC, Web service, XML, SOAP, JAX-WS, Unix, AngularJS and Soap UI.

EDUCATION

Master's in Information Technology, Major in Cyber Security
Franklin University
Sept 2022 - May 2024

Bachelor of Electronics and Communication Engineering
St Peters Engineering College
Jul 2013 - Aug 2017


