AISHWARYA CHIRANJEEVI
***************@*****.*** 540-***-**** Irving, TX LINKEDIN
SUMMARY
6+ years of experience in Data Engineering, data pipeline design, development, and implementation as a Sr. Data Engineer/Data Developer and Data Modeler.
Strong experience in writing scripts with the Python, PySpark, and Spark APIs for analyzing data (a minimal PySpark sketch follows this summary).
Extensively used Python libraries including PySpark, Pytest, PyMongo, cx_Oracle, PyExcel, Boto3, Psycopg, embedPy, NumPy, and Beautiful Soup.
Experience with Google Cloud components, Google Container Builder, GCP client libraries, and the Cloud SDK.
Hands-on use of the Spark and Scala APIs to compare the performance of Spark with Hive and SQL, and of Spark SQL to manipulate DataFrames in Scala.
Expertise in Python and Scala, including user-defined functions (UDFs) for Hive and Pig written in Python.
Experience in developing MapReduce programs using Apache Hadoop for analyzing big data per requirements.
Experience in working with Flume and NiFi for loading log files into Hadoop. Experience in working with NoSQL databases like HBase and Cassandra.
Strong knowledge and hands-on experience with various GCP services and components, ensuring seamless cloud operations and optimizations.
Good Experience in implementing and orchestrating data pipelines using Oozie and Airflow.
Worked with Cloudera and Hortonworks distributions.
Expert in developing SSIS/DTS packages to extract, transform, and load (ETL) data into data warehouses/data marts from heterogeneous sources.
Good working knowledge of the Amazon Web Services (AWS) Cloud Platform, including EC2, S3, VPC, ELB, IAM, DynamoDB, CloudFront, CloudWatch, Route 53, Elastic Beanstalk, Auto Scaling, Security Groups, EC2 Container Service (ECS), CodeCommit, CodePipeline, CodeBuild, CodeDeploy, Redshift, CloudFormation, CloudTrail, OpsWorks, Kinesis, SQS, SNS, and SES.
Experience in Data Analysis, Data Profiling, Data Integration, Migration, Data Governance and Metadata Management, Master Data Management, and Configuration Management. Experience in developing customized UDFs in Python to extend Hive and Pig Latin functionality.
Expertise in designing complex mappings, performance tuning, and building Slowly Changing Dimension and Fact tables.
Extensively worked with Teradata utilities such as FastExport and MultiLoad to export and load data to/from different source systems, including flat files.
Experienced in building automated regression scripts in Python to validate ETL processes across multiple databases, including Oracle, SQL Server, Hive, and MongoDB.
Proficient in SQL across several dialects, including MySQL, PostgreSQL, Redshift, SQL Server, and Oracle.
Experience in designing star and snowflake schemas for data warehouse and ODS architectures.
Skilled in system analysis, E-R/dimensional data modeling, database design, and implementation of database features.
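The summary above references PySpark and Spark SQL scripting for data analysis. The sketch below is a minimal illustration of that pattern, not code from any listed project; the input path and the column names (region, amount) are assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Illustrative only: the input path and column names are assumptions.
spark = SparkSession.builder.appName("sample_analysis").getOrCreate()

df = spark.read.option("header", True).csv("s3://example-bucket/sales.csv")

# The same aggregation expressed through the DataFrame API and through Spark SQL.
by_region = df.groupBy("region").agg(
    F.sum(F.col("amount").cast("double")).alias("total_amount")
)

df.createOrReplaceTempView("sales")
by_region_sql = spark.sql(
    "SELECT region, SUM(CAST(amount AS DOUBLE)) AS total_amount "
    "FROM sales GROUP BY region"
)

by_region.show()
by_region_sql.show()
spark.stop()
```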
TECHNICAL SKILLS:
Programming Languages: Python, Java, Scala - data manipulation, processing, and automation.
Scripting Languages: Bash, Perl, PowerShell - scripting for automation and managing data pipelines.
DBMS: SQL, MySQL, PostgreSQL, Oracle - relational database management systems for data storage and retrieval.
Big Data: Hadoop, Spark, Kafka - frameworks and tools for handling and processing large datasets.
NoSQL: MongoDB, Cassandra, Redis - non-relational databases for unstructured or semi-structured data.
ETL Tools: Apache NiFi, Talend, Informatica - Extract, Transform, and Load processes to move data between systems.
Version Control: Git, SVN - version control systems to manage code changes and collaboration.
Agile: Scrum, Kanban - Agile methodologies for project management and iterative development.
Cloud: AWS, Google Cloud, Azure - cloud platforms for scalable data solutions and infrastructure management.
WORK EXPERIENCE
Sr. Data Engineer
T-Mobile, Atlanta, GA
May 2023 - Present
Implemented partitioning, dynamic partitions, and buckets in Hive.
Worked extensively with Hive, SQL, Python, Scala, Spark, and shell scripting.
Involved in creating ETL pipelines to ingest data into Azure Data Lake.
Experience in designing data-driven solutions.
Involved in building new data sets and data products that support business initiatives.
Built production-quality ingestion pipelines with automated quality checks so the business can access all data sets in one place.
Automated the pipelines using Airflow by creating dependencies between jobs and scheduling them on a daily, weekly, or monthly basis (a minimal Airflow sketch follows this role).
Performed quality checks on consistency, accuracy, completeness, orderliness, uniqueness, and timeliness, and used the results to determine ETL job success.
Developed Spark Core and Spark SQL scripts in Python for faster data processing, and transformed the data using Spark applications for analytics consumption.
Imported data from SQL Server and Oracle into Azure Data Lake using Sqoop.
Created incremental Spark jobs to move data from Azure to Snowflake.
Worked with Delta tables in Azure for loading incremental data.
Used Azure Blob Storage for storing various data files; developed and managed file-transfer jobs for data exchange to and from third-party vendors and Optum.
Developed Scala scripts for ingesting flat files, CSV, and JSON into the data lake.
Provided production support for the developed applications to resolve Critical/Priority-1 issues.
Used Kafka connectors to read CDC feeds from Cosmos DB and MongoDB into Kafka in real time.
Built multiple streaming jobs to read data from Kafka, transform it, and write it into Azure Data Lake.
Supported data scientists by helping make their modeling jobs more scalable when modeling across the entire data set.
Experience in productionizing and optimizing data science models.
Experience with the big data ecosystem in Azure using Spark, Kubernetes, Airflow, and Databricks.
Coordinated among cross-functional teams to enhance the business and fix issues to improve ROI.
Tracked production tickets using ticket-monitoring tools, performed root cause analysis (RCA), and resolved the tickets.
Contributed to self-organizing teams with minimal supervision, working within the Agile/Scrum project methodology.
Environment: Python, SQL, PL/SQL, Java, R, Unix shell scripting, Oracle, DB2, Teradata, SQL Server, PostgreSQL, Hadoop, HDFS, Hive, Spark, PySpark, Sqoop, Kafka, Spark Streaming, MongoDB, Amazon DynamoDB, HBase, AWS Glue, Azure Data Factory, GCP, Airflow, Flume, Bitbucket, Git, GitHub, Jira, Rally, AWS EC2, S3, Lambda, EMR, Azure Data Lake, Azure Blob Storage, Snowflake, Kubernetes, Databricks
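A minimal sketch of the Airflow pattern described in this role (jobs wired with dependencies and scheduled daily, with quality checks gating downstream use). The DAG name, task names, and commands are hypothetical, and it assumes Airflow 2.x with the BashOperator.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator  # Airflow 2.x import path

# Hypothetical DAG: ingestion runs first, then quality checks gate downstream use.
with DAG(
    dag_id="daily_ingestion",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # swap for "@weekly" or "@monthly" as needed
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="ingest_to_data_lake",
        bash_command="spark-submit ingest_job.py",  # placeholder command
    )
    quality_checks = BashOperator(
        task_id="quality_checks",
        bash_command="python run_quality_checks.py",  # placeholder command
    )

    # Quality checks run only after ingestion succeeds.
    ingest >> quality_checks
```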
Data Engineer
Cognizant, Pune, India
Mar 2020 - May 2022
Worked on an Apache Spark data processing project to process data from RDBMS and several data streaming sources, and developed Spark applications using Python on AWS EMR.
Designed and deployed multi-tier applications leveraging AWS services like EC2, Route 53, S3, RDS, and DynamoDB, focusing on high availability, fault tolerance, and auto-scaling with AWS CloudFormation.
Configured and launched AWS EC2 instances to execute Spark jobs on AWS Elastic MapReduce (EMR).
Automated data storage from streaming sources into AWS data stores such as S3, Redshift, and RDS by configuring Amazon Kinesis Data Firehose.
Performed analytics on streamed data using the real-time integration capabilities of Amazon Kinesis Data Streams.
Created Sqoop incremental imports, landed the data in Parquet format in HDFS, and transformed it to ORC format using PySpark (a minimal conversion sketch follows this role).
Used dynamic SQL, cursors, and cursor attributes while developing PL/SQL objects.
Created reusable utilities and programs in Python to perform repetitive tasks such as sending emails and comparing data.
Created and maintained PL/SQL procedures to load data sent in XML files into Oracle tables.
Used Sqoop to import data from RDBMS into the Hadoop Distributed File System (HDFS) and analyzed the imported data using Hive; created UNIX shell scripts to load data from flat files into Oracle tables.
Created Hive tables to store the processed results in a tabular format.
Developed Sqoop scripts to ingest data from Oracle, Teradata, and DB2 into HDFS and Hive.
Developed Python and Hive scripts for creating reports from Hive data.
Environment: AWS (Lambda, S3, EC2, Redshift, EMR), Teradata 15, Python 3.7, PyCharm, Jupyter Notebooks, Big Data, PySpark, Hadoop, Hive, HDFS, Kafka, Airflow, Snowflake, MongoDB, PostgreSQL, SQL, Tableau, Agile/Scrum, XML, Jira, Slack, Confluence, Docker, GitHub, Git, Oracle 12c, Toad, Unix.
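A minimal PySpark sketch of the Parquet-to-ORC conversion step mentioned in this role. The HDFS paths are placeholders, and the Sqoop import that lands the Parquet files is assumed to have already run.

```python
from pyspark.sql import SparkSession

# Placeholder paths; the Sqoop-landed Parquet data is assumed to already exist.
spark = SparkSession.builder.appName("parquet_to_orc").getOrCreate()

landed = spark.read.parquet("hdfs:///data/landing/orders/")

# Rewrite the landed data in ORC format for downstream Hive consumption.
landed.write.mode("overwrite").orc("hdfs:///data/curated/orders_orc/")

spark.stop()
```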
Software Developer
Indegene, Hyderabad, India
Oct 2017 - Feb 2020
Supported all phases of the software development life cycle (SDLC), quality management systems, and project life cycle processes.
Followed HTTP and WSDL standards to design REST/SOAP-based Web APIs using XML, JSON, HTML, and DOM technologies.
Involved in the installation and configuration of Tomcat, SpringSource Tool Suite, and Eclipse, and in unit testing.
Back-end, server-side coding and development using Java collections (Set, List, Map), exception handling, Vaadin, Spring with dependency injection, the Struts framework, Hibernate, Servlets, Actions, ActionForms, JavaBeans, etc.
Developed RESTful APIs to serve several user actions and events, such as generating up-to-date card transaction statements, card-usage breakdown reporting, and real-time card eligibility and validation with vendor systems.
Developed ETL processes with change data capture and feeds into the data warehouse.
Implemented OAuth 2.0 with JWT (JSON Web Tokens) to secure the Web API service layer (a token-handling sketch follows this role).
Implemented application development using design patterns and object-oriented processes in view of future requirements of the Payments domain.
Front-end development used HTML5, CSS3, and JavaScript, leveraging the Bootstrap framework with a Java back end.
Used JAXB to convert Java objects into XML and XML content into Java objects.
Built web services using Spring and CXF, operating within Mule ESB and offering both REST and SOAP interfaces.
Environment: Java/J2EE, JSP, JavaScript, Ajax, Swing, Spring 3.2, Eclipse 4.2, TDD, Hibernate 4.1, XML, Tomcat, Oracle 10g, JUnit, JMS, Log4j, Maven, Agile, Git, JDBC, Web services, SOAP, JAX-WS, Unix, AngularJS, and SoapUI.
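The role above secured its Web API layer with OAuth 2.0 and JWT in Java/Spring. Purely as an illustration of the bearer-token pattern it describes, and to keep the sketches in a single language, here is a minimal Python example using PyJWT. The secret, claims, and subject name are hypothetical; a real OAuth 2.0 setup would validate tokens issued by the identity provider.

```python
from datetime import datetime, timedelta, timezone

import jwt  # PyJWT

SECRET = "replace-me"  # hypothetical shared secret; not from the original project

def issue_token(subject: str) -> str:
    # Encode a short-lived bearer token with standard sub/exp claims.
    claims = {
        "sub": subject,
        "exp": datetime.now(timezone.utc) + timedelta(minutes=15),
    }
    return jwt.encode(claims, SECRET, algorithm="HS256")

def verify_token(token: str) -> dict:
    # Raises jwt.InvalidTokenError if the signature or expiry check fails.
    return jwt.decode(token, SECRET, algorithms=["HS256"])

if __name__ == "__main__":
    token = issue_token("card-service")
    print(verify_token(token)["sub"])  # prints: card-service
```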
EDUCATION
Master's in Information Technology, Major in Cyber Security, Franklin University, Sept 2022 - May 2024
Bachelor of Electronic and Communication Engineering, St Peters Engineering College, Jul 2013 - Aug 2017