Big Data Analysis

Location:
San Diego, CA, 92126
Salary:
$90,000
Posted:
December 03, 2024

Resume:

Ishika Tiwari

619-***-**** | *****************@*****.***

Professional Summary

• More than four years of professional IT experience, with expertise in Hadoop/Spark data ingestion, storage, querying, processing, and large-scale data analysis.

• Vast knowledge of the Big Data ecosystem and its components: MapReduce, Spark SQL, Spark, HDFS, Hive, ZooKeeper, Sqoop, HBase, Pig, Oozie, and Airflow.

• Thorough understanding of Hadoop architecture, including the NameNode, DataNode, the MapReduce programming paradigm, and the Hadoop Distributed File System (HDFS).

• Working knowledge of custom MapReduce programs, Pig Latin, and HiveQL for data cleansing and analysis.

• Implemented distributed processing with Spark SQL and the DataFrame API to connect to Hive and retrieve data in a highly scalable manner (illustrated in the first sketch after this summary).

• Worked with the RDD and Dataset APIs in addition to Spark Core transformations and actions.

• Experience writing MapReduce jobs and using Sqoop, Pig, and Hive for data handling.

• Knowledge of NoSQL databases and practical experience developing applications for NoSQL databases such as HBase, Cassandra, and DynamoDB.

• Real-time data streaming expertise with tools such as Spark Streaming and Kafka (see the second sketch after this summary).

• Designed and developed Hive data transformation scripts to operate on structured data derived from multiple data sources.

• Collaborated and integrated seamlessly with the development team by using Git for version control.

• Maintained shell scripts and implemented PL/SQL performance enhancements.

• Working knowledge of batch-style large-scale distributed computing applications and real-time streaming applications using tools such as Spark Streaming.
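
To make the Spark SQL and DataFrame API experience above concrete, the first sketch below shows a minimal PySpark approach to connecting to Hive and querying a table. The database, table, and column names (sales.orders, order_id, amount) are hypothetical placeholders, not taken from any project named in this resume, and the sketch assumes a Spark installation with Hive support enabled.

    from pyspark.sql import SparkSession

    # Enable Hive support so Spark SQL can read tables registered in the Hive metastore.
    spark = (
        SparkSession.builder
        .appName("hive-dataframe-sketch")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Retrieve data from a (hypothetical) Hive table through Spark SQL.
    orders = spark.sql("SELECT order_id, amount FROM sales.orders WHERE amount > 0")

    # DataFrame transformations run distributed across the cluster, which is
    # what makes this approach scale to large data sets.
    totals = orders.groupBy("order_id").sum("amount")
    totals.show()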
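
Similarly, the second sketch illustrates the Spark Streaming and Kafka bullet using Spark Structured Streaming. The broker address and topic name ("localhost:9092", "events") are illustrative assumptions, and the spark-sql-kafka connector package must be on the classpath.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    # Subscribe to a Kafka topic as an unbounded streaming DataFrame.
    stream = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "events")
        .load()
    )

    # Kafka delivers keys and values as binary; cast the payload to a string.
    parsed = stream.select(col("value").cast("string").alias("message"))

    # Write running results to the console sink, convenient for local testing.
    query = parsed.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()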

Education

• Trine University, Angola, Indiana

Master of Science: Information Studies (MSIS); GPA: 3.75/4.0; Jan 2024
Coursework: Data Science, Big Data, Network Management, Project Management

• Osmania University, Hyderabad, India

Bachelor of Commerce in Computers; GPA: 3.8/4.0; May 2022
Coursework: C++, Programming with Java, Business Analysis, Accounting Management

Skills Summary

• HDFS, MapReduce, Hive, Pig, GCP, Sqoop, Flume, Oozie, ZooKeeper, Kafka, Cassandra, Apache Spark, Spark Streaming, HBase.

• HBase, Cassandra, MongoDB.

• Scala, SQL, Shell Scripting, Python, HiveQL.
• Linux, Unix, Windows.

• Oracle, SQL Server, MySQL, Cassandra, Teradata, PostgreSQL, MS Access, Snowflake.

• Informatica PowerCenter, Airflow.

• EC2, EMR, S3, Lambda, Athena.

• Tableau, Microsoft SSIS, SSAS, and SSRS.

Experience

• SysIntelli, San Diego, CA

Data Engineer, Aug 2022 - Present

• Expert in HTML5 and CSS3, with practical knowledge of creating visually appealing and responsive websites.

• Used Cloudera and Hortonworks HDP to manage and support enterprise data warehouse operations and to build big data and sophisticated predictive applications.

• Skilled in creating big data processing PySpark applications, utilizing the PySpark API and taking advantage of its distributed data processing features.

• Extensive experience with Talend for data profiling, migration, extraction, transformation, and loading (ETL).

• Created ETL data pipelines in GCP Airflow using various Airflow operators (see the first sketch after this role).

• Developed use case, class, and sequence diagrams and participated in the design, development, and testing phases of the Software Development Life Cycle (SDLC).

• Migrated an existing on-premises application to AWS, using services such as EC2 and S3 for processing and storing small data sets.

• Skilled in managing Hadoop clusters on AWS EMR.

• Designed and developed ETL processes in AWS Glue to import campaign data into AWS Redshift from external sources such as S3 (ORC/Parquet/text files); see the second sketch after this role.

• Knowledge of setting up, implementing, and maintaining cloud services on Amazon Web Services (AWS).

• Extensive knowledge of Amazon Web Services (AWS), including Glue, RDS, VPC, S3, EC2, Step Functions, Lambda, Redshift, and IAM.

• Managed and set up IAM, EC2, S3 buckets, security groups, RDS, EBS, ELB, Auto Scaling, AMIs, and Elasticsearch using the AWS Console and API integration.

• Developed PySpark code for EMR and AWS Glue jobs.

• Used Python and SnowSQL to implement a one-time migration of multi-state-level data from SQL Server to Snowflake.

• Skilled at designing, building, and overseeing AWS Step Functions.
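
As a first sketch for this role, the GCP Airflow pipelines described above might take the shape of the following DAG, written against the Google provider package for Airflow. The bucket, dataset, and table names are hypothetical placeholders.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

    with DAG(
        dag_id="daily_campaign_etl",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Load the day's extract from Cloud Storage into BigQuery.
        load = GCSToBigQueryOperator(
            task_id="load_to_bq",
            bucket="example-etl-bucket",  # hypothetical bucket
            source_objects=["exports/{{ ds }}/*.csv"],
            destination_project_dataset_table="analytics.campaigns",  # hypothetical table
            source_format="CSV",
            write_disposition="WRITE_TRUNCATE",
        )

        # Placeholder downstream step, e.g. a notification or cleanup task.
        notify = BashOperator(task_id="notify", bash_command="echo load complete")

        load >> notify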
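
The second sketch shows one possible shape of an AWS Glue ETL job that loads campaign data from S3 into Redshift. The catalog database, table, connection name, and staging bucket are all hypothetical, and this is only an illustration of the technique, not the exact job from this role.

    import sys
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read ORC/Parquet/text campaign files previously crawled into the Glue Data Catalog.
    source = glue_context.create_dynamic_frame.from_catalog(
        database="campaigns_db",     # hypothetical catalog database
        table_name="raw_campaigns",  # hypothetical table
    )

    # Rename and retype columns before loading into Redshift.
    mapped = ApplyMapping.apply(
        frame=source,
        mappings=[
            ("campaign_id", "string", "campaign_id", "string"),
            ("spend", "double", "spend", "double"),
        ],
    )

    # Write to Redshift via a preconfigured catalog connection; Glue stages data in S3.
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=mapped,
        catalog_connection="redshift-conn",  # hypothetical connection name
        connection_options={"dbtable": "campaigns", "database": "analytics"},
        redshift_tmp_dir="s3://example-temp-bucket/glue-staging/",
    )
    job.commit()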

• Logic Minds, Hyderabad, India

Data Engineer, January 2019 - July 2021

• Involved in the design, development, and testing phases of the Software Development Life Cycle (SDLC), using Agile methodology and Test-Driven Development (TDD) to gather and analyze application requirements.

• Created and maintained reporting infrastructure to support visual representation of manufacturing data for operations planning and execution.

• Experienced in data architecture, including data ingestion pipeline design, Hadoop information architecture, data modelling, data mining, machine learning, and advanced data processing.

• Involved in the development of pipelines for data warehousing, facilitating the creation of major regulatory and financial reports utilizing advanced SQL queries in Snowflake.

• Led estimation, reviewed estimates, identified complexities, and communicated them to all stakeholders.

• Extracted, transformed, and loaded data from source systems to Azure Data Storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics.

• Worked on developing a data lake for the GBT (Global Business Transactions) reporting team; responsible for building scalable distributed data solutions using Big Data technologies such as Apache Hadoop, MapReduce, shell scripting, and Hive.

• Built Azure Data Warehouse table datasets for Power BI reports, supporting data visualization and analytics.

• Installed and configured Apache Airflow for S3 buckets and the Snowflake data warehouse, and created DAGs to run Airflow workflows (see the sketch after this list).

• Conducted thorough testing and quality assurance activities at the conclusion of each phase, ensuring that deliverables met predefined standards and specifications.
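
As a sketch of the Airflow and Snowflake bullet above, the DAG below copies staged S3 files into a Snowflake table using the Snowflake provider for Airflow. The connection ID, stage, table, and file format are hypothetical, and it assumes the external stage pointing at the S3 bucket already exists in Snowflake.

    from datetime import datetime
    from airflow import DAG
    from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

    with DAG(
        dag_id="s3_to_snowflake",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # COPY INTO pulls files from the (hypothetical) external S3 stage into the table.
        copy_into = SnowflakeOperator(
            task_id="copy_into_reports",
            snowflake_conn_id="snowflake_default",  # hypothetical connection ID
            sql="""
                COPY INTO analytics.reports
                FROM @analytics.s3_stage/reports/
                FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
            """,
        )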


