
Shahzeb Syed

ad7jcf@r.postjobfree.com | 813-***-**** | linkedin.com/in/syed-s-348594145 | Dallas, TX

Summary

Big Data Developer with 7+ years of proven industry experience designing, developing, and maintaining scalable data solutions on cloud-based platforms. Skilled in deploying and managing ETL ingestion pipelines on Google Cloud Platform (GCP), AWS, and on-prem CDH/HDP clusters. Experienced in data extraction, processing, and storage across the Big Data stack, Teradata, and SQL Server. Adept at optimizing performance, ensuring data security, and collaborating with cross-functional teams to deliver actionable insights. Strong problem-solving skills and a continuous-learning mindset for staying abreast of the latest technologies and industry trends.

Professional Experience

Verizon Jul 2021 - Present

Big Data/Teradata Developer Irving, TX

Commercially launched and supported Verizon’s +Play platform, where customers can discover, purchase, and manage their digital subscriptions across entertainment categories including audio, gaming, fitness, lifestyle, and more, all in one place.

● Worked on productionizing the entire +Play data lifecycle, powering near-real-time insights into Sales, Subscriptions, Cancellations, Payments, Credits, Promotions, and several performance metrics.

● Designed, developed, and deployed Spark applications in Java and Scala running on Google Compute Engine.

● Responsible for understanding data mapping documents and building Teradata load scripts.

● Implemented CI/CD pipelines using Jenkins to deploy applications to SIT and production environments.

● Created efficient Extract, Transform, Load (ETL) processes to move and transform data from various sources into the Teradata and Big Data environments.

● Migrated data between source systems such as SQL Server and flat files and Teradata databases, in both directions, using utilities like MLOAD, FastLoad, OLELOAD, and TPT.

● Implemented dashboards to provide key insights and analytics tracking the financial performance of +Play.

● Created, monitored, and managed BTEQ batch jobs running on SQL Server Management Studio.

● Established heartbeat monitoring for replication and ingestion tasks, automating daily validation processes across various systems and performing sanity checks on data to ensure deliverables are not affected.

● Collaborated with teams working on related technologies like Hadoop, Tableau, SQL Server, etc., to integrate Teradata with other systems and tools as needed.

● Deployed and maintained ETL ingestion pipelines on Google Cloud Platform (GCP) utilizing BigQuery (see the sketch at the end of this section).

● Managed codebases on GitLab and published CI/CD pipelines on Jenkins to deploy code to production and SIT environments.

Tech Stack: Scala, Spark, Java, Hive, Linux, Oozie, Shell Scripting, GitLab, SQL Server, KSH, Teradata SQL Assistant, Basic Teradata Query (BTEQ), Tableau, SSMS, GCP, BigQuery
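
Below is a minimal sketch of the kind of GCP ingestion pipeline described above, written in Scala against the spark-bigquery connector. The project, dataset, table, bucket, and column names are hypothetical placeholders, not details of the actual +Play pipelines.

import org.apache.spark.sql.SparkSession

object IngestSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("subscription-ingest-sketch")
      .getOrCreate()

    // Read raw subscription events from Cloud Storage (hypothetical path).
    val events = spark.read
      .option("header", "true")
      .csv("gs://example-bucket/raw/subscriptions/")

    // Light transformation: keep completed transactions only (hypothetical column).
    val completed = events.filter(events("status") === "COMPLETED")

    // Write to BigQuery; temporaryGcsBucket backs the connector's indirect write path.
    completed.write
      .format("bigquery")
      .option("table", "example_project.example_dataset.subscriptions")
      .option("temporaryGcsBucket", "example-temp-bucket")
      .mode("append")
      .save()

    spark.stop()
  }
}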

Nielsen Jul 2017 - Jul 2021

Big Data Developer Tampa, FL

Digital Content Ratings provides comprehensive content consumption measurement across every major digital platform. The solution includes best-in-class demographics from industry-leading data providers that enable publishers, agencies and advertisers alike to get a robust understanding of who consumes content across the digital media landscape.

● Designed and developed Spark code in Scala/Java with Spark SQL and Spark Streaming for faster data processing.

● Implemented Kinesis streams to feed Spark Streaming applications on AWS EMR clusters (see the sketch at the end of this section).

● Responsible for building scalable distributed data pipelines using the Big Data tech stack.

● Developed scripts and batch jobs to schedule various Oozie workflows/coordinators and cron jobs.

● Designed and built processing pipelines on Hadoop that ingest both structured and unstructured datasets.

● Wrote Hive queries for data ingestion and for storing the output of business applications.

● Used the AWS CLI for data transfers to and from Amazon S3 buckets, updating IAM policies, and managing roles/policies.

● Executed Hadoop/Spark jobs on Cloudera (CDP) and AWS EMR to process large datasets and perform analytics on data stored in S3 buckets.

● Created Hive tables and was involved in data loading and writing Hive UDFs.

● Used Amazon CloudWatch to monitor and track resources on AWS.

● Involved in converting Hive/SQL queries into Spark transformations using Spark Datasets and RDDs in Scala.

● Optimized and updated deprecated code while migrating from Spark 1.6 to Spark 2.2.1, applying performance-enhancement techniques with Datasets and DataFrames.

● Monitored applications running on production clusters and performed sanity checks on data to ensure deliverables were not affected.

● Involved in creating Bash shell scripts for database connectivity and executing queries in parallel.

● Analyzed large and critical datasets using Spark, HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper, AWS.

● Developed Apache Pig scripts and Hive scripts to process HDFS data.

● Worked on debugging and performance tuning of Hive and Pig jobs.

● Used CloudFormation scripts to create, maintain, and tear down clusters on AWS.

● Built CI/CD pipelines on Jenkins for automated deployments of applications in Prod and Non-Prod environments.

● Published custom metrics to Datadog for cluster-level stats, control file monitoring, and data validations.

Tech Stack: CDH5, Scala, Java, Spark, Spark Streaming, Spark SQL, Spark Datasets, Kinesis, HDFS, AWS, Hive, Pig, Linux, Eclipse, CloudFormation, Oozie, Hue, MapReduce, Apache Kafka, Oracle, Shell Scripting, MongoDB, EMR, EC2, S3, EMRFS, IAM, VPC, Datadog
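
Below is a minimal sketch of a Kinesis-fed Spark Streaming consumer of the kind described above, using the Spark 2.x spark-streaming-kinesis-asl API. The application, stream, endpoint, and region names are placeholders.

import java.nio.charset.StandardCharsets

import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils

object KinesisConsumerSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kinesis-consumer-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Receiver-based Kinesis stream; records arrive as byte arrays.
    val stream = KinesisUtils.createStream(
      ssc,
      "example-kcl-app",                          // KCL application (checkpoint table) name
      "example-stream",                           // Kinesis stream name
      "https://kinesis.us-east-1.amazonaws.com",  // endpoint
      "us-east-1",                                // region
      InitialPositionInStream.LATEST,
      Seconds(10),                                // checkpoint interval
      StorageLevel.MEMORY_AND_DISK_2)

    // Decode each record and print a per-batch count as a smoke test.
    stream.map(bytes => new String(bytes, StandardCharsets.UTF_8))
      .count()
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}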

VDriveInfo, Inc Jan 2017 - Jul 2017

Big Data Developer Plano, TX

● Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.

● Provisioned, installed, configured, monitored, and maintained HDFS, YARN, HBase, Flume, Sqoop, Oozie, and Hive.

● Involved in writing shell scripts to export log files to the Hadoop cluster through an automated process.

● Created Hive tables with dynamic partitioning and buckets for sampling, worked on them using HiveQL, and stored data in tabular formats using Hive SerDes.

● Involved in creating UNIX shell scripts for database connectivity and executing queries in parallel.

● Developed Apache Pig scripts to parse data and store it in Avro format, and Hive scripts to process HDFS data.

● Exported analyzed data to relational databases using Sqoop for visualization.

● Analyzed large and critical datasets using HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper.

● Developed custom aggregate UDFs in Hive to parse log files (see the sketch at the end of this section).

● Identified the required data to be pulled into HDFS and created Sqoop scripts, scheduled to run periodically, to migrate data to the Hadoop environment.

● Involved in file processing using Pig Latin.

● Created MapReduce jobs involving custom combiners and custom partitioners to deliver better results and worked on application performance optimization for an HDFS cluster.

● Worked on debugging and performance tuning of Hive and Pig jobs.

Tech Stack: Cloudera, MapReduce, HDFS, Pig Scripts, Hive Scripts, HBase, Sqoop, Zookeeper, Oozie, Oracle, Shell Scripting
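
Below is a minimal Hive UDF sketch of the kind described above, written in Scala for consistency with the other examples (UDFs of this era were typically Java). A scalar UDF is shown for brevity rather than an aggregate one, and the class name, log format, and regex are illustrative assumptions.

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Extracts the HTTP status code from a combined-log-format line,
// e.g. ... "GET /index.html HTTP/1.1" 200 2326 ...
class ExtractStatusCode extends UDF {
  // Hive resolves evaluate() by reflection.
  def evaluate(line: Text): Text = {
    if (line == null) return null
    // Match the three digits that follow the closing quote of the request.
    val pattern = "\"\\s+(\\d{3})\\s".r
    pattern.findFirstMatchIn(line.toString) match {
      case Some(m) => new Text(m.group(1))
      case None    => null
    }
  }
}

In Hive, the compiled jar would be registered with ADD JAR and exposed via CREATE TEMPORARY FUNCTION before use in queries.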

Vertilink Technologies May 2015 - Dec 2015

Jr Java Developer India

● Involved in coding, designing, documenting, debugging, and maintaining several applications.

● Involved in creating SQL tables and indexes and in writing queries to read and manipulate data.

● Used JDBC to establish connections between the database and the application (see the sketch at the end of this section).

● Created the user interface using HTML, CSS and JavaScript.

● Provided maintenance and support for existing applications.

● Responsible for developing database SQL queries.

● Created/modified shell scripts for scheduling and automating tasks.

● Wrote unit test cases using the JUnit framework.

Tech Stack: Eclipse, Java, HTML, CSS, JavaScript, SQL, JUnit, JDBC, Shell Scripting
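
Below is a minimal JDBC connectivity sketch of the kind described above, written in Scala for consistency with the other examples (the original work was Java). The URL, credentials, and table are placeholders, and a suitable JDBC driver is assumed to be on the classpath.

import java.sql.DriverManager

object JdbcSketch {
  def main(args: Array[String]): Unit = {
    val url = "jdbc:mysql://localhost:3306/example_db"
    val conn = DriverManager.getConnection(url, "example_user", "example_pass")
    try {
      // Parameterized query: avoids SQL injection.
      val stmt = conn.prepareStatement(
        "SELECT id, name FROM customers WHERE region = ?")
      stmt.setString(1, "TX")
      val rs = stmt.executeQuery()
      while (rs.next()) {
        println(s"${rs.getInt("id")}: ${rs.getString("name")}")
      }
    } finally {
      conn.close() // always release the connection
    }
  }
}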

Education

University of Illinois Springfield, Illinois

Master’s in Computer Science

Technical Skills

Programming Languages: Java, Scala, Python, SQL, Groovy, BTEQ, Shell Scripting, KSH

Big Data Technologies: Hadoop (Hortonworks, Cloudera), HDFS, YARN, MapReduce, Apache Spark 1.X/2.X, Apache Pig, Apache Hive, Apache HBase, Impala, Sqoop, Cassandra, MongoDB, Spark Streaming, Spark SQL, Oozie, Hue, Zookeeper, Apache Kafka

Cloud Technologies: EC2, EMR, CloudFormation, S3, IAM, Athena, Lambda, Glacier, RDS, Kinesis, VPC, Subnet control, VPC Peering, CloudWatch, Simple Notification Service (SNS), Glue, BigQuery

Databases: MySQL, Oracle 11g, DB2, MS SQL Server, HBase, Cassandra, MongoDB, Teradata

Developer Tools: Teradata SQL Assistant, IntelliJ, Eclipse, NetBeans, Visual Studio, SQL Server Management Studio, Maven, JUnit, MRUnit


