
Machine Learning Software Development

Location:
Manchester, NH
Posted:
March 20, 2025


Name: Srikar

Email: ************@*****.***

Contact: 603-***-****

PROFESSIONAL SUMMARY:

●8+ years of experience in software development, including the design and development of enterprise and web-based applications.

●Hands-on technical experience in Python, MySQL, AWS, GCP, machine learning modeling, DB2 SQL, and R programming across finance, banking, e-commerce, and healthcare domains.

●Experience with Amazon Web Services (Amazon EC2, AWS S3, AWS RDS, AWS Glue, AWS Kinesis, Amazon Elastic Load Balancing, Amazon SQS, AWS IAM, Amazon SNS, Amazon CloudWatch, Amazon EBS, Amazon CloudFront, VPC, DynamoDB, Lambda, and Redshift).

●Experience with Google Cloud Platform services, including BigQuery, Cloud Dataproc, and Apache Airflow.

●Experience using Python IDEs such as PyCharm, Sublime Text, and IDLE.

●Experience in developing web applications and implementing Model View Controller (MVC) architecture using the server-side frameworks Django and Flask.

●Proficient in programming languages Python, SQL, and Scala.

●Strong experience working with large datasets and designing highly scalable and optimized data modeling and data integration pipelines.

●Working knowledge of Kubernetes to deploy, scale, load balance, and manage Docker containers.

●Extensive experience in Data Extraction, Transformation, and Loading (ETL) using tools such as SQL Server Integration Services (SSIS) and Data Transformation Services (DTS).

●Experience in Database Design and development with Business Intelligence using SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS), OLAP Cubes, Star Schema and Snowflake Schema.

●Expertise in designing data-intensive applications using the Hadoop Ecosystem, Big Data Analytics, Cloud Data Engineering, Data Warehouse, Data Visualization, Reporting, and Data Quality solutions.

●Adept in building Data Warehouse using Star and Snowflake schemas.

●Expertise in Exploratory Data Analysis, Big Data Analytics using Spark, and predictive analysis using Linear and Logistic Regression models, with good knowledge of supervised and unsupervised algorithms.

●Worked with statistical techniques such as Linear/Logistic Regression, Random Forest, A/B Testing, ANOVA, Chi-Square Analysis, and K-means Clustering (a brief illustrative sketch follows this summary).

●Hands-on experience visualizing data using Power BI, Tableau, R (ggplot2), and Python (pandas, Matplotlib, NumPy, SciPy).

●Proficient in all phases of Software Development Life Cycle (SDLC) including Requirements gathering, Analysis, Design, Reviews, Coding, Unit Testing, and Integration Testing.

●Analyzed the requirements and developed Use Cases, UML Diagrams, Class Diagrams, Sequence and State Machine Diagrams.

●Proven leadership and people management skills with the ability to resolve complex business problems.

●Direct interaction with clients, offshore and onshore teams, and business users across different locations, from critical issues to production launches.
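
As a brief, hedged illustration of the supervised and unsupervised techniques listed above, the sketch below fits a logistic regression classifier and a K-means clustering model on synthetic data. It is a generic example, not code from any engagement described later; the dataset, features, and parameters are hypothetical.

```python
# Minimal sketch: logistic regression (supervised) and K-means (unsupervised)
# on synthetic data. Illustrative only; all data and parameters are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score

# Synthetic binary-classification dataset.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Supervised: logistic regression.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))

# Unsupervised: K-means clustering on the same features.
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print("Cluster sizes:", np.bincount(km.labels_))
```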

TECHNICAL SKILLS:

Big Data Technologies

Hadoop, HDFS, MapReduce, Hive, Sqoop, Pig, HBase, Impala, Kafka, Spark, Airflow

Cloud Technologies

AWS (S3, EC2, EMR, Redshift, Lambda, Glue, Kinesis, and more), GCP (BigQuery, Dataproc, Dataflow, Airflow), Snowflake

Programming Languages

Python, Scala, Java, R.

Databases

MySQL, Oracle

Development & ETL Tools

Eclipse, IntelliJ, Maven, Jenkins, Tableau, Apache Airflow, Informatica

Other Tools

Putty, WinSCP, AWS Management Console, Apache Ambari, PyCharm, Visual Studio, RStudio, Power BI, SAS Studio, Eclipse, Mainframes, Notebook, Databricks, Terraform

Version Control

GitHub, SVN, CVS

Methodologies

Agile, Waterfall

Operating Systems

Windows, Unix, Linux

Project Management Tools

JIRA, Rally, MS Project Professional, SharePoint, ServiceNow, Genesys Contact Center.

Reporting Tools

Tableau, Power BI and Advanced Excel with VBA

EDUCATION:

Master’s degree in Information Technology – Southern New Hampshire University – Dec 2024

Bachelor’s degree in ECE – JNTUH – May 2015

WORK EXPERIENCE:

Bank of America, Dallas, TX (Remote) June 2024 – Present

Sr. Data Engineer

Responsibilities:

●Designed and implemented a scalable data lake architecture on AWS S3 and Databricks Delta Lake for storing and processing structured and unstructured data.

●Built ETL pipelines using AWS Glue and Databricks to ingest, clean, and transform data from multiple sources (e.g., databases, APIs, logs).

●Developed real-time data processing pipelines using AWS Kinesis and Databricks Structured Streaming for fraud detection and transaction monitoring (see the streaming sketch after this section).

●Implemented data partitioning and optimization strategies in AWS S3 to improve query performance and reduce storage costs.

●Enforced data governance and security policies using AWS IAM and Databricks Unity Catalog for role-based access control and compliance.

●Created metadata management solutions using AWS Glue Data Catalog and Databricks Delta Lake to track data lineage and schema evolution.

●Set up data quality monitoring frameworks using Databricks and Great Expectations to ensure accuracy and consistency of data.

●Integrated the data lake with BI tools like Tableau and Power BI for analytics and reporting.

●Optimized storage costs by implementing AWS S3 lifecycle policies and archiving infrequently accessed data to S3 Glacier.

●Designed disaster recovery strategies using AWS S3 versioning and cross-region replication to ensure data availability.

●Designed and implemented real-time data pipelines using Azure Data Factory and Azure Synapse Analytics, integrating data from multiple sources such as GCP BigQuery, AWS S3, and on-premises databases.

●Developed and deployed scalable Azure Databricks clusters for processing large-scale financial data, leveraging Spark for real-time analytics and machine learning model training.

●Utilized Azure Event Hubs and Azure Stream Analytics to process real-time transaction data streams.

●Automated regulatory reporting workflows using Apache Airflow and Databricks to meet compliance requirements.

●Collaborated with data scientists to prepare training datasets for machine learning models using Databricks and AWS SageMaker.

●Migrated on-premises data systems to the cloud, leveraging AWS and Databricks for scalable and cost-effective data processing.

●Documented the data lake architecture, pipelines, and processes to facilitate knowledge sharing and onboarding.

Environment: AWS S3, Databricks Delta Lake, AWS Glue, AWS Glue Data Catalog, Kafka, Kinesis, Spark, Athena, AWS IAM, Apache Airflow, MLflow, Tableau, Power BI, Apache NiFi, AWS SageMaker
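
A minimal sketch of the Kinesis-to-Delta streaming pattern described in this section, assuming a Databricks runtime (which provides the `spark` session and the Kinesis source) and Delta Lake. The stream name, schema, flagging rule, and S3 paths are hypothetical placeholders, not details from the actual project.

```python
# Minimal PySpark sketch (Databricks): read a Kinesis stream, parse JSON payloads,
# apply a simple rule-based flag, and append to a Delta table on S3.
# Assumes the Databricks Kinesis connector; all names and paths are placeholders.
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

schema = StructType([
    StructField("transaction_id", StringType()),
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream                         # `spark` is provided by Databricks
       .format("kinesis")                       # Databricks Kinesis source
       .option("streamName", "transactions")    # placeholder stream name
       .option("region", "us-east-1")
       .option("initialPosition", "latest")
       .load())

events = (raw
          .select(F.from_json(F.col("data").cast("string"), schema).alias("e"))
          .select("e.*"))

# Simple rule-based flag as a stand-in for real fraud scoring.
flagged = events.withColumn("suspicious", F.col("amount") > 10000)

(flagged.writeStream
    .format("delta")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/txn")  # placeholder
    .outputMode("append")
    .start("s3://example-bucket/delta/transactions"))                     # placeholder
```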

Lloyds Technology Centre, Hyderabad, India July 2019 – Oct 2022

Data Engineer

Responsibilities:

●Involved in converting Hive/SQL queries into Spark transformations using Scala.

●Created Spark data frames using Spark SQL and prepared data for data analytics by storing it in AWS S3.

●Responsible for loading data from Kafka into HBase using REST API.

●Developed batch scripts to fetch the data from AWS S3 storage and perform required transformations in Scala using Spark framework and AWS Glue ETL.

●Utilized AWS Kinesis Data Analytics for Apache Flink and Kinesis Data Firehose to load data into target destinations with high availability and durability.

●Integrated Databricks with AWS S3 and HDFS for seamless data ingestion, transformation, and storage in a data lake environment.

●Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model, which gets data from Kafka in near real time and persists it to HBase (a minimal sketch of this pattern follows this section).

●Created Sqoop scripts to import and export customer profile data from RDBMS to S3 buckets.

●Developed various enrichment applications in Spark using Scala for cleansing and enrichment of clickstream data with customer profile lookups.

●Troubleshot Spark applications for improved error tolerance and reliability.

●Used Spark DataFrames and Spark APIs to implement batch processing of jobs.

●Automated creation and termination of AWS EMR clusters.

●Worked on fine tuning and performance enhancements of various Spark applications and Hive scripts.

●Built high-throughput data pipelines using Kafka to ingest customer behavioral data into HDFS and AWS S3, enabling advanced analytics and segmentation.

●Built real-time data pipelines using Databricks and Spark Streaming to process data from Kafka, enabling near real-time analytics and insights.

●Used Spark concepts such as broadcast variables, caching, and dynamic allocation to design more scalable Spark applications.

●Maintained Kubernetes patches and upgrades.

●Managed multiple Kubernetes clusters in a production environment.

●Identified source systems, their connectivity, and related tables and fields; ensured data suitability for mapping; prepared unit test cases; and supported the testing team in fixing defects.

Environment: AWS EMR, AWS S3, AWS Kinesis, AWS Glue ETL, Spark, Hive, Sqoop, Scala, MySQL, Hadoop, Oracle DB, AWS Athena, AWS Redshift.
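
A minimal PySpark sketch of the Kafka near-real-time pattern described in this section. The production applications above were written in Scala and persisted to HBase through a separate connector; here the curated stream is written to S3 as Parquet instead. The brokers, topic, schema, and paths are hypothetical placeholders, and the job assumes the spark-sql-kafka connector package is available.

```python
# Minimal PySpark sketch: consume Kafka in near real time, parse and filter events,
# and persist curated records to S3 as Parquet. All names and paths are placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("learner-events").getOrCreate()

schema = StructType([
    StructField("learner_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

stream = (spark.readStream
          .format("kafka")                                     # needs spark-sql-kafka package
          .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder brokers
          .option("subscribe", "learner-events")               # placeholder topic
          .load())

parsed = (stream
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*")
          .filter(F.col("learner_id").isNotNull()))

(parsed.writeStream
    .format("parquet")
    .option("path", "s3a://example-bucket/learner-events/")             # placeholder
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/")  # placeholder
    .outputMode("append")
    .start()
    .awaitTermination())
```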

ITC Infotech, Hyderabad, India Apr 2016 – June 2019

Data Engineer

Responsibilities:

●Involved in Analysis, Design, and Implementation/translation of Business User requirements.

●Designed and developed data ingestion pipelines using Sqoop and Spark, handling large-scale structured and unstructured data.

●Worked on collecting large sets of structured and unstructured data using Python scripts.

●Involved in designing and developing data ingestion, aggregation, and integration into Hadoop.

●Developed Sqoop scripts to import/export data from relational sources and handled incremental loading of customer and transaction data by date.

●Designed and optimized multi-terabyte data warehouses using GCP BigQuery, reducing query execution time by 40% and improving system performance.

●Integrated Kafka with Spark Streaming and Apache Flink for real-time data transformations and analytics, ensuring low-latency processing.

●Utilized Databricks to optimize Spark jobs, leveraging features like caching, broadcast variables, and dynamic allocation for scalable and efficient data processing.

●Identified inconsistencies in data collected from different sources.

●Utilized GCP Pub/Sub and Cloud Composer for real-time data ingestion and workflow orchestration, ensuring high availability and scalability.

●Designed object model, data model, tables, constraints, necessary stored procedures, functions, triggers, and packages for Oracle Database.

●Wrote Spark applications for data validation, cleansing, transformations, and custom aggregations (a minimal sketch follows this section).

●Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slot configuration.

●Developed Spark applications for the entire batch processing by using Scala.

●Stored transformed time-series data from the Spark engine, built on top of a Hive platform, to S3.

●Visualized the results using Tableau dashboards and the Python Seaborn library.

●Provided insights and recommendations and implemented changes for positive business growth.

Environment: R, SQL server, Oracle, HDFS, HBase, AWS, MapReduce, Hive, Impala, Pig, Sqoop, NoSQL, Tableau, RNN, LSTM, Unix/Linux.
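
A minimal PySpark sketch of the batch validation, cleansing, and custom aggregation work described in this section. The column names, rules, and storage paths are illustrative placeholders, not details of the actual pipelines.

```python
# Minimal PySpark sketch: batch validation, cleansing, and a custom daily aggregation.
# Columns, rules, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("txn-cleansing").getOrCreate()

raw = spark.read.parquet("s3a://example-bucket/raw/transactions/")  # placeholder path

# Validation and cleansing: drop duplicates and rows failing basic rules.
clean = (raw
         .dropDuplicates(["transaction_id"])
         .filter(F.col("amount").isNotNull() & (F.col("amount") > 0))
         .withColumn("txn_date", F.to_date("event_time")))

# Custom aggregation: daily totals and counts per customer.
daily = (clean.groupBy("customer_id", "txn_date")
              .agg(F.sum("amount").alias("daily_amount"),
                   F.count("*").alias("txn_count")))

(daily.write.mode("overwrite")
      .partitionBy("txn_date")
      .parquet("s3a://example-bucket/curated/daily_transactions/"))  # placeholder path
```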

Panamax Infotech Ind Pvt Ltd, Hyderabad, India May 2015 – Mar 2016

Python Developer

Responsibilities:

●Created web-based applications for data processing using Python and the Django framework.

●Implemented preprocessing procedures and handled deployment by creating virtual machines on AWS EC2.

●Analyzed user behavior data and collaborated with program managers, business analysts, developers and other key stakeholders to develop effective product solutions.

●Extracted data from S3 and SQL databases, performed ETL using AWS Glue, and utilized Kinesis Data Analytics for Apache Flink for analytics.

●Applied SQL Queries, procedures for data manipulation, extraction, and analysis for product optimization.

●Created interactive dashboards using Power BI, translating complex model outputs into engaging visual reports that enhanced business decision-making.

●Defined data needs, evaluated data quality, and extracted/transformed data for analytic projects and research.

●Designed and maintained databases and developed a Python-based RESTful API (web service) using Flask and PostgreSQL (a minimal sketch follows this section).

●Worked on server-side applications using Python programming.

●Employed Visual Studio Code, and Jupyter Notebook to streamline code writing and debugging processes, reducing development time and improving overall code quality.

●Conducted thorough software maintenance, testing, and troubleshooting to ensure smooth operations.

●Researched and identified industry trends, providing valuable production improvement recommendations.

●Delivered code efficiently with continuous integration, in line with Agile principles.

●Experienced in Agile framework, from sprint planning and meetings to retrospectives, product backlog management and writing user stories.

●Maintained program libraries, user manuals, and technical documentation.

Environment: Python, ETL, Django, RESTful web service, MySQL, PostgreSQL, Visio, SQL Server Management Studio, AWS S3, AWS Glue, AWS Kinesis, AWS EC2, and Power BI.
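
A minimal sketch of a Flask RESTful web service backed by PostgreSQL, illustrating the pattern described in this section. The database URI, model, and endpoints are hypothetical placeholders, not the actual application.

```python
# Minimal Flask sketch: a RESTful resource backed by PostgreSQL via Flask-SQLAlchemy.
# The connection URI, table, and fields are hypothetical placeholders.
from flask import Flask, jsonify, request
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql://user:pass@localhost/appdb"  # placeholder
db = SQLAlchemy(app)

class Product(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(120), nullable=False)
    price = db.Column(db.Float, nullable=False)

@app.route("/products", methods=["GET"])
def list_products():
    # Return all products as JSON.
    return jsonify([{"id": p.id, "name": p.name, "price": p.price}
                    for p in Product.query.all()])

@app.route("/products", methods=["POST"])
def create_product():
    # Create a product from the posted JSON payload.
    payload = request.get_json()
    product = Product(name=payload["name"], price=payload["price"])
    db.session.add(product)
    db.session.commit()
    return jsonify({"id": product.id}), 201

if __name__ == "__main__":
    with app.app_context():
        db.create_all()
    app.run(debug=True)
```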


