Mohan Rao Marineni
Sr. Azure Data Engineer
Phone: 903-***-****
Email: ****************@*****.***
PROFESSIONAL SUMMARY
Experienced Data Architect and Senior Data Engineer with over 11 years of experience designing and implementing scalable, cloud-native data solutions across enterprise environments.
Skilled in enterprise data transformation, including building robust data engines to support ERP migrations across domains like Budgeting, Financials, and Procurement.
Hands-on expertise in AWS Cloud technologies, with deep experience using services like AWS Glue for ETL, S3, RDS, and related infrastructure for secure, scalable data pipelines.
Proficient in PySpark, Python, and SQL for building efficient, high-performance ETL frameworks and enabling advanced analytics on large-scale datasets.
Strong background working with RESTful APIs, integrating third-party and internal systems to drive data ingestion, enrichment, and operational reporting.
Demonstrated excellence in Master Data Management (MDM) initiatives—building unified, deduplicated, and governed master records to ensure data integrity across complex ecosystems.
Experienced using Databricks for collaborative data engineering, building advanced transformations, and supporting machine learning workflows.
Familiar with MS Fabric and Snowflake, with the ability to quickly adapt and deliver on modern cloud data warehouse and analytics platforms.
Proficient with ETL tools and frameworks like Informatica and AWS Glue, capable of designing end-to-end data integration solutions that align with business and technical needs.
Adept at developing scalable backend services and automation scripts to streamline API consumption, data validation, and orchestration workflows.
Deep knowledge of data modeling, schema design, and data normalization, ensuring performance, maintainability, and data quality across systems.
Strong collaborator, partnering closely with business stakeholders, analysts, and engineers to turn strategic goals into technical execution plans.
Experienced with DevOps principles and CI/CD pipelines (Azure DevOps, GitHub Actions, Jenkins), ensuring rapid and reliable delivery of cloud data projects.
Known for bringing a consultative mindset to technical discussions, balancing considerations of cost, quality, scalability, and compliance.
Committed to agile development practices, continuous improvement, and mentoring peers in clean coding, data best practices, and cloud-native design principles.
Passionate about solving complex data challenges, driving operational excellence, and building future-proof data ecosystems that empower enterprise decision-making.
Education:
Bachelor's degree from Jawaharlal Nehru Technological University, Hyderabad, India.
Technical Skills:
Azure Services
Azure Data Factory, Azure Databricks, Databricks Notebooks, Databricks SQL, DBT (Data Build Tool), Databricks Repos, Databricks Delta Lake, Databricks MLflow, Databricks Workflows, Databricks Structured Streaming, Azure Kubernetes Service, Azure Machine Learning, Azure Synapse Analytics, Azure Active Directory, Informatica PowerCenter, Logic Apps, Function Apps, Snowflake
Big Data Technologies
MapReduce, Hive, Tez, Python, PySpark, Scala, Apache Flink, Apache HBase, Apache Kafka, Elasticsearch, Spark Streaming, Oozie, Sqoop, Zookeeper
Hadoop Distribution
Cloudera, Hortonworks, Azure HDInsight and MapR on Azure.
Languages
Java, SQL, PL/SQL, Python, HiveQL, Scala.
Web Technologies
HTML, CSS, JavaScript, XML, JSP, Restful, SOAP
Operating Systems
Windows 10, macOS, Debian, UNIX, Linux, Ubuntu, CentOS.
Build Automation tools
Ant, Maven, Terraform, Jenkins and Apache Airflow.
Version Control
Git, GitHub, Bitbucket and Subversion
Methodology
Agile, Scrum and Waterfall.
IDE & Build Tools
Eclipse, PyCharm and Microsoft Visual Studio.
Databases
MS SQL Server 2019/2016/2014/2012, Azure SQL DB, Azure Synapse, MS Excel, MS Access, Oracle 11g/12c/19c/21c, Cosmos DB and MongoDB Atlas.
Professional Experience:
Architected and built scalable, cloud-native data engineering solutions on AWS, supporting large-scale ERP migration initiatives across financial, procurement, and budgeting systems.
Developed and maintained ETL pipelines using AWS Glue and PySpark, ensuring accurate, high-quality data ingestion, transformation, and delivery across enterprise platforms.
Integrated multiple internal and external RESTful APIs, enabling seamless data aggregation and driving enriched reporting and analytics capabilities.
Led initiatives in Master Data Management (MDM), designing unified master records and resolving data duplication across fragmented source systems.
Designed and optimized data workflows using Databricks for distributed data processing, ensuring high performance for analytics and reporting use cases.
Implemented data governance frameworks, establishing field-level data mappings, normalization standards, and validation rules to maintain data consistency and compliance.
Built and deployed scalable backend services and API-driven data ingestion frameworks using Python, SQL, and AWS-native services like Lambda and API Gateway.
Supported operational analytics by delivering curated datasets to Snowflake and preparing structured data models to enable downstream business intelligence initiatives.
Collaborated closely with engineers, analysts, and business teams to translate reporting needs into technical specifications and data engineering workflows.
Designed secure and automated CI/CD pipelines leveraging Terraform, AWS CodePipeline, and GitHub Actions, ensuring efficient code deployment and infrastructure management.
Developed monitoring and observability solutions using AWS CloudWatch, enhancing visibility into pipeline health, API performance, and data flow status.
Conducted technical evaluations of new data sources and third-party APIs, assessing integration complexity, licensing models, update frequencies, and long-term viability.
Championed performance tuning and cost optimization across AWS services and Snowflake environments, reducing operational costs while maintaining service reliability.
Authored detailed internal documentation, including data dictionaries, process workflows, and API integration guidelines to support knowledge sharing and platform scalability.
Provided mentorship to junior engineers, promoting best practices in cloud architecture, ETL design, API integration, and data quality management.
Worked directly with product managers and architects to ensure data platform evolution aligned with enterprise data strategy and business growth goals.
Led end-to-end data ingestion from ERP systems, ensuring seamless data migration and validation for budgeting, procurement, and finance domains.
Designed scalable ETL frameworks to automate incremental and full-load data extraction, improving data availability for reporting by 40% (see the incremental-load sketch following this section).
Implemented entity resolution and record linkage strategies to eliminate redundancy and maintain high-integrity master datasets for critical business entities.
Worked on Snowflake performance tuning, including partitioning strategies, query optimization, and cost-efficient compute resource management.
Established best practices for API consumption, minimizing over-fetching and reducing API-related operational costs through smart query parameterization.
Environment: Azure Databricks, Azure Data Factory, Logic Apps, Azure Event Hub, Informatica PowerCenter, DBT (Data Build Tool), DBT Cloud, Jupyter Notebooks, Postman, Docker, Terraform, GitHub Actions / Azure DevOps / AWS CodePipeline (CI/CD), role-based access control, Azure Monitor, Spark Streaming, Oracle, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, YAML, Git, JIRA, Jenkins, Kafka, ADF pipelines, Power BI.
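The incremental-load pattern referenced above can be summarized with a minimal PySpark sketch; the bucket, table, and column names below are illustrative assumptions, not details from the actual engagement.

```python
# Minimal illustrative sketch of watermark-based incremental extraction in PySpark.
# All paths, table names, and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("incremental_extract_sketch").getOrCreate()

# Watermark from the previous successful run (assumed to be persisted in a control table).
last_watermark = "2022-01-01 00:00:00"

# Read the raw ERP extract from an assumed landing zone.
source_df = spark.read.parquet("s3://example-bucket/erp/financials/")

# Incremental load: keep only rows changed since the last run; a full load skips this filter.
incremental_df = source_df.filter(F.col("last_modified_ts") > F.lit(last_watermark))

# Light standardization and deduplication before publishing to the curated zone.
curated_df = (
    incremental_df
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .dropDuplicates(["transaction_id"])
)

curated_df.write.mode("append").parquet("s3://example-bucket/curated/financials/")

# Highest timestamp seen in this batch becomes the watermark for the next run.
new_watermark = incremental_df.agg(F.max("last_modified_ts")).first()[0]
```

Persisting the new watermark after each run is what keeps repeated executions limited to changed records and makes the incremental path cheap relative to a full reload.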
Role: AWS Data Engineer Aug 2021 – Sep 2022
Client: American Express, Phoenix, AZ
Responsibilities:
Designed and implemented scalable cloud-native data architectures to support large-scale ERP data transformation initiatives across budgeting, financials, and procurement domains.
Built robust ETL pipelines leveraging Azure Data Factory, Python, and SQL, ensuring clean, consistent, and enriched datasets for enterprise reporting.
Developed and maintained RESTful API integrations to ingest third-party and internal data, enabling seamless consolidation of fragmented data sources into unified models.
Led Master Data Management (MDM) efforts by defining canonical data structures, implementing deduplication logic, and establishing data governance policies.
Worked with Snowflake and Databricks platforms for large-scale data processing, transformation, and analytics, optimizing workflows for performance and scalability.
Developed backend services and automation scripts in Python and .NET, supporting ingestion, transformation, and enrichment processes across multiple datasets.
Designed field-level mapping strategies and normalization frameworks to integrate multiple ERP and financial datasets, ensuring consistency and reporting accuracy.
Automated infrastructure provisioning and deployments using Terraform and GitHub Actions, ensuring secure and repeatable CI/CD pipelines for data applications.
Enforced data security by implementing Azure IAM, Key Vault, and role-based access controls, protecting sensitive data and API credentials.
Created detailed monitoring and observability solutions using Azure Monitor and Log Analytics, ensuring real-time data flow transparency and system health visibility.
Supported the evaluation and onboarding of external APIs by analyzing data coverage, update frequency, licensing models, and integration costs.
Conducted API performance benchmarking and implemented best practices to optimize API consumption patterns, reducing response time and operational costs.
Built strategies to standardize, merge, and cleanse multi-source datasets using data deduplication and entity resolution techniques for higher data quality (see the deduplication sketch following this role).
Partnered with business stakeholders, architects, and analysts to align technical data solutions with business objectives and ERP migration goals.
Developed internal documentation for data integration frameworks, schema definitions, and ETL pipelines, enabling efficient knowledge transfer and onboarding.
Mentored junior engineers on best practices around data pipeline development, API integration, cloud security, and real-time data validation techniques.
Facilitated sprint planning and retrospectives by translating complex technical data requirements into achievable deliverables and milestones.
Proposed and implemented data quality scoring dashboards to evaluate data reliability, identify inconsistencies, and drive continuous improvements.
Actively contributed to data governance discussions around data retention policies, disaster recovery planning, and secure handling of sensitive business datasets.
Environment: AWS Lambda, Amazon CloudWatch, AWS Glue (ETL), AWS S3, Snowflake, Jenkins, Git, GitHub, AWS CodePipeline, Terraform, JIRA, Apache Kafka, Databricks, Hadoop, Apache Hive, Apache Spark and AWS DMS.
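As a rough illustration of the deduplication and entity-resolution work described above, the following PySpark sketch applies a simple match key and a recency-based survivorship rule; the column names, match logic, and paths are assumptions rather than the actual implementation.

```python
# Illustrative deduplication / survivorship sketch; all names are hypothetical.
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("dedup_sketch").getOrCreate()

records = spark.read.parquet("s3://example-bucket/raw/vendors/")

# Blocking/match key: normalized name plus postal code, standing in for a real matching strategy.
matched = records.withColumn(
    "match_key",
    F.concat_ws("|", F.lower(F.trim(F.col("vendor_name"))), F.col("postal_code")),
)

# Survivorship rule: keep the most recently updated record per match key.
w = Window.partitionBy("match_key").orderBy(F.col("updated_at").desc())
golden = (
    matched
    .withColumn("rn", F.row_number().over(w))
    .filter(F.col("rn") == 1)
    .drop("rn", "match_key")
)

golden.write.mode("overwrite").parquet("s3://example-bucket/mdm/vendors_golden/")
```

In a production MDM design, the match keys and survivorship rules would come from governed matching specifications rather than a single name/postal-code pair.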
Role: Hadoop Developer June 2019 – Jul 2021
Client: Taylor Technology, Dallas, Texas
Responsibilities:
Migrated large-scale datasets from Oracle to MySQL using Apache Spark and Scala, while integrating DBT for post-ingestion transformation logic to ensure maintainable, testable, and modular pipelines.
Developed real-time analytics pipelines using Spark Streaming to deliver actionable sales insights on live data streams, enhancing responsiveness and decision-making for business units (see the streaming sketch following this role).
Created and maintained DBT models and tests for transforming and validating raw data into analytics-ready formats, supporting unified reporting across MySQL, Cassandra, and Hadoop systems.
Designed scalable batch and streaming workflows in Spark (RDDs, DataFrames, Spark SQL), integrating Airflow DAGs for orchestration and automated pipeline execution across data platforms.
Built Power BI dashboards on top of curated datasets transformed with DBT, delivering near real-time visual insights for stakeholders using CSVs, Excel, and other semi-structured data inputs.
Engineered end-to-end data pipelines using PySpark and SQL, aligning with modern ELT paradigms where DBT handled downstream transformation and documentation in Snowflake-like ecosystems.
Utilized Sqoop for structured data ingestion into HDFS from MySQL, followed by transformation using HiveQL and further modeling using DBT for unified analytics.
Optimized performance of MapReduce workflows using advanced tuning techniques (combiners, partitioning, distributed cache), increasing throughput for large-volume batch processing.
Applied MapReduce-based algorithms to support classification and categorization tasks on unstructured datasets, integrating outputs into structured tables for downstream DBT workflows.
Authored YAML-based deployment scripts for CI/CD pipeline automation, ensuring consistent deployment of Spark and DBT workflows into dev and prod environments.
Maintained code versioning using Git and GitHub, promoting collaboration, version tracking, and controlled deployment cycles within a team-centric data engineering environment.
Collaborated with business analysts and data scientists to design clean, governed DBT models aligned with organizational KPIs, enabling self-service analytics and centralized metric definitions.
Environment: Hadoop, Hive, Spark, PySpark, Sqoop, Spark SQL, Apache Flume, Cassandra, YAML, ETL.
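A minimal Spark Structured Streaming sketch of the kind of real-time sales aggregation described above; the file-based source, schema, and window sizes are assumptions used only for illustration (the original pipelines ran Spark Streaming against live feeds).

```python
# Illustrative streaming aggregation sketch; paths, schema, and windows are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("sales_stream_sketch").getOrCreate()

schema = StructType([
    StructField("store_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Stream new sales files as they land (a file source stands in for the real feed).
sales_stream = spark.readStream.schema(schema).json("/landing/sales/")

# Five-minute tumbling-window revenue per store, tolerating up to ten minutes of late data.
revenue = (
    sales_stream
    .withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes"), "store_id")
    .agg(F.sum("amount").alias("revenue"))
)

query = (
    revenue.writeStream
    .outputMode("append")
    .format("parquet")
    .option("path", "/curated/sales_by_window/")
    .option("checkpointLocation", "/checkpoints/sales_by_window/")
    .start()
)
```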
Role: Data warehouse Developer Apr 2018 – May 2019
Client: TMW Systems, Cleveland, Ohio
Responsibilities:
Worked as an SQL Server Analyst/Developer/DBA, leveraging SQL Server 2012, 2014, and 2016 to design, develop, and maintain databases supporting business intelligence and data warehousing solutions.
Developed and managed SQL Server jobs, SQL Mail Agent, alerts, and scheduled DTS/SSIS packages, ensuring smooth data integration and transformation across systems.
Utilized Erwin Data Modeler for logical and physical data modeling, maintaining and updating models for Consolidated Data Store (CDS), Actuarial Data Mart (ADM), and Reference Database based on user requirements, ensuring alignment with business needs and data governance standards.
Managed source control for database development and deployment, using TFS (Team Foundation Server) for tracking environment-specific script deployments, ensuring version consistency and efficient code management.
Exported and published data models from Erwin to SharePoint for user access, providing teams with clear and accurate documentation for ongoing data analysis and decision-making.
Administered and maintained key databases including Consolidated Data Store, Actuarial Data Mart, and Reference Database, ensuring data accuracy, accessibility, and support for analytical processes.
Designed and implemented triggers, stored procedures, functions, and T-SQL code for data transformation, validation, and business logic enforcement, maintaining high standards for performance and data integrity.
Deployed scripts across multiple environments, adhering to configuration management and playbook requirements to ensure consistency in production, staging, and development environments.
Performed query tuning and performance optimization to enhance the efficiency of data retrieval and ensure optimal query execution, improving the overall system performance.
Actively managed defect tracking and resolution through Quality Center, ensuring that issues were documented, tracked, and resolved in a timely manner to meet quality assurance standards.
Maintained user roles and permissions within SQL Server, ensuring data security and compliance with organizational policies and best practices.
Environment: SQL Server 2008/2012 Enterprise Edition, SSRS, SSIS, T-SQL, Windows Server 2003, Performance Point Server 2007, Oracle 10g, Visual Studio 2010.
Role: Data Warehouse Developer May 2013 – Jul 2017
Client: VALUE LABS, Hyderabad, India
Responsibilities:
Designed and developed robust data warehouse solutions supporting healthcare analytics by integrating diverse clinical, billing, and patient data sources using SQL Server Integration Services (SSIS).
Built scalable, high-performance ETL pipelines to enable accurate and timely data delivery for critical use cases such as patient tracking, resource planning, and compliance reporting.
Created dimensional models and healthcare data marts using star and snowflake schemas, enabling streamlined reporting and analytical insights for clinical and administrative stakeholders (see the dimensional-model sketch following this section).
Implemented data governance practices including data validation, metadata documentation, and lineage tracking to support auditability and ensure compliance with healthcare standards (e.g., HIPAA).
Developed complex stored procedures, functions, and indexing strategies in SQL Server to optimize performance of large-scale healthcare datasets, including patient demographics, treatment history, and insurance claims.
Collaborated with business stakeholders, clinicians, and compliance officers to gather requirements and translate them into scalable technical designs aligned with business and regulatory needs.
Spearheaded the development of advanced BI dashboards and reports using SSRS to visualize patient outcomes, hospital KPIs, and resource utilization, supporting data-driven decision-making across clinical operations.
Delivered OLAP-based analytics solutions using SSAS, enabling healthcare providers to monitor trends like readmission rates, physician productivity, and treatment outcomes.
Ensured secure and governed data handling through role-based access controls, logging, and HIPAA-compliant ETL designs to protect patient data during migration and reporting.
Established documentation and operational playbooks to support data quality, governance, and knowledge transfer across teams—improving long-term platform maintainability.
Acted as a bridge between business and technical teams, helping translate healthcare-specific challenges into robust architectural patterns and actionable data strategies.
Played a key role in identifying opportunities for performance tuning, improving system response times and enabling scalable access to clinical and administrative reports.
Championed standardization and best practices in data warehousing, metadata management, and error handling to enhance reliability and trust in healthcare data systems.
Supported agile delivery cycles by providing iterative data models, continuous ETL improvements, and collaboration in backlog grooming and sprint planning activities.
Engaged in early initiatives around data-driven clinical decision support, setting the foundation for future AI and ML applications in population health and predictive analytics.
Played a foundational role in shaping enterprise data architecture strategies for healthcare analytics systems, ensuring long-term scalability, interoperability, and performance across reporting and BI platforms.
Collaborated with cross-functional teams—including data analysts, compliance officers, and BI developers—to design data models that aligned with both clinical workflows and executive reporting needs.
Established and enforced naming conventions, data dictionary standards, and ETL logging frameworks to improve metadata consistency and traceability across healthcare datasets.
Led proof-of-concept initiatives to evaluate new tools and technologies for improving data integration, including early-stage assessment of cloud platforms and AI-readiness of healthcare data.
Designed early-stage patient-level forecasting data structures to support downstream AI use cases such as predictive readmission risk and treatment outcome modeling.
Conducted stakeholder workshops to translate clinical and administrative KPIs into actionable data models and reporting metrics, aligning architecture with strategic healthcare goals.
Contributed to enterprise data stewardship efforts by identifying and resolving data quality issues, establishing governance workflows, and advocating for clean data practices across teams.
Promoted a culture of data literacy by mentoring junior developers and analysts on SQL optimization, dimensional modeling, and healthcare reporting best practices.
Ensured alignment with industry trends and compliance mandates by staying up to date with data architecture frameworks, healthcare data standards, and privacy regulations.
Assisted in capacity planning and resource estimation for large data processing workloads, helping to optimize infrastructure costs and system performance for growing healthcare datasets.
Environment: MS SQL Server 2016, Visual Studio 2017/2019, SSIS, SharePoint, MS Access, Team Foundation Server, Cassandra, MDX scripting, YAML, Git.
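For illustration only, a simplified star-schema load of the kind described in this role, expressed in PySpark for brevity rather than the SSIS/T-SQL tooling actually used; all table, column, and path names are hypothetical.

```python
# Illustrative star-schema load sketch: a conformed patient dimension and an encounter fact.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("star_schema_sketch").getOrCreate()

encounters = spark.read.parquet("/staging/encounters/")   # source transactions (assumed)
patients = spark.read.parquet("/staging/patients/")       # source master data (assumed)

# Patient dimension with a surrogate key (monotonically_increasing_id as a simple stand-in).
dim_patient = (
    patients
    .select("patient_id", "gender", "birth_date", "zip_code")
    .dropDuplicates(["patient_id"])
    .withColumn("patient_sk", F.monotonically_increasing_id())
)

# Fact table carrying only measures and foreign keys, joined to the surrogate key.
fact_encounter = (
    encounters
    .join(dim_patient.select("patient_id", "patient_sk"), "patient_id")
    .select("patient_sk", "encounter_id", "encounter_date", "charge_amount")
)

dim_patient.write.mode("overwrite").parquet("/warehouse/dim_patient/")
fact_encounter.write.mode("overwrite").parquet("/warehouse/fact_encounter/")
```

Keeping descriptive attributes in the dimension and only keys and measures in the fact table is the same separation the SSIS-based marts above relied on for efficient reporting queries.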