JayaSree
Mail ID: ************@*****.***
Phone: 502-***-****
https://www.linkedin.com/in/jayasree-k-860975318
SRE DevOps Engineer
PROFESSIONAL SUMMARY:
Seasoned and highly skilled Cloud and Site Reliability Engineer with 10+ years of experience in designing, implementing, and managing scalable, secure, and highly available cloud infrastructure across AWS, Azure, GCP, and OpenStack.
Hands-on with automation using Terraform, Ansible, Jenkins, and custom scripts in Python, Groovy, and Shell. Extensive experience with container orchestration platforms (Kubernetes, OpenShift, ECS), and observability tools (Grafana, New Relic, Honeycomb).
Known for delivering scalable, production-grade environments through robust automation and proactive monitoring.
Proven expertise in cloud architecture, infrastructure automation (Terraform, CloudFormation, ARM), and CI/CD pipelines (Jenkins, GitHub, Bitbucket).
Adept at applying DevOps and Agile methodologies to drive operational excellence, reduce downtime, and accelerate deployment cycles.
Extensive experience in implementing Infrastructure as Code (IaC), monitoring complex environments using tools such as Splunk, Honeycomb, ELK, Grafana, AppDynamics, and New Relic, and leveraging containerization and orchestration technologies like Docker, Kubernetes, ECS, and OpenShift to ensure resilience and scalability.
Strong background in managing secure cloud environments with deep knowledge of networking, SSL, IAM, and compliance practices.
Hands-on with SRE practices including SLIs/SLOs/SLAs, incident response, automation of toil, and improving service reliability.
Proficient in using tools like Databricks CLI, REST APIs, and Terraform to automate and manage big data workflows. Experienced in writing Groovy, Python, Shell, YAML, and PowerShell scripts to support full lifecycle automation and system integrations.
A collaborative and communicative professional with a strong commitment to continuous improvement, operational efficiency, and delivering high-quality solutions.
Technical Skills:
Operating Systems
RHEL/CentOS, Ubuntu/Debian/Fedora, Windows Server 2012/2016/2019
Languages and Scripting
YAML, Python, Ruby, Shell, Perl, HTML, PowerShell, C, C++, Java
Database
MongoDB, Oracle DB, MySQL, AWS RDS
Infrastructure as a service
OpenStack, AWS, Azure, VMware, Terraform (IaC)
Containerization
Docker, Kubernetes, OpenShift
Configuration management
Ansible, Capistrano, Rundeck, Chef, Puppet, Ansible Tower
CI, Test & Build Systems
Jenkins Pipelines, Concourse, Maven, Gradle, Ant
Application/Web Servers
Tomcat, JBoss, Apache, IBM WebSphere, IBM HTTP Server
Logging & Monitoring Tools
Honeycomb, Splunk, AppDynamics, Datadog, Zabbix, Prometheus
Version Control Tools
Git, SVN, Bitbucket
Security Tools
Dome9, Twistlock, Rapid7, Tripwire, Snort
PROFESSIONAL EXPERIENCE
Client: JPM Chase - Chicago, IL Jan 2023 – Present
Role: SRE DevOps Engineer
Responsibilities:
Led build and deployment and provided SRE support for cloud backend applications.
Used Bash and Python (including Boto3) to supplement automation provided by Ansible and Terraform for tasks such as encrypting EBS volumes, backing up AMIs, and scheduling Lambda functions for routine AWS tasks.
Wrote Python scripts to manage AWS resources through API calls using the Boto SDK, and worked with the AWS CLI.
Implemented AWS Lambda functions to run scripts in response to events in Amazon DynamoDB tables or S3 buckets, or to HTTP requests through Amazon API Gateway.
Installed and configured Amazon Inspector. Created targets and templates and scheduled assessment runs on all EC2 instances in the AWS account.
Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required. Integrated Kafka with Spark in a sandbox environment.
Integrated applications with the New Relic monitoring tool for complete insight and proactive monitoring.
Used New Relic APM configuration to track changes across the CI/CD pipeline and infrastructure. Supported and maintained log analytics frameworks built on ELK (Elasticsearch, Logstash, Kibana).
Monitored servers using Nagios, CloudWatch, and a logging stack of Elasticsearch, Fluentd, and Kibana.
Established infrastructure and service monitoring, including synthetic monitoring, using Prometheus, Grafana, Elasticsearch, Kibana, Fluentd, and CloudWatch for logging and monitoring.
Integrated the Datadog cloud monitoring tool with PagerDuty to trigger real-time alerts to points of contact.
Created Datadog APM dashboards for various applications and monitored real-time and historical metrics.
Created system alerts using various Datadog tools and alerted application teams based on the escalation matrix.
Installed and configured Splunk to monitor disk usage, CPU utilization, open files, and similar metrics, and sent alerts to teams based on them.
Forwarded Windows event logs from remote machines to Splunk using the Splunk Universal Forwarder.
Responsible for checking application metrics in Datadog and creating SLOs and SLIs.
Provided 24x7 support for production and development environments. Communicated requirements effectively to team members and managed applications.
Client: Cisco - Remote Dec 2021 – Dec 2022
Role: SRE/Cloud Engineer
Responsibilities:
Engineered and maintained robust, production-grade release environments across QA, UAT, and production stages, ensuring reliability, consistency, and seamless integration throughout the software delivery lifecycle.
Monitored and analyzed Service Level Indicators (SLIs) using Dynatrace and Honeycomb; collaborated with DBAs and developers to optimize SQL performance and meet Service-Level Objectives (SLOs).
Established and enforced SRE best practices across the engineering org through internal workshops and process documentation.
Led onsite coordination and owned production deployments, incident response, and support, ensuring zero-downtime rollouts and rapid recovery during outages.
Monitored distributed systems and microservices health using observability stacks like Prometheus, Grafana, ELK, and custom telemetry solutions.
Configured Kubernetes for container orchestration, enabling horizontal scaling, load balancing, and environment segmentation through namespace strategies.
Architected and deployed scalable cloud-native solutions using Google Cloud Platform (GCP) services including GKE, BigQuery, Bigtable, and Pub/Sub, with a focus on cost-efficiency and performance.
Built and maintained end-to-end data pipelines using BigQuery and Bigtable for large-scale ingestion, transformation, and analytics workloads.
Developed real-time dashboards using Looker Studio, enabling actionable insights and data-driven decision-making for technical and business stakeholders.
Conducted incident triage using Splunk, Dynatrace, and APM tools; led root cause analysis and postmortems to reduce Mean Time to Recovery (MTTR).
Implemented proactive infrastructure monitoring using Nagios and Zabbix, configuring alerts and dashboards for critical servers and network events.
Supported GCP workload migrations by designing and executing zero-downtime pre/post-migration validation strategies.
Extended on-premises infrastructure to AWS using AWS Storage Gateway and S3; configured IAM policies, logging, versioning, and lifecycle rules for optimized storage.
Automated provisioning and bootstrap of AWS EC2 instances (Linux, RHEL, Windows) using AMIs and shell scripts; resolved performance issues related to memory, CPU, and network usage.
Designed and maintained Dynatrace dashboards and alerting configurations, customized to internal customer requirements for real-time performance tracking.
Executed performance testing using LoadRunner, designing load and stress test scenarios to validate application scalability under peak conditions.
Created custom SLO dashboards in AppDynamics, gathering baseline metrics and enabling proactive service performance management.
Managed infrastructure projects with complex networking and storage requirements; implemented secure and scalable environments aligned with organizational needs.
Client: Accenture Solution Pvt Ltd – Hyderabad, India April 2015 – July 2021
Role: SRE/Cloud/DevOps Automation Cloud Engineer
Responsibilities:
Worked with architects, developers, QA, and the cloud development team to implement cloud applications and automate processes to reduce toil using DevOps automation tools.
Converted traditional applications to Docker and automated the build and deploy process, reducing deployment time by 80%.
Worked with various AWS services: EC2, ECS, ELB, Route53, S3, CloudFront, SNS, RDS, IAM, Lambda, CloudWatch, and CloudFormation.
Worked on OpenStack services such as Horizon, Keystone, Nova, Neutron, Glance, Cinder, Ceilometer and Swift.
Automated, configured, and deployed instances in Azure environments and data centers, and designed ARM templates. Worked with Azure resources such as ASE, App Service Plan, App Services, Application Gateway, API Management, Event Hub, Azure Service Bus, App Insights, Key Vault, SQL Managed Instance, Storage Account, Virtual Machines, Subnets, and Virtual Networks.
Created a new Azure Active Directory (Azure AD) application and service principal for role-based access control to Azure Stack resources.
Used Infrastructure as Code tools (Terraform, ARM templates, and CloudFormation) to provision cloud infrastructure.
Developed and implemented Software Release Management strategies for various applications according to the agile process.
Applied Google’s Site Reliability Engineering (SRE) culture to maintain reliable infrastructure, following the key elements of SLIs, SLOs, and SLAs. Performed post-mortems with teams after every rollback or deployment failure, with precise documentation, and continuously improved the process based on previous failures. Tracked MTTR metrics (Mean Time To Rollback, Respond, Resolve, Recovery), Mean Time To Mitigate, and Mean Time To Acknowledge.
Good understanding of OpenShift platform in managing Docker containers and Kubernetes Clusters.
Broke up monolithic applications into microservices and used Jenkins pipelines to deploy the microservice applications to a Docker registry and then to Kubernetes.
Authored Terraform modules for infrastructure management and published them to the Terraform registry to deploy the production cloud environment.
Wrote and debugged PowerShell scripts to automate processes and migrate VMs, including copying and creating VHDs. Configured Microsoft DevTest Labs to migrate virtual machines from one subscription to another.
Built a VPC and established a site-to-site VPN connection between the data center and AWS.
Used Dome9 to detect misconfigurations, model and actively enforce security best practices, and protect against identity theft and data loss in the cloud.
Worked with HashiCorp Vault to store credentials and integrated it with the CI/CD pipeline to manage credentials for build and deployment jobs.
Used Tripwire and the Snort intrusion detection system (IDS), and configured email alerts on production servers to detect and protect against outside threats.
Utilized cloud-based APIs when appropriate to write network/system level tools for securing cloud environments.
Wrote Dockerfiles to package and build Docker containers. Used multi-stage Dockerfiles to build Maven code in one container and deploy and test it in another.
Deployed Java microservice applications using Docker containers, Kubernetes, and OpenShift, configuring autoscaling, replica sets, and related settings using YAML files.
Implemented Docker builds using the Docker Maven plugin to package final code into Docker images, and set up development and testing environments using Docker Hub, Docker Compose, Docker Swarm, and Docker container networking.
Worked with Twistlock to scan Docker containers for vulnerabilities.
Wrote automation scripts in Shell, PowerShell, YAML, Groovy, Python, and Ruby using IntelliJ.
Used the RPM Maven plugin to package Spring Boot Java application JARs with their dependencies as RPMs for easy deployment on servers.
Experience with Ansible, Capistrano, Chef and Puppet for configuration management.
Developed Ansible playbooks (including custom playbooks) and inventories in YAML, encrypted secrets using Ansible Vault, and maintained role-based access control using Ansible Tower. Implemented IT orchestration using Ansible to run tasks on different servers. Used Vault to manage secrets.
Set up a Jenkins master-slave CI cluster from scratch with high availability, and configured plugins, settings, and single sign-on (SSO) for authentication with matrix-based security.
Used Git for source code management: creating local repositories, cloning, adding, committing, and pushing changes, stashing changes for later, recovering files, cherry-picking, branching, tagging, and viewing logs. Responsible for access management for GitHub Enterprise organizations.
Environment: AWS (IAM, EC2, ECS, ECR, ELB, S3, EBS, Kinesis, VPC, Route53, RDS, DynamoDB, SQS, SNS, Lambda, CloudFormation), OpenStack, Azure, Linux, Windows, Terraform, Kubernetes, Docker, OpenShift, Ansible, Capistrano, Chef, Git, Jenkins Pipelines, Vault, JBoss, Maven, Nexus, JFrog Artifactory, Splunk, Op5, AppDynamics, Datadog, YAML, Groovy, Python, Java, Kafka, Redis, SonarQube, Tripwire, MongoDB, OracleDB, GCP.
Client: Green Chain Software Solutions - Hyderabad, India Jan 2012 – Dec 2014
Role: Build & Release Engineer
Responsibilities:
Managed releases, environments, deployments, continuous integration, continuous deployment, incident management, and version management.
Installed, configured, and upgraded software packages for Linux and Solaris servers using RHN and Sun Update Manager.
Analyzed and monitored resource utilization and system performance using system tools such as vmstat and sar.
Wrote shell scripts to monitor systems and applications, such as monitoring processes on all servers, and ran cron jobs using crontab.
Managed volumes and file systems using ZFS on Solaris and LVM on Linux.
Installed and configured Apache Web Server, WebLogic Application Server, and Oracle Database on the servers.
Configured domains and admin and managed servers in WebLogic Application Server to deploy web and enterprise applications.
Handled load-balancing implementations such as bonding multiple network interfaces into a single bond to handle overload on LAN devices.
Strong understanding of automating processes using Bash shell scripts.
Administered local and remote servers on a daily basis, troubleshooting and correcting errors.
Implemented continuous integration and continuous deployment using build automation tools such as Jenkins, Ant, and Maven.
Used Maven and Ant to build deployable artifacts (JAR, WAR, and EAR) from source code.
Created manual and automated builds using shell scripts and Ant/Maven scripts. Also designed and maintained Git repositories.
Environment: CentOS & Ubuntu, Solaris 9/10/11, VMware, Java/J2EE, Git, Ant, Maven, Nexus, Jenkins, Python, Apache Tomcat, KickStart, WebSphere, SQL, Agile, UNIX & Perl scripts, Jira, Shell scripts, Apache, Bash, JBoss Application Server, Subversion
Education Details:
Master's in Computer Science, Campbellsville University - Louisville, Kentucky
Bachelor's in Information Technology and Engineering, JNT University - India