Job Description:
Role and Responsibilities:
- Gathering and analyzing data to root out errors, discern trends, and work with the product team to solve platform and infrastructure problems.
- Responding to incidents, but more importantly, preventing incidents through proactive analysis and monitoring, including sanity checks.
- Identifying and communicating the need for process improvements, and recommending possible resolutions to product issues.
- Collaborating with the product team to diagnose issues and deliver solutions that improve the availability, scalability, latency, and efficiency of the product.
- Leading and coaching a group of new hires in executing day-to-day operations.
Skills and Qualifications:
- Experience as a DevOps Engineer/SRE or similar software engineering role.
- Coding and Automation: Knowledge of shell/Unix scripting and GitHub Actions for automation.
- Problem-Solving: Ability to recognize problems and to devise and execute solutions.
- Agile Methodology.
- Hands-on experience with:
  - Source control tools (Git).
  - CI/CD tools (Jenkins).
  - Data storage tools (MySQL, Oracle).
  - Log aggregation and monitoring tools (LogInsight, Mezmo).
  - Application performance monitoring tools like Dynatrace.
  - Service Management (Incident, Change, Problem, and Event/Alert Management) - ITIL v3/v4.
  - Data analysis (knowledge of Capability Maturity Model Integration (CMMI) is a plus).
- Skills (good to have):
  - Monitoring and observability tools (such as LogInsight, Moogsoft, ThousandEyes, Azure Monitor).
  - Application performance monitoring tools like Dynatrace.
  - Working with Azure DevOps.
  - Continuous testing (Selenium).
  - Coding: Java.
  - Web application servers: Tomcat, WebSphere Application Server (WAS).
  - Cloud/multicloud: Azure, AWS.
- Preferred schedule: Rotation (irregular schedule).