Service Reliability Engineers (Mid/Senior)
Location: Kochi, Kerala (Work from Office)
Department: Engineering & Risk Operations
Role Summary
We are seeking a Mid- to Senior-level Service Reliability Engineer (SRE) to ensure the reliability, availability, performance, and security of our fintech platform. This role combines service reliability engineering with fraud detection and risk monitoring, requiring close collaboration with engineering, risk, and operations teams.
The ideal candidate will have 3+ years of experience in a mission-critical fintech or payments environment and will play a key role in safeguarding both system integrity and customer trust.
Key Responsibilities
Service Reliability Engineering:
Ensure high availability, scalability, and performance of customer-facing services and payment systems.
Design, implement, and maintain monitoring, observability, and alerting solutions using Datadog, PagerDuty, and internal dashboards.
Monitor system health through logs, metrics, and traces; troubleshoot infrastructure, application, database, and API issues.
Conduct root cause analysis and lead post-incident reviews.
Define, track, and continuously improve SLIs, SLOs, and SLAs.
Develop and maintain automation scripts and tooling for system provisioning, health checks, and failover processes.
Collaborate with development and infrastructure teams to build resilient, fault-tolerant system architectures.
Fraud Detection and Risk Monitoring:
Monitor real-time transaction activity using Splunk, Datadog, and internal data sources to identify suspicious patterns.
Investigate alerts, anomalies, and behavioral signals to detect fraud, abuse, or financial risk.
Fine-tune detection rules, alerts, and risk indicators based on emerging fraud trends.
Work closely with compliance, fraud operations, and customer support teams on investigations and case resolution.
Develop and maintain dashboards and reports for fraud KPIs, incident metrics, and operational performance.
Participate in cross-functional fraud response initiatives and post-incident analysis.
Required Qualifications
Minimum 3 years of experience in SRE, DevOps, Security Operations, or Fraud/Risk Monitoring roles.
Strong hands-on experience with Datadog, PagerDuty, and Splunk.
Proficiency in Linux, shell scripting, and cloud infrastructure troubleshooting (AWS, GCP, or Azure).
Solid understanding of service observability, incident response, and CI/CD pipelines.
Experience working with fraud detection or risk monitoring systems in fintech, payments, or transaction-heavy platforms.
Strong querying skills using SQL and Splunk SPL, with experience handling large-scale log and event data.
Excellent communication and collaboration skills, with the ability to work across technical and non-technical teams.
Preferred Qualifications
Exposure to KYC/AML systems or regulated fraud detection environments.
Familiarity with secure API design, authentication and authorization mechanisms (OAuth 2.0, JWT), and access control models.
Knowledge of compliance and industry standards such as PCI DSS, ISO 27001, and ISO 8583 (the card transaction messaging standard).
Experience with rule engines or anomaly detection techniques.
Relevant certifications such as AWS Certified DevOps Engineer - Professional, CFE (Certified Fraud Examiner), or equivalent.