Cloud Systems Architect (Remote)
| Field | Details |
|---|---|
| Location | Kampala, Uganda |
| Date Posted | June 14, 2026 |
| Category | IT / Information Technology |
| Job Type | Full-time |
| Currency | UGX |
Description

Role Overview:
We are hiring on behalf of a client that is seeking a Site Reliability Engineer (SRE) to work on a contract basis. In this role, you will apply your expertise to help train next-generation AI systems, shaping how models learn, reason, and perform through high-quality, real-world input. This is an opportunity to contribute to the development of frontier AI models and to apply your domain knowledge to advancing the AI industry.
Key Responsibilities:
• Design, implement, and maintain scalable infrastructure using Linux, Kubernetes, and Prometheus.
• Monitor system health, analyze performance metrics, and proactively address bottlenecks or potential failures.
• Automate operational processes to minimize manual intervention and increase system reliability.
• Collaborate closely with development and operations teams to deliver seamless deployments and high system availability, and create comprehensive documentation and clear runbooks.
• Respond to incidents, conduct root cause analysis, and drive continuous improvement of incident response procedures to minimize downtime.
Required Skills & Qualifications:
• Proven experience designing, implementing, and maintaining scalable infrastructure with Linux, Kubernetes, and Prometheus, including system health monitoring and performance metrics analysis.
• Strong understanding of automation tools and technologies, with hands-on experience automating operational processes to reduce manual intervention and improve reliability.
• Excellent problem-solving skills, with the ability to analyze complex system issues, identify root causes, and develop effective solutions.
• Strong communication and collaboration skills, with the ability to work closely with development and operations teams.
• Experience writing comprehensive documentation and clear runbooks, with strong attention to detail.
More About the Opportunity:
This role offers the opportunity to work with a global leader in the AI industry, applying your domain knowledge to shape the development of next-generation AI systems. You will work at global scale, collaborate with top experts, and contribute to the creation of cutting-edge AI models.
Equal Opportunity Employer:
We hire based on skills and expertise. All qualified candidates are welcome regardless of background or prior employment history. Applications are reviewed solely on demonstrated technical ability and qualifications.
