About you
You are an engineer passionate about building reliable, observable, and secure AI systems at scale. You enjoy working at the intersection of product metrics, evaluation frameworks, and production-grade infrastructure. You are comfortable translating abstract KPIs into measurable signals, designing deployment safeguards, and ensuring AI systems behave safely, efficiently, and consistently in real-world environments.
You thrive in fast-paced, cloud-native environments, value automation and rigor, and enjoy collaborating with product, engineering, and AI teams to continuously improve system quality.
You bring to Applaudo the following competencies:
- Bachelor’s Degree in Computer Science, Software Engineering, Computer Engineering, or a related field, or equivalent professional experience.
- Strong experience with AI/ML evaluation, including metric definition, evaluation pipelines, golden datasets, and automated judge systems.
- Proficiency in observability and monitoring, including structured logging, tracing, and OpenTelemetry.
- Solid background in CI/CD automation and modern deployment strategies (canary, blue-green, gated deployments).
- Knowledge of AI safety practices, including PII scrubbing, deterministic guardrails, and secure handling of model inputs and outputs.
- Experience working with multi-agent systems and translating product KPIs into measurable agent performance metrics.
- Hands-on experience with AWS, including CDK, ECS/ECR, WAF, SES, Bedrock, CloudWatch, and DevOps Guru.
- Strong experience with Docker, Kubernetes, and cloud-native tooling.
- Familiarity with Azure for identity management, plus basic exposure to GCP environments.
- Strong analytical thinking, attention to detail, and problem-solving skills.
- Excellent communication skills to collaborate across product, platform, and AI teams.
- English proficiency for collaboration with global stakeholders.
You will be accountable for the following responsibilities:
- Translate product KPIs into measurable agent and system-level metrics for effectiveness, efficiency, robustness, and safety.
- Design and implement end-to-end observability using structured logging, metrics, and tracing with OpenTelemetry.
- Curate and maintain golden datasets and manage judge systems for scalable, repeatable AI evaluation.
- Implement evaluation-gated deployments within CI/CD pipelines.
- Orchestrate pre-merge and post-merge validation workflows to ensure quality before release.
- Apply canary and blue-green deployment strategies, enabling fast and safe rollbacks.
- Enforce layered security controls, including PII scrubbing, deterministic guardrails, and AI-based filtering of inputs and outputs.
- Monitor and analyze latency, error rates, token usage, and cost metrics across AI systems.
- Track production quality indicators such as correctness, relevance, and helpfulness.
- Convert failures, incidents, and negative feedback into automated regression tests.
- Manage multi-agent interoperability and coordination across AI components.
- Continuously update and adapt guardrails and safety controls as new risks and threats emerge.