Building an Evaluation Harness for Production AI Agents: A 12-Metric Framework From 100+ Deployments (towardsdatascience.com)