Why pre-deployment testing is not an adequate framework for AI risk management