Spark-native LLM evaluation framework with confidence intervals, significance testing, and Databricks integration - bassrehab/spark-llm-eval