# Evaluation Framework
Use the following steps for a standardized approach to evaluating ML systems: #flashcard
- **Business Objectives** - articulate the specific ***business*** value of the solution, such as increasing user engagement, reducing operational costs, improving user satisfaction, mitigating risks, or generating revenue. Be more specific than these generic examples.
- **Product Metrics** - explain the user-facing metrics that will indicate success: what you can measure in the behavior of the product or system.
- **ML Metrics** ([[Evaluation Metrics]]) - detail technical metrics that align with product goals. These should be measurable from the ML system's performance alone, without additional inputs.
- **Evaluation Methodology** - determine online and offline evaluation approaches. Offline evaluation is often a proxy for online evaluation, geared towards rapid iteration.
- **Address Challenges** - identify obstacles such as imbalanced data, labelling costs, and fairness issues.
<!--ID: 1751507777308-->

# Evaluating LLMs
- [LangChain Evals](https://python.langchain.com/docs/guides/evaluation/)
- [LlamaIndex Evals](https://docs.llamaindex.ai/en/stable/module_guides/evaluating/root.html)
- [RAGAS Evals](https://github.com/explodinggradients/ragas)

# Products
## Experimentation
- [[MLflow]]
- [[Comet.ml]]
- [[Optimizely]]
- [[Split.io]]
- [[Athina]]

## Evaluation
- [[Opik]]
- [[DeepChecks]]
- [[Evidently AI]]
- [[RAGAS]]
- [[TruLens]]
- [[Velvet]]
- [[DeepEval]]
- [[Guardrails AI]]
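
# Worked Example
Returning to the Evaluation Framework at the top of this note: the **ML Metrics** and **Address Challenges** steps interact, because a metric that looks healthy can hide a problem on imbalanced data. The sketch below is a minimal offline evaluation in plain Python. The labels and predictions are made-up illustrative data, and the helper names (`confusion_counts`, `offline_report`) are hypothetical, not taken from any of the tools listed above.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def offline_report(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 as a dict.

    Ratios with a zero denominator are reported as 0.0 (an assumption
    made here for simplicity; libraries differ on this convention).
    """
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative imbalanced dataset: 95 negatives followed by 5 positives.
y_true = [0] * 95 + [1] * 5

# Baseline that always predicts the majority class.
always_negative = [0] * 100

# Hypothetical model: 2 false positives among the negatives,
# and 4 of the 5 positives found.
model_pred = [0] * 93 + [1] * 2 + [1] * 4 + [0] * 1

baseline = offline_report(y_true, always_negative)
model = offline_report(y_true, model_pred)

print("baseline:", baseline)
print("model:   ", model)
```

On this toy data the always-negative baseline reaches 95% accuracy while finding none of the positives, so accuracy alone would be a poor ML metric here. Precision, recall, and F1 separate the two predictors clearly, which is why the choice of metric should follow from the product goal identified in the earlier steps.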