# Evaluation Framework
Use the following steps for a standardized approach to evaluating ML systems: #flashcard
- **Business Objectives** - articulate the specific ***business*** value of the solution, such as increasing user engagement, reducing operational costs, improving user satisfaction, mitigating risks, or generating revenue. State the objective more concretely than these generic categories.
- **Product Metrics** - identify the user-facing metrics that will indicate success: what you can measure about the performance of the product or system as users experience it.
- **ML Metrics** ([[Evaluation Metrics]]) - detail the technical metrics that align with the product goals. These should be computable from the ML system's own outputs, without additional inputs.
- **Evaluation Methodology** - determine the online and offline evaluation approaches. Offline evaluation (on historical or held-out data) is often a proxy for online evaluation (on live traffic), but it is geared towards rapid iteration.
- **Address Challenges** - plan for issues such as imbalanced data, labelling costs, and fairness.
<!--ID: 1751507777308-->
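To make the **ML Metrics** and **Address Challenges** steps concrete, here is a minimal sketch of offline evaluation on an imbalanced hold-out set, in plain Python with no ML library assumed. The `binary_metrics` helper, the data, and both "models" are hypothetical illustrations. It shows why accuracy alone can mislead: a model that never predicts the rare class scores higher accuracy than one that actually finds most positives.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true: list[int], y_pred: list[int]) -> BinaryMetrics:
    """Hypothetical helper: offline metrics for a binary classifier (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


# Hypothetical imbalanced hold-out set: 5 positives (e.g. fraud) out of 100.
y_true = [1] * 5 + [0] * 95
# Degenerate model: always predicts the majority (negative) class.
always_negative = [0] * 100
# Useful model: catches 4 of 5 positives at the cost of 6 false alarms.
useful_model = [1] * 4 + [0] + [1] * 6 + [0] * 89

for name, preds in [("always_negative", always_negative), ("useful_model", useful_model)]:
    print(name, binary_metrics(y_true, preds))
```

Here accuracy favours the always-negative model, while recall and F1 favour the useful one. Libraries such as scikit-learn (`sklearn.metrics.precision_score`, `recall_score`, `f1_score`) compute the same quantities; which one to optimize should follow from the product metric, e.g. recall when missed positives are costly.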
# Evaluating LLMs
- [LangChain Evals](https://python.langchain.com/docs/guides/evaluation/)
- [Llama Index Evals](https://docs.llamaindex.ai/en/stable/module_guides/evaluating/root.html)
- [RAGAS Evals](https://github.com/explodinggradients/ragas)
# Products
## Experimentation
- [[MLflow]]
- [[Comet.ml]]
- [[Optimizely]]
- [[Split.io]]
- [[Athina]]
## Evaluation
- [[Opik]]
- [[DeepChecks]]
- [[Evidently AI]]
- [[RAGAS]]
- [[TruLens]]
- [[Velvet]]
- [[DeepEval]]
- [[Guardrails AI]]