# Overview [[From Buzz to Building - Introduction to GenAI for Developers - Part 1 - Key Concepts]] [[From Buzz to Building - Introduction to GenAI for Developers - Part 2 - The Technical Stack]] # Key Considerations # Implementation Details ### Alter Model Input or Input Interpretation One of the pillars of machine learning is "garbage in, garbage out" and GenAI is no different. Altering the input data into the model, can make a huge impact to the model's performance. ![[From Buzz to Business - Understanding GenAI for Builders 2024-12-29 05.52.55.excalidraw.svg]] %%[[From Buzz to Business - Understanding GenAI for Builders 2024-12-29 05.52.55.excalidraw|🖋 Edit in Excalidraw]]%% 1. **[[tokenization]]**: the process of splitting text data into tokens (e.g., a sentence into individual words). Individual words may be split even further. Also, punctuation often receives individual special tokens to help know when sentences start or end or when a question is being asked. There are different strategies for tokenization, including [[Word-based Tokenizers]], [[Character-based Tokenizers]], and [[Subword Tokenizers]]. 2. **Attention**: many experts believe that the discovery of the impact of attention on language models is one of the foundational moments that helped us enter this era of GenAI. An [[attention mechanism]] is what impacts how the context of each word in your input sentence will be interpreted. In the above example, the tokens "like" and "ing" have more impact on "train" than words earlier in the sentence. 3. **Prompt Engineering**: a prompt is the user input provided to the model that then results in the model generating a response. The process of [[Prompt Engineering]] is refining that prompt to get better and better results. This is typically the "lowest hanging fruit" when it comes to improving model outputs. The world is constantly discovering new and interesting ways to interact with the models. 4. **[[Embedding Models]]** and [[Retrieval Augmented Generation (RAG)]]: a RAG architecture helps improve response and protect against [[hallucinations]] (i.e., when a model states something as fact when it is wrong, or completely made up). With RAG, the model has an additional set of reference data (e.g., company documents, a textbook). This data is turned into a numerical representation via the embedding models and stored as vectors in [[vector databases]]. Through mathematical computations, the prompt and embeddings can be compared to determine the documents most similar and relevant to the prompt. When generating a response, the model can be told to cite directly or double check against the references. This helps ground each response in truth and relevance. ### Alter the Model Beyond the foundational architecture of the model, there are many slight changes you can make to tweak a fully-trained model, or change how it may behave. These enhancements can be used to improve the model, or to make it more practical to use (i.e., make it smaller or less compute intensive). ![[From Buzz to Business - Understanding GenAI for Builders 2024-12-29 06.42.32.excalidraw.svg]] %%[[From Buzz to Business - Understanding GenAI for Builders 2024-12-29 06.42.32.excalidraw|🖋 Edit in Excalidraw]]%% 1. **Parameters**: you may have noticed that the full name of many LLMs include a term like `80B` or `7B`. This term is quantifying the number of parameters in the model. The parameters actually are the single most important aspect of what a model "is". They represent the final numeric values created after training the model (i.e., they are the result of the millions of dollars of compute costs doing iterative math calculations). These numeric values are used to produce all future outputs based on the user input. Generally, the larger the model, the better the model. But, this comes with a tradeoff on compute costs and memory usage. 2. **Fine-tuning**: while these parameters are amazing for general purposes, if you're building a specialized application with GenAI you'll want to make sure the model is tuned for your needs. This is done via [[fine-tuning a model]]. You start with a general purpose model, which provides an amazing starting point and saves you millions of dollars in initial training. Then, you feed it data specific to your use case, run additional training, and then you have slightly updated data for your needs! Strategies for fine-tuning include [[LoRA]], [[QLoRA]], [[PEFT]], and [[Direct Preference Optimization (DPO)]]. 3. **Quantization**: All of these parameters result in models being very large from a memory standpoint, which is why many model providers offer alternatives with fewer parameters that may be more practical for something like a mobile application. One way to further reduce the size of a model is to use [[Quantization]], which is the process of mapping a set of data that takes up a large amount of memory to a set of data that takes up a smaller set of memory. For example, the parameters of the model may be represented using `float32` values that take up 32 bits each. Quantization would map these values into a `float16` or `int8` representation, which take up less space. Precision and, therefore accuracy, would be lost, but the model would take up significantly less space. 4. **Temperature**: Finally, you can alter the results of the model by tweaking the [[model temperature]]. Earlier, we mentioned how the next word of a model response is selected based on which has the highest probability of being the correct one to use next. The temperature of the model influences the model to select words of lower probabilities. This can result in more "creativity" in the model, but it also can cause the model to have [[hallucinations]]. 5. [[Model Distillation]] ### Alter Model Interactions Lastly, you can use your refined models and interact with them creatively to get different results. One example of this picking up buzz in the GenAI community is using the models as [[AI Agents]]. The models themselves aren't heavily altered in this case, but they are given some "special" capabilities using carefully crafted prompts. - Prompts for agents should not prescribe an approach to apply. Rather, the prompts should give the models some general goals and a persona (e.g., teacher, student), but the approach should be determined by the model using reasoning and reflection. You can provide examples of good results, or test cases, though. - Agents should be given access to tools to allow them to complete their goals. A tool is a function the model can call. This may be an API to check the weather, or access to writing an updated order to a database. - Optionally, agents can interact with other model agents. This division of responsibility and improve performance by having specialized agents working on certain tasks or having some agents "check" the work of other agents. ![[From Buzz to Business - Understanding GenAI for Builders 2024-12-26 07.02.19.excalidraw.svg]] %%[[From Buzz to Business - Understanding GenAI for Builders 2024-12-26 07.02.19.excalidraw|🖋 Edit in Excalidraw]]%% In the example depicted above, we could have an AI agent with a principal persona help lead a group of teachers in creating curriculum that will lead to the highest overall school grade. The teachers then create individual curriculums for their subjects based on tools like converting words to a relevant SAT synonym. Each curriculum can then be tested against students that may have different needs (e.g., a learning disability) to ensure it meets the needs of a general population. All the models in this example could be specially trained and fine-tuned for their purpose. # Useful Links # Related Topics ## Reference - Learning rubric ([Reddit - Dive into anything](https://www.reddit.com/r/datascience/comments/1c1mmtg/how_to_formally_learn_gen_ai_kindly_suggest/kz4wlp8/)): - Learn about the [[attention mechanism]]. (No need to deep dive. Just understand what it does). - [[Transformer Models]] vs [[Recurrent Neural Networks (RNNs)]] vs [[Long Short-Term Memory (LSTM) Networks]] (Again a brief overview should suffice). - Different types of LLMs based on transformers. [[Encoder-Decoder Transformer Models]], [[Decoder-only Transformer Models]], etc. Just skim through what types of architectures are popular LLMs such as GPT 3.5/4, Llama2, Mistral 7B or 8x7B based on. - Open Source vs Closed Source LLMs: Which ones are better at the moment? Different companies involved in the LLM rat race such as OpenAI, Google DeepMind, Mistral, Anthropic, etc. How to access these? For open source explore platforms such as [[Hugging Face]] and [[Ollama]]. - [[Prompt Engineering]]: Get comfortable with writing prompts. I would suggest Andrew NGs short course on prompt engineering to understand methods such as few shot learning. - Learn about each of these: What is [[tokenization]]? What are Vector Embeddings and what are some popular embedding model available today? Why do we need [[Vector Databases]] such as FAISS, Pinecone or ChromaDB etc? What does context length of an LLM mean? - What is [[Quantization]] of LLM weights? Difference between 4-bit, 8-bit, 16-bit LLMs. - [[Retrieval Augmented Generation (RAG)]] or RAG: Understand how training data used for LLMs might not have all the info you need, RAG allows you to perform question answering on your personal documents. At this point, you might want to explore frameworks such as Langchain and LlamaIndex. These provide one stop solution for all GenAI related requirements of your application. - Finetuning LLMs: Why do we need to finetune LLMs? How is it different from RAG? How much GPU memory/VRAM would I need to finetune a small LLM such as Llama2? Techniques such as LoRA, QLoRA, PEFT, DPO etc. Finetuning an LLM would require some understanding of frameworks such as Pytorch or tensorflow. - Advanced features such as [[AI Agents]], Tool use, Funtion calling, Multimodal LLMs, etc. - Access various opensource models such from ollama or huggingface. Also get familiarized with using OpenAI’s API. - I would also suggest try to work with [[Streamlit]]. It’s a very convenient way of creating a frontend for your application. #### Working Notes #### Sources - [Day 1 Livestream with Paige Bailey – 5-Day Gen AI Intensive Course | Kaggle - YouTube](https://www.youtube.com/watch?v=kpRyiJUUFxY) - [Practical Deep Learning for Coders - 1: Getting started](https://course.fast.ai/Lessons/lesson1.html) - ww.reddit.com/r/ChatGPTCoding/comments/1ggp3az/just_tried_coding_with_metas_llama_32_3b_as_a/?share_id=jhp-b0lhnto9EW61S6Ufp) - [Hugging Face – The AI community building the future.](https://huggingface.co/)