vicuna – <CONTENT /> v.6

What I’m reading today.

How Unstructured and LlamaIndex can help bring the power of LLM’s to your own data
All You Need to Know to Build Your First LLM App — A Step-by-Step Tutorial to Document Loaders, Embeddings, Vector Stores and Prompt Templates
Answering Questions about any kind of Documents using Langchain (Not GPT3/GPT4) — Unlocking the Power of Langchain: A Comprehensive Python Guide to Answer Questions about Your Documents from Local Files, URLs, YouTube Videos, and Websites
Build A Capable Machine For LLM and AI — Build A Dual GPUs PC for Machine Learning and AI with Minimum cost
LlamaIndex: How to use Index correctly.
Building a Question-Answer Bot With Langchain, Vicuna, and Sentence Transformers — A Q/A bot with open source

LLMs promise to fundamentally change how we use AI across all industries. However, actually serving these models is challenging and can be surprisingly slow even on expensive hardware. Today we are excited to introduce vLLM, an open-source library for fast LLM inference and serving. vLLM utilizes PagedAttention, our new attention algorithm that effectively manages attention keys and values. vLLM equipped with PagedAttention redefines the new state of the art in LLM serving: it delivers up to 24x higher throughput than HuggingFace Transformers, without requiring any model architecture changes.

vLLM has been developed at UC Berkeley and deployed at Chatbot Arena and Vicuna Demo for the past two months. It is the core technology that makes LLM serving affordable even for a small research team like LMSYS with limited compute resources. Try out vLLM now with a single command at our GitHub repository.
— https://vllm.ai/
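The quote says PagedAttention "effectively manages attention keys and values" but does not say how. The vLLM write-up describes the idea as borrowed from operating-system paging. Each sequence's KV cache is split into fixed-size blocks that need not sit next to each other in memory, and a per-sequence block table maps logical blocks to physical ones. Below is a minimal toy sketch of that bookkeeping only, assuming a pool of `num_blocks` blocks holding `block_size` tokens each. The names `ToyBlockManager`, `append_token`, `locate`, `free` and `OutOfBlocksError` are hypothetical. It is not vLLM's code, it stores no tensors, and it computes no attention.

```python
class OutOfBlocksError(RuntimeError):
    """Raised when the toy pool has no free physical block left."""


class ToyBlockManager:
    """Hypothetical, simplified paged KV-cache bookkeeping (not vLLM's code).

    Each sequence's keys/values live in fixed-size blocks that need not be
    contiguous; a per-sequence block table maps logical block i to a
    physical block id, much like an OS page table.
    """

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size  # tokens per block
        # Free list of physical block ids; reversed so pop() hands out 0 first.
        self.free_blocks = list(range(num_blocks - 1, -1, -1))
        self.block_tables: dict[str, list[int]] = {}  # seq_id -> physical ids
        self.num_tokens: dict[str, int] = {}  # seq_id -> tokens cached so far

    def append_token(self, seq_id: str) -> tuple[int, int]:
        """Reserve a cache slot for one new token; return (physical_block, offset)."""
        n = self.num_tokens.get(seq_id, 0)
        if n % self.block_size == 0:
            # Last block is full, or the sequence has none yet: take any free block.
            if not self.free_blocks:
                raise OutOfBlocksError(f"no free block for sequence {seq_id!r}")
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.num_tokens[seq_id] = n + 1
        return self.locate(seq_id, n)

    def locate(self, seq_id: str, position: int) -> tuple[int, int]:
        """Translate a token position into (physical_block, offset)."""
        logical_block, offset = divmod(position, self.block_size)
        return self.block_tables[seq_id][logical_block], offset

    def free(self, seq_id: str) -> None:
        """Return all blocks of a finished sequence to the free list."""
        self.free_blocks.extend(reversed(self.block_tables.pop(seq_id, [])))
        self.num_tokens.pop(seq_id, None)


# Usage: 4 physical blocks of 2 tokens each, shared by several sequences.
mgr = ToyBlockManager(num_blocks=4, block_size=2)
for _ in range(3):
    mgr.append_token("a")  # 3 tokens of "a" occupy 2 blocks
mgr.append_token("b")
print(mgr.block_tables)  # {'a': [0, 1], 'b': [2]}
mgr.free("a")  # blocks 0 and 1 become reusable by any other sequence
```

In this sketch a sequence wastes at most `block_size - 1` slots, all in its last block, and a freed block can be handed to any other sequence. A real serving system also has to run the attention kernel over these scattered blocks and, per the vLLM write-up, can share blocks between sequences with copy-on-write. Neither is modeled here.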



AI Reading List 6/27/2023

vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention