AudioCraft is a simple framework that generates high-quality, realistic audio and music from text prompts. It is trained on raw audio signals rather than MIDI or piano rolls.
Goal: better, more focused search for www.cali.org.
In general, the plan is to scrape the site into a vector database, use the embeddings in the vector DB with Llama 2, and provide API endpoints to search and find things.
Hints and pointers.
- Llama2-webui – Run any Llama 2 locally with gradio UI on GPU or CPU from anywhere
- FastAPI – web framework for building APIs with Python 3.7+ based on standard Python type hints
- Danswer – Ask Questions in natural language and get Answers backed by private sources.
- PostgreSQL – a powerful, open source object-relational database system
- Qdrant – Vector Database for the next generation of AI applications.
- Typesense – a modern, privacy-friendly, open source search engine built from the ground up using cutting-edge search algorithms that take advantage of the latest advances in hardware capabilities.
The challenge is to wire together these technologies and then figure out how to get it to play nice with Drupal. One possibility is just to build this with an API and then use the API to interact with Drupal. That approach also offers the possibility of allowing the membership to interact with the API too.
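The scrape-to-search flow above can be sketched end to end. This is a toy stand-in, not a working build: a real version would embed with a sentence-embedding model, store vectors in Qdrant, and wrap `search()` in a FastAPI endpoint; here bag-of-words counts fake the embeddings so the retrieval logic runs as-is, and the page texts are invented.

```python
# Sketch of the scrape -> embed -> store -> search flow.
# embed() is a stand-in for a real embedding model; the "store" is a plain
# list where Qdrant would sit; search() is what a FastAPI endpoint would wrap.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Scraped" pages -> vector store (invented example content).
pages = {
    "https://www.cali.org/lessons": "interactive lessons for law students",
    "https://www.cali.org/about": "about the CALI organization and membership",
}
store = [(url, embed(text)) for url, text in pages.items()]

def search(query: str, top_k: int = 1) -> list:
    """Rank stored vectors by similarity to the query."""
    ranked = sorted(store, key=lambda item: cosine(embed(query), item[1]), reverse=True)
    return [url for url, _ in ranked[:top_k]]

print(search("lessons for students"))  # -> ['https://www.cali.org/lessons']
```

Swapping the toy pieces for real ones (an embedding model, Qdrant, FastAPI) changes the plumbing but not this shape, which is why the API-first approach plays well with Drupal or any other front end.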
In the world of data, textual data stands out as being particularly complex. It doesn’t fall into neat rows and columns like numerical data does. As a side project, I’m in the process of developing my own personal AI assistant. The objective is to use the data within my notes and documents to answer my questions. The important benefit is that all data processing will occur locally on my computer, ensuring that no documents are uploaded to the cloud and my documents will remain private. — Demystifying Text Data with the unstructured Python Library, https://saeedesmaili.com/demystifying-text-data-with-the-unstructured-python-library/
To handle such unstructured data, I’ve found the unstructured Python library to be extremely useful. It’s a flexible tool that works with various document formats, including Markdown, XML, and HTML documents.
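The library's job is essentially to turn a document into a flat list of text elements. As a rough stdlib illustration of the HTML case only (the real library handles many more formats and returns typed Element objects from its `partition()` function), the HTML sample here is invented:

```python
# Rough stdlib sketch of what "partitioning" an HTML document means:
# collect the document's text content as a flat list of elements.
from html.parser import HTMLParser

class TextElements(HTMLParser):
    """Collect the text content of an HTML document as a flat list."""
    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.elements.append(text)

parser = TextElements()
parser.feed("<h1>Notes</h1><p>First note.</p><p>Second note.</p>")
print(parser.elements)  # -> ['Notes', 'First note.', 'Second note.']
```

That flat list of elements is exactly the shape you want before chunking and embedding documents for a local assistant.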
What I’m reading today.
- Researchers from Peking University Introduce ChatLaw: An Open-Source Legal Large Language Model with Integrated External Knowledge Bases — This includes links to the article and GitHub repo
- Why Embeddings Usually Outperform TF-IDF: Exploring the Power of NLP
- Fine-tune an LLM on your personal data: create a “The Lord of the Rings” storyteller
- Open Assistant — In the same way that Stable Diffusion helped the world make art and images in new ways, we want to improve the world by providing amazing conversational AI. GitHub repo
The longer holiday weekend edition.
- Opportunities and Risks of LLMs for Scalable Deliberation with Polis — Polis is a platform that leverages machine intelligence to scale up deliberative processes. In this paper, we explore the opportunities and risks associated with applying Large Language Models (LLMs) towards challenges with facilitating, moderating and summarizing the results of Polis engagements.
- How I Use PandasAI to Complete 10 Most Frequent Tasks in Data Science — A Quick Introduction and Development Guide For Pandas AI
- Introduction to Haystack — Haystack is an open-source framework for building search systems that work intelligently over large document collections. Learn more about Haystack and how it works.
- Master Semantic Search at Scale: Index Millions of Documents with Lightning-Fast Inference Times using FAISS and Sentence Transformers — Dive into an end-to-end demo of a high-performance semantic search engine leveraging GPU acceleration, efficient indexing techniques, and robust sentence encoders on datasets up to 1M documents, achieving 50 ms inference times
- Natural Language to SQL using an Open Source LLM
- Leveraging LangChain, Pinecone, and LLMs for Document Question Answering: An Integrated Approach — Document Question Answering (DQA) is a crucial task in Natural Language Processing (NLP), aiming to develop automated systems capable of understanding and extracting relevant information from textual documents to answer user queries. With recent advancements in Large Language Models (LLMs) like ChatGPT and innovative tools and technologies such as LangChain and Pinecone, a new integrated approach to DQA has emerged.
- LlamaIndex: the ultimate LLM framework for indexing and retrieval — LlamaIndex, previously known as the GPT Index, is a remarkable data framework aimed at helping you build applications with LLMs by providing essential tools that facilitate data ingestion, structuring, retrieval, and integration with various application frameworks.
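The FAISS item above hinges on one trick worth spelling out: if you L2-normalize the embeddings, a plain inner product equals cosine similarity, so a fast inner-product index doubles as a semantic-similarity index. A brute-force stdlib sketch of that top-k step (FAISS replaces this linear scan with an optimized index; the vectors and doc IDs are invented):

```python
# Brute-force version of the top-k retrieval that FAISS accelerates.
# Vectors are L2-normalized first so the inner product equals cosine
# similarity. Toy 3-d vectors stand in for sentence embeddings.
import heapq
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

docs = {
    "doc-a": normalize([1.0, 0.0, 0.0]),
    "doc-b": normalize([0.0, 1.0, 0.1]),
    "doc-c": normalize([0.9, 0.1, 0.0]),
}

def top_k(query, k=2):
    q = normalize(query)
    scored = ((sum(a * b for a, b in zip(q, v)), doc_id) for doc_id, v in docs.items())
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]

print(top_k([1.0, 0.0, 0.0]))  # -> ['doc-a', 'doc-c']
```

At a million documents the linear scan gets slow, which is where FAISS's indexing structures and GPU support earn the 50 ms figures the article reports.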
What I’m reading today.
- Semantic Search with Few Lines of Code — Use the sentence transformers library to implement a semantic search engine in minutes
- Choosing the Right Embedding Model: A Guide for LLM Applications — Optimizing LLM Applications with Vector Embeddings, affordable alternatives to OpenAI’s API and how we move from LlamaIndex to Langchain
- Making a Production LLM Prompt for Text-to-SQL Translation
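Most production text-to-SQL prompts share the same skeleton: the schema, a few guardrail rules, and the user's question. A minimal sketch along those lines (the table, columns, and wording are invented for the example, not taken from the article):

```python
# Minimal text-to-SQL prompt template: schema + rules + question.
# Schema and rules are illustrative; a production prompt would also add
# few-shot examples and dialect notes.
SCHEMA = """CREATE TABLE cases (
    id INTEGER PRIMARY KEY,
    title TEXT,
    decided DATE
);"""

TEMPLATE = """You are a SQL generator. Given the schema below, answer the
question with a single SQLite SELECT statement and nothing else.

Schema:
{schema}

Rules:
- Use only the tables and columns in the schema.
- Never modify data (no INSERT/UPDATE/DELETE).

Question: {question}
SQL:"""

def build_prompt(question: str) -> str:
    return TEMPLATE.format(schema=SCHEMA, question=question)

print(build_prompt("How many cases were decided in 2023?"))
```

The "nothing else" and read-only rules matter in production: they make the model's output parseable and safe to run against a live database.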
What I’m reading today.
- How Unstructured and LlamaIndex can help bring the power of LLM’s to your own data
- All You Need to Know to Build Your First LLM App — A Step-by-Step Tutorial to Document Loaders, Embeddings, Vector Stores and Prompt Templates
- Answering Questions about any kind of Documents using Langchain (Not GPT3/GPT4) — Unlocking the Power of Langchain: A Comprehensive Python Guide to Answer Questions about Your Documents from Local Files, URLs, YouTube Videos, and Websites
- Build A Capable Machine For LLM and AI — Build A Dual GPUs PC for Machine Learning and AI with Minimum cost
- LlamaIndex: How to use Index correctly.
- Building a Question-Answer Bot With Langchain, Vicuna, and Sentence Transformers — A Q/A bot with open source
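A step common to nearly all of these tutorials is splitting loaded documents into overlapping chunks before embedding them. A stdlib sketch of a fixed-size character splitter (frameworks like LangChain ship fancier, separator-aware versions; the sizes here are tiny just for demonstration):

```python
# Fixed-size character splitter with overlap, the simplest form of the
# chunking step used before embedding documents into a vector store.
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    step = chunk_size - overlap  # overlap keeps context across chunk edges
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, overlap=2)
print(chunks)  # -> ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz', 'yz']
```

The overlap is the point: without it, a sentence cut at a chunk boundary is unretrievable from either side.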
Large language models (LLMs) are notoriously huge and expensive to work with. An LLM requires a lot of specialized hardware to train and manipulate. We’ve seen efforts to transform and quantize the models that result in smaller footprints and models that run more readily on commodity hardware, but at the cost of performance. Now we’re seeing efforts to make the models smaller while still performing as well as the full model.
This paper, A Simple and Effective Pruning Approach for Large Language Models, introduces us to Wanda (Pruning by Weights and activations). Here’s the synopsis:
As their size increases, Large Language Models (LLMs) are natural candidates for network pruning methods: approaches that drop a subset of network weights while striving to preserve performance. Existing methods, however, require either retraining, which is rarely affordable for billion-scale LLMs, or solving a weight reconstruction problem reliant on second-order information, which may also be computationally expensive. In this paper, we introduce a novel, straightforward yet effective pruning method, termed Wanda (Pruning by Weights and activations), designed to induce sparsity in pretrained LLMs. Motivated by the recent observation of emergent large magnitude features in LLMs, our approach prunes weights with the smallest magnitudes multiplied by the corresponding input activations, on a per-output basis. Notably, Wanda requires no retraining or weight update, and the pruned LLM can be used as is. We conduct a thorough evaluation of our method on LLaMA across various language benchmarks. Wanda significantly outperforms the established baseline of magnitude pruning and competes favorably against recent methods involving intensive weight update. Code is available at this https URL.
As noted, the code behind the paper is readily available on GitHub at https://github.com/locuslab/wanda for everyone to try.
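The pruning rule from the abstract is simple to state concretely: score each weight by its magnitude times the L2 norm of its input activations, then zero the lowest-scoring weights within each output row. A toy pure-Python sketch of just that metric (the real implementation operates on transformer layers with calibration data, not hand-typed matrices):

```python
# Toy sketch of Wanda's pruning metric: score[i][j] = |W[i][j]| * ||X[:, j]||_2,
# then zero the lowest-scoring fraction of weights in each output row.
# No retraining or weight update follows, which is the method's selling point.
import math

def wanda_prune(W, X, sparsity=0.5):
    # Per-input-feature activation norms ||X[:, j]||_2 from calibration data X.
    norms = [math.sqrt(sum(row[j] ** 2 for row in X)) for j in range(len(W[0]))]
    pruned = []
    for row in W:
        scores = [abs(w) * n for w, n in zip(row, norms)]
        k = int(len(row) * sparsity)  # weights to drop in this row
        cutoff = sorted(scores)[k - 1] if k else float("-inf")
        pruned.append([0.0 if s <= cutoff else w for w, s in zip(row, scores)])
    return pruned

W = [[0.1, -2.0, 0.5, 0.05]]   # one output row, four input features
X = [[1.0, 0.1, 1.0, 1.0]]     # calibration activations
print(wanda_prune(W, X))  # -> [[0.0, -2.0, 0.5, 0.0]]
```

Note how the large weight (-2.0) survives even though its activation norm is tiny, while small weights on active inputs are dropped; that weight-times-activation interaction is what distinguishes Wanda from plain magnitude pruning.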
I think these advances in working with large language models are going to make it more economical for us to host our models and incorporate various NLP and deep learning techniques into our work.
One can use this notebook to build a pipeline to parse and extract data from OCRed PDF files: https://github.com/colarusso/entity_extraction/blob/main/PDF%20Entity%20Extraction%20with%20Regex%20and%20LLMs.ipynb

Warning: When using LLMs for entity extraction, be sure to perform extensive quality control. They are very susceptible to distracting language (latching on to text that sounds “kind of like” what you’re looking for) and missing language (making up content to fill any holes), and importantly, they do NOT provide any hints as to when they may be erring. You need to make sure random audits are part of your workflow! Below is a workflow using regular expressions and LLMs to parse data from zoning board orders, but the process is generalizable.
- Collect a set of PDFs
- Place OCRed PDFs into the data folder
- Write regexes to pull out data
- Write LLM prompts to pull out data
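The "write regexes to pull out data" step, sketched on an invented snippet of zoning-order text. The field names and patterns here are illustrative only; the notebook pairs this kind of extraction with LLM prompts for the messier fields that regexes can't reach:

```python
# Regex extraction of structured fields from an invented zoning-order snippet.
# Fields a regex can pin down reliably stay out of the LLM's hands entirely.
import re

order_text = """
BOARD OF APPEALS
Case No. 2021-047
Decision date: March 3, 2021
The variance is GRANTED subject to conditions.
"""

patterns = {
    "case_number": r"Case No\.\s*(\S+)",
    "decision_date": r"Decision date:\s*(.+)",
    "outcome": r"\b(GRANTED|DENIED)\b",
}

record = {}
for field, pattern in patterns.items():
    match = re.search(pattern, order_text)
    record[field] = match.group(1).strip() if match else None

print(record)
# -> {'case_number': '2021-047', 'decision_date': 'March 3, 2021', 'outcome': 'GRANTED'}
```

Anything the regexes capture is deterministic and auditable, which is exactly why you want them handling as many fields as possible before the LLM gets involved.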
A Jupyter notebook to extract data from PDFs — useful stuff.
LLMs promise to fundamentally change how we use AI across all industries. However, actually serving these models is challenging and can be surprisingly slow even on expensive hardware. Today we are excited to introduce vLLM, an open-source library for fast LLM inference and serving. vLLM utilizes PagedAttention, our new attention algorithm that effectively manages attention keys and values. vLLM equipped with PagedAttention redefines the new state of the art in LLM serving: it delivers up to 24x higher throughput than HuggingFace Transformers, without requiring any model architecture changes. — https://vllm.ai/
vLLM has been developed at UC Berkeley and deployed at Chatbot Arena and Vicuna Demo for the past two months. It is the core technology that makes LLM serving affordable even for a small research team like LMSYS with limited compute resources. Try out vLLM now with a single command at our GitHub repository.
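PagedAttention's core idea is borrowed from virtual memory: instead of reserving one contiguous max-length KV-cache buffer per sequence, it hands out fixed-size blocks on demand, eliminating most fragmentation. A toy allocator showing just that bookkeeping (this is a hypothetical sketch of the idea, not vLLM's actual code; the block size is illustrative):

```python
# Toy sketch of PagedAttention-style KV-cache bookkeeping: each sequence
# gets fixed-size blocks on demand instead of one contiguous buffer sized
# for the maximum possible length. Not vLLM's actual code, just the idea.
BLOCK_SIZE = 16  # tokens of KV state per block (illustrative)

class PagedKVCache:
    def __init__(self):
        self.next_block = 0   # stand-in for a shared free-block pool
        self.tables = {}      # sequence id -> (token_count, [block ids])

    def append_token(self, seq_id):
        """Reserve KV space for one more token, adding a block only when full."""
        count, blocks = self.tables.get(seq_id, (0, []))
        if count % BLOCK_SIZE == 0:  # current block full (or first token)
            blocks = blocks + [self.next_block]
            self.next_block += 1
        self.tables[seq_id] = (count + 1, blocks)

cache = PagedKVCache()
for _ in range(40):  # 40 tokens -> ceil(40 / 16) = 3 blocks
    cache.append_token("seq-0")
print(len(cache.tables["seq-0"][1]))  # -> 3
```

Because memory is only committed as sequences actually grow, far more concurrent requests fit on one GPU, which is where the throughput gains over naive contiguous allocation come from.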