Mountpoint for Amazon S3 is an open source file client that makes it easy for your file-aware Linux applications to connect directly to Amazon Simple Storage Service (Amazon S3) buckets. Announced earlier this year as an alpha release, it is now generally available and ready for production use on your large-scale read-heavy applications: data lakes, machine learning training, image rendering, autonomous vehicle simulation, ETL, and more. It supports file-based workloads that perform sequential and random reads, sequential (append only) writes, and that don’t need full POSIX semantics.
— Mountpoint for Amazon S3 – Generally Available and Ready for Production Workloads
AudioCraft is a simple framework that generates high-quality, realistic audio and music from text-based user inputs after training on raw audio signals as opposed to MIDI or piano rolls.
Source: AudioCraft: A simple one-stop shop for audio modeling
Enhance your academic content’s credibility and make navigation of your articles easier on WordPress with these essential plugins.
Source: Meet Academic Standards With These Essential WordPress Plugins For Scholarly Content
Goal: better, more focused search for www.cali.org.
In general the plan is to scrape the site to a vector database, enable embeddings of the vector db in Llama 2, provide API endpoints to search/find things.
Hints and pointers.
- Llama2-webui – Run any Llama 2 locally with gradio UI on GPU or CPU from anywhere
- FastAPI – web framework for building APIs with Python 3.7+ based on standard Python type hints
- Danswer – Ask Questions in natural language and get Answers backed by private sources. It makes use of
- PostgreSQL – a powerful, open source object-relational database system
- QDrant – Vector Database for the next generation of AI applications.
- Typesense – a modern, privacy-friendly, open source search engine built from the ground up using cutting-edge search algorithms, that take advantage of the latest advances in hardware capabilities.
The challenge is to wire together these technologies and then figure out how to get it to play nice with Drupal. One possibility is just to build this with an API and then use the API to interact with Drupal. That approach also offers the possibility of allowing the membership to interact with the API too.
Here’s a great quick start guide to getting Jupyter Notebook and Lab up and running with the Miniconda environment in WSL2 running Ubuntu. When you’re finished walking through the steps you’ll have a great data science space up and running on your Windows machine.
I am going to explain how to configure Windows 10 and Miniconda to work with Notebooks using WSL2
Source: Configuring Jupyter Notebook in Windows Subsystem Linux (WSL2) | by Cristian Saavedra Desmoineaux | Towards Data Science
The longer holiday weekend edition.
- Opportunities and Risks of LLMs for Scalable Deliberation with Polis — Polis is a platform that leverages machine intelligence to scale up deliberative processes. In this paper, we explore the opportunities and risks associated with applying Large Language Models (LLMs) towards challenges with facilitating, moderating and summarizing the results of Polis engagements.
- How I Use PandasAI to Complete 10 Most Frequent Tasks in Data Science —
A Quick Introduction and Development Guide For Pandas AI
- Introduction to Haystack — Haystack is an open-source framework for building search systems that work intelligently over large document collections. Learn more about Haystack and how it works.
- Master Semantic Search at Scale: Index Millions of Documents with Lightning-Fast Inference Times using FAISS and Sentence Transformers — Dive into an end-to-end demo of a high-performance semantic search engine leveraging GPU acceleration, efficient indexing techniques, and robust sentence encoders on datasets up to 1M documents, achieving 50 ms inference times
- Natural Language to SQL using an Open Source LLM
- Leveraging LangChain, Pinecone, and LLMs for Document Question Answering: An Integrated Approach — Document Question Answering (DQA) is a crucial task in Natural Language Processing(NLP), aiming to develop automated systems capable of understanding and extracting relevant information from textual documents to answer user queries. With recent advancements in Large Language Models (LLMs) like ChatGPT and innovative tools and technologies such as LangChain and Pinecone, a new integrated approach to DQA has emerged.
- LlamaIndex: the ultimate LLM framework for indexing and retrieval — LlamaIndex, previously known as the GPT Index, is a remarkable data framework aimed at helping you build applications with LLMs by providing essential tools that facilitate data ingestion, structuring, retrieval, and integration with various application frameworks.