Goal: better, more focused search for www.cali.org.
In general, the plan is to scrape the site into a vector database, feed those embeddings to Llama 2 as context, and provide API endpoints to search and find things.
Hints and pointers:
- Llama2-webui – Run any Llama 2 locally with gradio UI on GPU or CPU from anywhere
- FastAPI – web framework for building APIs with Python 3.7+ based on standard Python type hints
- Danswer – Ask questions in natural language and get answers backed by private sources.
- PostgreSQL – a powerful, open source object-relational database system
- QDrant – Vector Database for the next generation of AI applications.
- Typesense – a modern, privacy-friendly, open source search engine built from the ground up using cutting-edge search algorithms that take advantage of the latest advances in hardware capabilities.
The challenge is to wire together these technologies and then figure out how to get it to play nice with Drupal. One possibility is just to build this with an API and then use the API to interact with Drupal. That approach also offers the possibility of allowing the membership to interact with the API too.
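The scrape-to-vector-DB-to-search flow above can be sketched in miniature. This is a toy stand-in, not the real stack: the `embed` function here is just a bag-of-words counter, where the actual build would use a real embedding model, store vectors in Qdrant, and expose `search` through a FastAPI endpoint (the document strings and function names are invented for illustration).

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for an embedding model: a bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Scraped" documents stored alongside their vectors, as a vector DB would.
index = [(doc, embed(doc)) for doc in [
    "CALI lessons cover civil procedure and contracts",
    "Teknoids is a mailing list for law school technologists",
    "Classcaster hosts podcasts for law faculty",
]]

def search(query, k=2):
    # Rank stored documents by similarity to the query vector.
    qv = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(qv, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]
```

Wrapped in a FastAPI route, a function like `search` is also the natural surface to hand to the membership as an API.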
Here’s a great quick start guide to getting Jupyter Notebook and Lab up and running with the Miniconda environment in WSL2 running Ubuntu. When you’re finished walking through the steps you’ll have a great data science space up and running on your Windows machine.
I am going to explain how to configure Windows 10 and Miniconda to work with Notebooks using WSL2.
Source: Configuring Jupyter Notebook in Windows Subsystem Linux (WSL2) | by Cristian Saavedra Desmoineaux | Towards Data Science
Unlock Your Second Brain with Streamlit and Hugging Face’s Free LLM Summarization: build a Python Webapp running on your PC.
Source: Mastering AI Summarization: Your Ultimate Productivity Hack
This uses a smaller language model tailored to text summarization. Maybe a good path for assessing student short answers and essays.
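For the short-answer idea, one simple baseline is to score a student answer against a reference answer with a ROUGE-1-style recall, the standard word-overlap metric used to evaluate summarization. This is a sketch of that metric only, not the article's model; the reference and student strings are invented examples.

```python
from collections import Counter

def rouge1_recall(reference, candidate):
    # Fraction of reference words that also appear in the candidate
    # (a ROUGE-1-style recall score, counting repeated words at most
    # as often as they occur in the reference).
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum(min(ref[w], cand[w]) for w in ref)
    return overlap / sum(ref.values())

reference = "consideration is a bargained for exchange of value"
student = "consideration means a bargained for exchange"
score = rouge1_recall(reference, student)  # 5 of 8 reference words matched
```

A real grading pipeline would pair a metric like this with the summarization model's output rather than raw word overlap, but the overlap score is a useful sanity check.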
Developers can now fine-tune GPT-3 on their own data, creating a custom version tailored to their application. Customizing makes GPT-3 reliable for a wider variety of use cases and makes running the model cheaper and faster.
You can use an existing dataset of virtually any shape and size, or incrementally add data based on user feedback. With fine-tuning, one API customer was able to increase correct outputs from 83% to 95%. By adding new data from their product each week, another reduced error rates by 50%.
Source: Customizing GPT-3 for Your Application
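The fine-tuning workflow described above starts from a training file. GPT-3 fine-tuning took JSONL: one JSON object per line, each with a "prompt" and a "completion" field. A minimal sketch of preparing such a file, with invented placeholder examples (the file name and Q&A pairs are illustrative, not real training data):

```python
import json

# Each training example is a prompt/completion pair; the "->" separator
# and trailing/leading spaces follow the common prompt-design convention.
examples = [
    {"prompt": "What does CALI stand for? ->",
     "completion": " The Center for Computer-Assisted Legal Instruction."},
    {"prompt": "What is a CALI lesson? ->",
     "completion": " An interactive online tutorial on a legal topic."},
]

# JSONL: one JSON object per line, no enclosing array.
with open("training_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

Incrementally adding data, as the quoted customer did week over week, is just appending more lines to this file and re-running the fine-tune.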
Putting this here in case anyone finds themselves in need of something to scrape a Pipermail web archive of a Mailman mailing list. This bit of Python 3 is based on a bit of Python 2 I found at Scraping GNU Mailman Pipermail Email List Archives. The only changes I made from the original are to update some things to work in Python 3. It works well for my purposes, generating a single text file of the teknoids list archive from 2005 to today.
from lxml import html
from io import BytesIO
import gzip
import requests

listname = 'teknoids'
url = 'https://lists.teknoids.net/pipermail/' + listname + '/'

# Fetch the archive index page and collect the per-period file links.
response = requests.get(url)
tree = html.fromstring(response.text)
filenames = tree.xpath('//table/tr/td/a/@href')

def emails_from_filename(filename):
    # Download one archive file; decompress it if it is gzipped.
    response = requests.get(url + filename)
    if filename[-3:] == '.gz':
        contents = gzip.GzipFile(fileobj=BytesIO(response.content)).read()
    else:
        contents = response.content
    return contents

# Concatenate every archive file into one text file for the list.
contents = [emails_from_filename(filename) for filename in filenames]
contents = b"\n\n\n\n".join(contents)
with open(listname + '.txt', 'wb') as filehandle:
    filehandle.write(contents)
KNN (K-Nearest Neighbors) is Dead! | by Marie Stephen Leo | Towards AI | Dec, 2020 | Medium https://medium.com/towards-artificial-intelligence/knn-k-nearest-neighbors-is-dead-fc16507eb3e
Learning how to apply some of the algorithms mentioned in this article would likely improve students’ and teachers’ ability to locate CALI resources and allow us to build a useful recommender system.
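The article's point is that approximate nearest-neighbor methods like HNSW replace exact KNN at scale. For intuition, here is the exact brute-force baseline those methods speed up, as a small sketch over invented 2-D points (a real recommender over CALI resources would operate on high-dimensional embeddings via an ANN library such as hnswlib):

```python
import math

def knn(query, points, k=3):
    # Exact (brute-force) k-nearest-neighbor search: score every point,
    # sort by Euclidean distance, keep the k closest. O(n log n) per query,
    # which is what HNSW-style indexes avoid on large collections.
    by_dist = sorted(points, key=lambda p: math.dist(query, p))
    return by_dist[:k]

points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.2)]
nearest = knn((0.0, 0.1), points, k=2)
```

Swapping this exact search for an approximate index trades a small amount of recall for query times that stay fast as the collection grows.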
Introduction to web scraping with Python – Data, what now? https://datawhatnow.com/introduction-web-scraping-python/
From self-driving cars to stock market predictions to online learning, machine learning is used in almost every field that utilizes prediction as a way to improve itself. Due to its practical usage, it is one of the most in-demand skills right now in the job market. Also, getting started with Python and machine learning is easy, as there are plenty of online resources and lots of Python machine learning libraries available.
Source: Get started with machine learning using Python | Opensource.com