Configuring Jupyter Notebook in Windows Subsystem Linux (WSL2) | by Cristian Saavedra Desmoineaux | Towards Data Science

Here’s a great quick start guide to getting Jupyter Notebook and Lab up and running with the Miniconda environment in WSL2 running Ubuntu. When you’re finished walking through the steps you’ll have a great data science space up and running on your Windows machine.

I am going to explain how to configure Windows 10 and Miniconda to work with Notebooks using WSL2

Source: Configuring Jupyter Notebook in Windows Subsystem Linux (WSL2) | by Cristian Saavedra Desmoineaux | Towards Data Science

Colarusso: Sample Notebook for Extracting Data from OCRed PDFs Using Regex and LLMs

One can use this notebook to build a pipeline to parse and extract data from OCRed PDF files. Warning: When using LLMs for entity extraction, be sure to perform extensive quality control. They are very susceptible to distracting language (latching on to text that sound “kind of like” what you’re looking for) and missing language (making up content to fill any holes), and importantly, they do NOT provide any hints to when they may be erroring. You need to make sure random audits are part of your workflow! Below we’ve worked out a workflow using regular expressions and LLMs to parse data from zoning board orders, but the process is generalizable.

Collect a set of PDFs
Place OCRed PDFs into the data folder
Write regexes to pull out data
Write LLM prompts to pull out data

https://github.com/colarusso/entity_extraction/blob/main/PDF%20Entity%20Extraction%20with%20Regex%20and%20LLMs.ipynb

A Jupyter notebook to extract data from PDFs. Useful stuff