Scraping the Teknoids Mailman Pipermail Archive

Putting this here in case anyone finds themselves in need of something to scrape a Pipermail web archive of a Mailman mailing list. This bit of Python 3 is based on a bit of Python 2 I found at "Scraping GNU Mailman Pipermail Email List Archives". The only changes I made from the original are to update some things to work in Python 3. It works well for my purposes, generating a single text file of the teknoids list archive from 2005 to today.

#!/usr/bin/env python3

import requests
from lxml import html
import gzip
from io import BytesIO

listname = 'teknoids'
url = 'https://lists.teknoids.net/pipermail/' + listname + '/'

response = requests.get(url)
tree = html.fromstring(response.text)

filenames = tree.xpath('//table/tr/td[3]/a/@href')

def emails_from_filename(filename):
    # Download one monthly archive and return its contents as bytes.
    print(filename)
    response = requests.get(url + filename)
    if filename.endswith('.gz'):
        # Gzipped archives are decompressed in memory.
        contents = gzip.GzipFile(fileobj=BytesIO(response.content)).read()
    else:
        contents = response.content
    return contents

contents = [emails_from_filename(filename) for filename in filenames]
contents.reverse()

contents = b"\n\n\n\n".join(contents)

with open(listname + '.txt', 'wb') as filehandle:
    filehandle.write(contents)
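
One possible refinement, offered as a sketch rather than as part of the original script: the decompression and joining steps can be pulled into small pure functions, which makes them easy to check without hitting the network. The names `decode_archive` and `join_archives` are made up for illustration, and the newest-first ordering is an assumption based on the script reversing the list before joining.

```python
import gzip


def decode_archive(filename, raw):
    """Return archive bytes, gunzipping them when the filename ends in .gz.

    Hypothetical helper; it mirrors the .gz check in the script above.
    """
    if filename.endswith('.gz'):
        return gzip.decompress(raw)
    return raw


def join_archives(chunks):
    """Join monthly archives oldest-first.

    Assumes the chunks arrive newest-first, as the script above implies
    by reversing the list before joining.
    """
    return b"\n\n\n\n".join(reversed(chunks))
```

With these in place, the download loop would only need to fetch `response.content` and hand it to `decode_archive` along with the filename.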

The best new free music-making software: essential freeware for April 2022 | MusicRadar

“The best new free music-making software: essential freeware for April 2022 | MusicRadar” https://www.musicradar.com/news/best-new-free-music-software-april-2022

My last post to teknoids: He’s dead, Jim. This list is getting a reboot

I just posted to the teknoids list letting everyone know I’m shutting down the list and replacing it with a Discourse forum at https://discourse.teknoids.net/. Here’s the text of the post:

As a few of you may have noticed, things have been amiss with the list since late last year, when Microsoft decided to put the list on some sort of irrevocable ban list. As a result, messages are not being delivered to over half of the subscribers at law schools around the country. That’s nearly 300 people. More troubling to me is that virtually no one who stopped receiving messages even appears to have noticed that they aren’t getting messages anymore.

After trying many, many approaches to getting the ban lifted and staring at the apparent ambivalence of the list itself, I’ve decided to shut down the list effective immediately. The mailing list has been around for 30 years and I’ve been the admin for many of those, so it wasn’t an easy decision to make. The list will no longer function after 5:00 PM ET today, Monday, April 4, 2022.

Of course, on the Internet nothing ever really goes away. The list archives will continue to be available. For those of you interested in continuing the conversation, growing the community, or just generally keeping in touch, I’m launching a new website called The Teknoids List at https://discourse.teknoids.net/. The new Teknoids List is a Discourse-based discussion forum that is up and running now. I’d like to invite everyone on the list to head on over and create an account. Tell your friends, tell your neighbors, tell your colleagues! With some luck, I hope we can grow the new site into the go-to place to discuss and discover the latest in tech + legal education.

If you have any questions or concerns, please reply to me directly or, better, head on over to https://discourse.teknoids.net/ and we’ll talk them through.

Thanks,
Elmer
Chief Teknoid
https://discourse.teknoids.net/