Nice to know my problem has a name. Meet Extract-Transform-Load.

From ETL for America

Many of the problems governments confront with technology are fundamentally about data integration: taking the disparate data sets living in a variety of locations and formats (SQL Server databases, exports from ancient ERP systems and Excel spreadsheets on people’s desktops, for example) and getting them into a place and shape where they’re actually usable.

Among backend software engineers, these are generically referred to as ETL problems, or extract-transform-load operations.
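Here is a minimal Python sketch of the three steps. It assumes the source is a small spreadsheet-style CSV export and the destination is an in-memory SQLite database. The sample data, the column names, and the function names `extract`, `transform`, and `load` are hypothetical and chosen only for illustration. Real government data would need far more cleanup than this.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical raw export, standing in for a desktop spreadsheet or legacy ERP dump.
RAW_CSV = """Vendor Name,Invoice Date,Amount
  acme corp ,03/15/2013,"1,200.50"
Widgets Inc,04/02/2013,300
"""

def extract(text):
    """Extract: read rows from the CSV source into dictionaries keyed by header."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize names, dates, and amounts into one consistent shape."""
    cleaned = []
    for row in rows:
        cleaned.append((
            row["Vendor Name"].strip().title(),  # trim stray spaces, fix casing
            datetime.strptime(row["Invoice Date"], "%m/%d/%Y").date().isoformat(),
            float(row["Amount"].replace(",", "")),  # drop thousands separators
        ))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a table that can actually be queried."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices (vendor TEXT, invoice_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM invoices").fetchall())
# → [('Acme Corp', '2013-03-15', 1200.5), ('Widgets Inc', '2013-04-02', 300.0)]
```

The steps are kept as separate functions so each can be swapped out independently. For example, `extract` could be replaced with a reader for a SQL Server connection while the transform and load stages stay the same.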

In the case of court opinions, the ETL problem is complicated by the fact that the courts release their data as PDFs and do little beyond dumping them on websites and declaring them sort of published. I'm going to take a long look at how the other branches of government handle the ETL problem to see what's going on there.

