An Open Legal Taxonomy: It Just Starts With A List of Words

Lately I’ve been spending a fair amount of time thinking about taxonomies (and probably ontologies too, but I leave that distinction for another day) and their application to the various CALI and free law projects I work on. CALI maintains its own taxonomy for describing legal education materials. We call it the CALI Topic Grids or just Topics. Originally developed as a way to guide our authors as they created CALI Lessons, the Topics were intended to identify what a professor wants to teach today. A Topic is specific enough to name the point of law to be covered in a particular classroom session. Like so many of our other resources, the Topics are created by faculty teaching in the area covered by the Topics.

Now, reaching beyond Lessons, we use the Topics to describe podcasts, blog posts, crossword puzzles, chapters and sections of books, and most recently, court opinions. Like any good taxonomy, the Topics not only provide a useful way to describe the contents of a resource but also serve as a finding aid. Indeed, the Topics are best expressed as an outline, instantly recognizable to law students and faculty. The Topics serve as the headings for the outline, with various resources gathered beneath; see, for example, http://topics.cali.org/contracts/.

As the free law movement in the US grows, one of the most pressing questions that arises is how to categorize and describe the immense body of law. Simple full-text searching and basic gathering of metadata about the law is easy enough to accomplish, but none of that tells us what the law is about. How do I know if a court opinion deals with the formation of a contract or some obscure point of criminal procedure? The short answer is that unless you are using a very large commercial legal data service that includes topics and headnotes in its products, you don’t know what the opinion is really about without reading it. Sure, you may have a clue from the search that turned up the document, but that isn’t much reliable data. You need to have that opinion tagged with a known taxonomy.

Applying a taxonomy to law seems like a daunting task, but not as impossible as it once was. Once upon a time the idea of applying a taxonomy to the law in the US was pretty much a non-starter because you couldn’t get access to the law you wanted to categorize. The good news is that we’ve gotten past much of that. We now have access to sizable portions of the law in the US, at least enough to begin applying a taxonomy. Which leads us to the question of the taxonomy itself. How do we do that?

The worlds of taxonomy and ontology (again, I know the two are different, but I’m lumping them together here for argument’s sake) are awash in a sea of acronyms and competing standards. Most of that stuff is really about the application of a taxonomy or ontology in a given situation, more about the “how” of describing things. That isn’t the main problem. The main issue is words. At their base, taxonomies or ontologies are just lists of words. Carefully chosen, domain-specific words, but still a list of words. And once you have the words, then you can apply them as you wish.

The creation of a list of words intended to describe the law has been done. Some lists are proprietary and unavailable to the free law movement. Other lists may be too general, better suited to describing broad collections than individual resources. There is one list, the CALI Topics, that describes specific points of law in individual resources. I would recommend using the CALI Topics as a starting point for creating an open legal taxonomy.

The CALI Topics are not an exhaustive list, but with 41 top level topics and 14 published full Topic Grids they are a good start. The Topics can be expanded to include more top level areas of the law, and complete Topic Grids can be added to make the Topics more comprehensive. Because the Topics exist as just lists of words, they can be adapted to just about any taxonomy/ontology framework/specification.
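
To make the "just lists of words" idea a bit more concrete, here is a minimal Python sketch of a taxonomy held as a plain list of terms, with resources tagged against it and the whole thing printed as an outline. Everything in it is an assumption made for illustration: the topic names, the resource titles, and the `render_outline` helper are hypothetical and are not taken from the actual CALI Topic Grids or any CALI code.

```python
from collections import defaultdict

# Hypothetical fragment of a taxonomy. Each entry is a path of terms,
# running from a top-level area of law down to a specific point of law.
TOPICS = [
    ("Contracts",),
    ("Contracts", "Formation"),
    ("Contracts", "Formation", "Offer"),
    ("Contracts", "Formation", "Acceptance"),
    ("Contracts", "Remedies"),
]

# Hypothetical resources, each tagged with exactly one topic path.
RESOURCES = [
    ("Lesson: The Mailbox Rule", ("Contracts", "Formation", "Acceptance")),
    ("Podcast: Expectation Damages", ("Contracts", "Remedies")),
    ("Opinion: Example v. Sample", ("Contracts", "Formation", "Offer")),
]


def render_outline(topics, resources):
    """Return outline lines: topics as headings, resources listed beneath."""
    known = set(topics)
    by_topic = defaultdict(list)
    for title, path in resources:
        # Only terms that exist in the taxonomy may be used as tags.
        if path not in known:
            raise ValueError(f"unknown topic: {' > '.join(path)}")
        by_topic[path].append(title)

    lines = []
    for path in topics:  # topics are assumed to be listed in outline order
        lines.append("  " * (len(path) - 1) + path[-1])
        for title in by_topic[path]:
            lines.append("  " * len(path) + "- " + title)
    return lines


if __name__ == "__main__":
    print("\n".join(render_outline(TOPICS, RESOURCES)))
```

The point of the sketch is that the list of terms carries the value. The same `TOPICS` list could be loaded into whatever framework or specification a project prefers without changing the words themselves.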

By using an existing taxonomy as a base, the free law movement can save a considerable amount of time and effort in getting started on the task of describing the law. The resources saved by adopting an existing taxonomy can then be applied to the really hard problem of actually figuring out how to apply specific terms to a given resource. I have some ideas for that too, but I leave those for another post.

Setting Up Apache Solr 4.2 and Drupal 7 For Better Search

Solr is an open source search server based on Apache Lucene. Lucene provides Java-based indexing and a search library, and Solr extends it to provide a variety of APIs and search functionality, including faceted search and hit highlighting, and handles Word and PDF document searching. It also provides caching and replication, making it scalable, robust, and very fast.
Happily, Solr also plays nicely with Drupal, the popular CMS platform. If you want fast and effective search on your Drupal site, installing Solr is a straightforward way of getting it quickly. Until this month, the Apachesolr Drupal module didn’t support the current Solr 4.x schemas, but as of the very latest version of the Apachesolr module, 7.x-1.2, you can now set up Solr 4.x on your Drupal 7 site. This tutorial assumes that you’re running Drupal 7.22, the most up-to-date version, under Apache on a Linux box.

via How to set up Solr 4.2 on Drupal 7 with Apache.

If you’re running Drupal, have a lot of nodes to index, and aren’t using Solr, you’re missing out on a lot. Though it takes a bit of config to set up, using Solr to index and search your Drupal site is much better than the stock Drupal search.
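
For a rough feel of what the faceted search and hit highlighting mentioned above look like from the outside, here is a small Python sketch that builds a query URL for Solr's standard select handler. It only constructs the URL and does not contact a server. The base URL, the core name `collection1`, and the field names `bundle` and `content` are assumptions for illustration, so check your own Solr core and schema before relying on them. The `build_solr_query` helper is hypothetical and is not part of Solr or the Apachesolr module.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, text, facet_field, highlight_field):
    """Build a Solr select URL with faceting and hit highlighting turned on."""
    params = [
        ("q", text),                   # the user's search terms
        ("wt", "json"),                # ask for a JSON response
        ("facet", "true"),             # enable faceting
        ("facet.field", facet_field),  # field to count facet values on
        ("hl", "true"),                # enable hit highlighting
        ("hl.fl", highlight_field),    # field to pull highlighted snippets from
    ]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Assumed local Solr instance and core name; adjust for your own setup.
    url = build_solr_query(
        "http://localhost:8983/solr/collection1/select",
        "contract formation",
        facet_field="bundle",
        highlight_field="content",
    )
    print(url)
```

On a Drupal site the Apachesolr module builds these requests for you, so this is only meant to show what is going on underneath.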