Learning Something New: Understanding Long Short-Term Memory Networks

LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn!

Source: Understanding LSTM Networks — colah’s blog
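To make the quoted claim a bit more concrete, here is a minimal sketch of a single scalar LSTM cell in plain Python. It uses the standard gate equations, but the parameters are hand-picked for illustration rather than learned, and the helper names (`make_params`, `run`) are my own. With the forget gate biased toward 1 and the input gate biased toward 0, the cell state survives 100 steps almost unchanged; with a neutral forget bias it fades away.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell (hidden size 1)."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g   # keep part of the old state, add part of the new
    h = o * math.tanh(c)     # expose a gated view of the cell state
    return h, c


def make_params(forget_bias, input_bias, weight=0.1):
    """Hypothetical hand-set parameters; a trained network would learn these."""
    p = {}
    for gate in "fiog":
        p["w" + gate] = weight  # input weight
        p["u" + gate] = weight  # recurrent weight
        p["b" + gate] = 0.0     # bias
    p["bf"] = forget_bias
    p["bi"] = input_bias
    return p


def run(c0, steps, params):
    """Start from cell state c0 and feed zeros for the given number of steps."""
    h, c = 0.0, c0
    for _ in range(steps):
        h, c = lstm_step(0.0, h, c, params)
    return c


# Forget gate near 1, input gate near 0: the stored value is carried along.
remembering = run(1.0, 100, make_params(forget_bias=10.0, input_bias=-10.0))
# Forget gate near 0.5: the stored value is roughly halved at every step.
forgetting = run(1.0, 100, make_params(forget_bias=0.0, input_bias=-10.0))
print(remembering, forgetting)
```

The point of the sketch is only that the cell state update is additive and gated, so holding a value takes nothing more than a forget gate that stays close to 1.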

I wonder what would happen if one trained an LSTM network on a couple million court opinions, plus statutory codes and regulations. Would it be able to answer even a simple legal question?

A Text Analysis API To Take For A Spin

AYLIEN Text API is a package consisting of eight different Natural Language Processing, Information Retrieval and Machine Learning APIs that will help developers extract meaning and insight from documents.

There are currently 8 endpoints available:

  • Article Extraction: Extracts the main body of an article, including embedded media such as images & videos, from a URL and removes all the surrounding clutter.
  • Article Summarization: Summarizes an article into a few key sentences.
  • Classification: Classifies a piece of text according to IPTC NewsCode standard into more than 500 categories.
  • Entity Extraction: Extracts named entities (people, organizations, products and locations) and values (URLs, emails, telephone numbers, currency amounts and percentages) mentioned in a body of text.
  • Concept Extraction: Extracts named entities mentioned in a document, disambiguates and cross-links them to DBPedia and Linked Data entities, along with their semantic types (including DBPedia and schema.org types).
  • Language Detection: Detects the main language a document is written in and returns it in ISO 639-1 format, from among 76 different languages.
  • Sentiment Analysis: Detects sentiment of a document in terms of polarity (positive or negative) and subjectivity (subjective or objective).
  • Hashtag Suggestion: Automatically suggests hashtags for better discoverability of content on Social Media.

via Text Analysis API Documentation | AYLIEN.
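For anyone who wants to try one of these endpoints, here is a rough sketch of how a request might be put together using only the Python standard library. The base URL, endpoint paths, header names, and the `sentences_number` parameter are assumptions from memory, not taken from the text above, so check them against the AYLIEN documentation before relying on them. `build_request` is a hypothetical helper, and the sketch only builds the request object without sending it.

```python
import urllib.parse
import urllib.request

# ASSUMPTIONS: base URL, endpoint paths and header names are recalled from
# memory and may not match the current AYLIEN documentation.
BASE_URL = "https://api.aylien.com/api/v1"
ENDPOINTS = {
    "extract", "summarize", "classify", "entities",
    "concepts", "language", "sentiment", "hashtags",
}


def build_request(endpoint, app_id, app_key, **params):
    """Build (but do not send) a POST request for one Text API endpoint."""
    if endpoint not in ENDPOINTS:
        raise ValueError("unknown endpoint: " + endpoint)
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/" + endpoint,
        data=body,
        headers={
            "X-AYLIEN-TextAPI-Application-ID": app_id,    # assumed header name
            "X-AYLIEN-TextAPI-Application-Key": app_key,  # assumed header name
            "Accept": "application/json",
        },
        method="POST",
    )


# Placeholder credentials and a placeholder document URL.
req = build_request(
    "summarize",
    "YOUR_APP_ID",
    "YOUR_APP_KEY",
    url="https://example.com/opinion.html",
    sentences_number=3,  # assumed parameter name
)
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req); that call is left out so
# the sketch runs without network access or real credentials.
```

Swapping "summarize" for another name in `ENDPOINTS` would target a different endpoint, assuming the paths above are right.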

This might be interesting when used in conjunction with something like the Free Law Reporter, though my initial testing produced uneven results. The API did well with a copyright case, spotting key phrases and generating a good summary. It didn’t handle Brown v. Board of Education as well, missing key concepts and generating a useless summary. It seems to work better at extracting short, newsy articles from cluttered web pages than at analyzing lengthy texts.