Introduction to machine-learned ranking in Apache Solr | Opensource.com

Learn how to train a machine learning model to rank documents retrieved in the Solr enterprise search platform.

https://opensource.com/article/17/11/learning-rank-apache-solr

Looks like I may need to blow dust off the millions of opinions I’ve got sitting in that laptop up in my office.

Overview of the Elastic Stack, open source software tools for data insights | Opensource.com

A look at the Elastic stack, a versatile collection of open source software tools that make gathering insights from data easier. Learn the capabilities, requirements, and interesting use cases that apply to each.

Source: Overview of the Elastic Stack, open source software tools for data insights | Opensource.com

Full-Text Search Algorithm in Javascript | Burak Kanber’s Blog

http://burakkanber.com/blog/machine-learning-full-text-search-in-javascript-relevance-scoring/

Setting Up Apache Solr 4.2 and Drupal 7 For Better Search

Solr is an open source search server based on Apache Lucene. Lucene provides Java-based indexing and a search library, and Solr extends it to provide a variety of APIs and search functionality, including faceted search and hit highlighting, and handles Word and PDF document searching. It also provides caching and replication, making it scalable, robust, and very fast.
Happily, Solr also plays nicely with Drupal, the popular CMS platform. If you want fast and effective search on your Drupal site, installing Solr is a straightforward way of getting it quickly. Until this month, the Apachesolr Drupal module didn’t support the current Solr 4.x schemas, but as of the very latest version of the Apachesolr module, 7.x-1.2, you can now set up Solr 4.x on your Drupal 7 site. This tutorial assumes that you’re running Drupal 7.22, the most up-to-date version, under Apache on a Linux box.

via How to set up Solr 4.2 on Drupal 7 with Apache.

If you’re running Drupal with a lot of nodes to index and you’re not using Solr, you’re missing out. Though it takes a bit of config to set up, using Solr to index and search your Drupal site is much better than the stock Drupal search.
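To make the faceted search and hit highlighting mentioned above a little more concrete, here is a minimal sketch of how a client might build a Solr query URL that asks for both. The parameter names (`q`, `rows`, `wt`, `facet`, `facet.field`, `hl`, `hl.fl`) are standard Solr query parameters; the host, the core name `drupal`, and the field names `bundle` and `content` are assumptions chosen for illustration and will depend on your own schema. The function `build_solr_query` is hypothetical and is not part of the Apachesolr module.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, text, facet_field, highlight_field, rows=10):
    """Return a Solr /select URL that requests facet counts and highlighted snippets.

    base_url is the URL of a Solr core (assumed here, not taken from the tutorial).
    """
    params = [
        ("q", text),                  # the user's search terms
        ("rows", rows),               # number of results to return
        ("wt", "json"),               # ask for a JSON response
        ("facet", "true"),            # turn on faceting
        ("facet.field", facet_field), # field to count facet values on
        ("hl", "true"),               # turn on hit highlighting
        ("hl.fl", highlight_field),   # field to pull highlighted snippets from
    ]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core and field names, for illustration only.
url = build_solr_query(
    "http://localhost:8983/solr/drupal", "search server", "bundle", "content"
)
print(url)
```

The sketch only builds the URL; sending it and parsing the response is left out so that it runs without a Solr server.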

 

Turbocharging Solr Index Replication with BitTorrent « Code as Craft

Brilliant use of BitTorrent to solve a difficult problem.

CourtListener.com – US Fed Appellate Court Alerts and Yet Another Legal Search Engine

A mention in the BeSpecific blog tipped me off to an interesting project called CourtListener.com. From the about page:

The goal of the site is to create a free and competitive real time alert tool for the U.S. judicial system.

At present, the site has daily information regarding all precedential opinions issued by the 13 federal circuit courts and the Supreme Court of the United States. Each day, we also have the non-precedential opinions from all of the Circuit courts except the D.C. Circuit. This means that by 5:10pm PST, the database will be updated with the opinions of the day, with custom alerts going out shortly thereafter.

The site was created by Michael Lissner as a master’s thesis project at the UC Berkeley School of Information.

A quick perusal of the site and its associated documents tells us that Michael is using a scraping technique to visit court websites looking for recently released opinions. Once found, the opinions are retrieved, converted from PDF to text, indexed, and stored. Atom RSS feeds are then generated to provide current alerts.

The site is powered by Python using the Django web framework and is open source, so you can download the code. The backend database is MySQL and search is handled by Sphinx. The conversion from PDF appears to produce plain text. If you register on the site, you can create custom alerts based on saved searches.
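As a rough illustration of the last step in that pipeline, turning stored opinions into an Atom feed, here is a minimal sketch using only the Python standard library. It is not CourtListener’s actual code, which I have not reproduced here; the function `build_atom_feed`, the dictionary keys, and the example.com URLs are all assumptions made for the example. The sketch also leaves out elements a complete Atom feed needs, such as `author`.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_atom_feed(feed_title, feed_id, updated, opinions):
    """Serialize a list of opinion dicts into a minimal Atom feed string.

    Each opinion dict is assumed to have "title", "url", "updated", and "text" keys.
    """
    ET.register_namespace("", ATOM_NS)  # emit Atom as the default namespace
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = feed_title
    ET.SubElement(feed, f"{{{ATOM_NS}}}id").text = feed_id
    ET.SubElement(feed, f"{{{ATOM_NS}}}updated").text = updated

    for opinion in opinions:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = opinion["title"]
        ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = opinion["url"]
        ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = opinion["updated"]
        ET.SubElement(entry, f"{{{ATOM_NS}}}link", href=opinion["url"])
        # Use the first 200 characters of the extracted text as a summary.
        ET.SubElement(entry, f"{{{ATOM_NS}}}summary").text = opinion["text"][:200]

    return ET.tostring(feed, encoding="unicode")


# Made-up sample data, for illustration only.
sample = [
    {
        "title": "Example v. Example",
        "url": "https://example.com/opinions/1",
        "updated": "2010-05-01T17:10:00Z",
        "text": "Plain text extracted from the opinion PDF would go here.",
    }
]
print(build_atom_feed("Recent opinions", "urn:example:feed", "2010-05-01T17:10:00Z", sample))
```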

All in all, CourtListener.com provides another good source for current federal appellate court opinions. Be sure to check the coverage page to see how far back the site goes for each court. Perhaps the future will bring an expansion to more courts and jurisdictions.

Spam Choked Google Presents Opportunity for New Search Technology

But it turns out that you can’t easily do such searches in Google any more. Google has become a jungle: a tropical paradise for spammers and marketers. Almost every search takes you to websites that want you to click on links that make them money, or to sponsored sites that make Google money. There’s no way to do a meaningful chronological search.

Why We Desperately Need a New (and Better) Google.

The article highlights the failings of Google when it comes to finding plain old information. If you’re just looking for information, Google may not be your best bet. I mean, how many times is a random ad going to ask me if I want to buy “Drupal API load_node()”?

Wadhwa suggests an alternative in Blekko, a search tool that lets you use “slashtags” to refine your own searches. Indeed, alternative tools for finding information are beginning to appear, and perhaps the threat of competition will move Google to clean up its spam-ridden indexes.

Robert Douglass on Solr and other Search Back Ends For Drupal

Apache Solr is a powerful and flexible mechanism for performing site search on a Drupal site. Join us as we talk with Robert Douglass about all things Solr in Drupal, including new features and functionality and future development plans. Also, as a bonus, you will hear Robert use the word “de-baconify” in the context of Solr and Drupal.

Acquia Podcast 16: Robert Douglass on Apache Solr and other Search Back Ends Acquia.

This podcast covers much of what is going on with Drupal, Solr, and search in general. Lots of good, current information.