Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Adapted Semantic Search

“At Free Law Project, we believe in transparency and sharing our innovations. Today we’re excited to announce our latest development in semantic search: our embedding generation tool and the underlying machine learning model that we will be using in our new semantic search engine. While semantic search might sound complex, we’ve focused on making it intuitive and accessible. In the coming weeks, we will be releasing semantic search first as an API and then as a new feature of CourtListener. As we develop these features, we are making our technology publicly available. This approach gives you visibility into our process and allows us to better adapt to your specific needs. In this post, we walk you through our technical approach, introduce you to our microservice for generating embeddings, named ‘Inception’, and announce our very own fine-tuned model for semantic search.

  • Why semantic search? For legal professionals, keyword search has been the gold standard since the dawn of computerized legal research. However, traditional keyword search often falls short when the same legal concept is expressed in different terminology across cases. This is where semantic search becomes transformative. By looking beyond exact matches to understand the meaning and intent of a query, semantic search uncovers relevant precedents that keyword searches might miss. This is a powerful tool for experienced attorneys and legal novices alike, but we are particularly excited about the impact it will have on self-represented litigants, who often are not trained in legalese.
  • Encoder models for semantic search – While Large Language Models (LLMs) dominate recent headlines, encoder models efficiently transform text into dense vector representations that capture semantic relationships. Unlike LLMs, which generate text in a conversational manner, encoder models specialize in converting text into numerical vectors that preserve meaning, making them faster and more efficient for semantic search applications. Encoder models are available from a number of sources, but using these models out-of-the-box presents key limitations:
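
To make the idea of searching over dense vectors concrete, here is a minimal Python sketch of the retrieval step that typically follows encoding: documents and the query are represented as vectors, and documents are ranked by cosine similarity to the query. This is not Free Law Project's implementation and does not use the Inception service. The three-dimensional vectors and document names below are invented for illustration, whereas a real encoder model would produce vectors with hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings, made up for this example only.
doc_vecs = {
    "eviction_notice_case": [0.9, 0.1, 0.2],
    "patent_dispute_case": [0.1, 0.9, 0.3],
    "landlord_tenant_case": [0.8, 0.2, 0.1],
}

# Stands in for the embedding of a plain-language query such as
# "my landlord is kicking me out" (also a made-up vector).
query_vec = [0.85, 0.15, 0.15]

ranking = rank_by_similarity(query_vec, doc_vecs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In this toy setup, the two housing-related documents score far higher than the patent dispute even though the query shares no keywords with their names, which is the behavior the passage attributes to semantic search.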
