Much of the content that makes our justice system run — court cases, legislative materials, administrative summaries — is actually public information. The fundamental purpose of this information (or in the modern parlance, data) is to educate the public and ensure the fair and transparent administration of justice. New technology is making it easier and less expensive for everyone to access this information, but that doesn’t mean everything is copacetic.
This is the first in a series of three articles discussing the what, how, and why of open legal data.
In this article, we’ll talk a bit more about what this information is and why it’s important. Then we’ll talk about how courts are wrestling with these new questions, including how these issues have ended up at the Supreme Court of the United States. Finally, we’ll talk about how all this applies to practicing lawyers.
Why Does This Matter?
Historically, it was very expensive to produce and disseminate this legal information (for our purposes, we’ll call it legal data). So, access has been limited to those who could physically go to the source, such as a courthouse, those who could afford expensive books (back in the day), or those who could pay a hefty fee to one of the few legal research providers.
Several advances in technology make it much easier and far less expensive for lawyers and the general public alike to access legal data.
First, and most obviously, there’s the internet. While internet-educated clients are often the butt of the jokes lawyers tell one another, legal information, often provided or even explained by an attorney, is at consumers’ fingertips like never before.
Additionally, with the rise of cloud computing and ever-greater computing power, increasingly “intelligent” algorithms can consume and analyze this information in radically new ways. Though we’re probably a long way from robot lawyers, complex data analysis and machine learning can deliver powerful, informative insights directly to consumers. These advances in technology are also creating novel questions of law.
It comes as no surprise, then, that the Supreme Court is poised to answer questions central to this very issue this term.
SCOTUS Speaks: Will Open Data Soon Be Mandatory?
Carl Malamud is the president of the innovative Public.Resource.Org and a leading advocate for open data in the law. Malamud is a storied figure in the ongoing fight to secure public access to the law. One of his favorite quotes, from Justice Stephen Breyer, nicely sums up his mission: “If a law isn’t public, it isn’t a law.”
Last year, Malamud won a significant victory for open data advocates in Code Revision Commission v. Public.Resource.Org, a long-running dispute over whether the Georgia government could claim copyright protection for the official annotations to its laws. In short, the 11th Circuit held that the annotations are public domain materials and, as such, not copyrightable.
SCOTUS is currently reviewing the case on Georgia’s petition for a writ of certiorari and heard oral arguments on Dec. 2. While we will likely not hear from the high court until the end of its term, typically in late June, a few of the justices, such as Justice Neil Gorsuch, showed their hand. Gorsuch put this notably pointed question to Georgia’s attorney during oral argument: “Why would we allow the official law to be hidden behind a paywall?”
Open Data and Access to Justice
Open data can promote and further the access to justice movement. When data is more freely accessible for lawyers, legal technology companies, and legal aid organizations to use and study, they can work in tandem to create innovative solutions to problems in the legal system.
One example is the simple opportunity for laypeople to educate themselves on the laws that bind them. It’s true that the public may have difficulty interpreting and applying the law as effectively as a lawyer with thousands of hours of training. Still, the more access consumers have to information about the law, the better they’ll understand it. And with greater understanding, they’ll be better equipped to weigh the options and possible outcomes of their own legal proceedings.
Further, open access to legal data allows lawyers and legal aid organizations to better gauge the need for legal services in their communities. For example, when legal tech companies can freely access court records and analyze the number and types of cases filed in a given jurisdiction, they can uncover where that area’s legal services needs go unmet. In turn, organizations like the Legal Services Corporation, state legislatures and nonprofit law firms can better measure the funding actually required to address these needs proactively rather than reactively.
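As a rough illustration of the analysis described above, the Python sketch below tallies, for each county and case type, how many cases had no counsel of record on one side and what share of all cases that represents. The `CaseRecord` fields are hypothetical (real court data varies widely by portal), and treating unrepresented litigants as a proxy for unmet need is an assumption made for this example, not a definitive measure.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple

@dataclass(frozen=True)
class CaseRecord:
    county: str        # hypothetical field names, assumed already cleaned
    case_type: str
    represented: bool  # assumption: True if the responding party had counsel

Row = Tuple[Tuple[str, str], int, int, float]

def unrepresented_summary(records: Iterable[CaseRecord]) -> List[Row]:
    """Per (county, case_type): unrepresented count, total count and share,
    sorted so the largest unrepresented counts come first."""
    records = list(records)  # iterate twice below
    totals = Counter((r.county, r.case_type) for r in records)
    unrep = Counter((r.county, r.case_type) for r in records if not r.represented)
    rows = [(key, unrep[key], totals[key], unrep[key] / totals[key]) for key in totals]
    rows.sort(key=lambda row: row[1], reverse=True)  # stable sort keeps ties in order
    return rows
```

A county and case type with many filings and a high unrepresented share would be one place to look more closely at funding and staffing.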
Open Data Is Better for Practicing Lawyers
Lawyers rely on access to accurate legal data to do their work. Over many years and for a wide variety of reasons, private companies have gained control over the distribution of a large portion of legal data. Today, the tools exist to make this publicly owned legal data more freely and broadly available. Still, those who can’t or won’t spend a fortune to reach that data (solo practitioners, small firms, or underfunded legal aid organizations) can’t get the unfettered access they need to serve their clients. Without open legal data, these legal professionals not only lack the tools to adequately assess the problems in underserved client communities but also cannot fully engage with the legal system as intended.
Small firms don’t have the millions of dollars required to download all of the court records controlled by a variety of private providers or held in dozens of government databases like PACER. But if that information is “open,” new technology can “ingest” and, using complex algorithms, analyze volumes upon volumes of legal data — empowering lawyers with valuable insights that they can use to help their clients.
Ultimately, open, unfettered access to legal data can save firms the time, money and headaches that bog them down and keep them from reaching more people, meeting unmet needs and, consequently, turning a profit.
Charting a Path to Innovation
Legal data belongs to all of us. Advances in technology are beginning to turn legal data into open data and make it more organized, accessible and useful. But until the Supreme Court weighs in, this crucial question of whether legal data will be made open enough to realize its vast potential to empower lawyers and consumers will remain unanswered.
How Automation and AI Make Open Data Possible and Valuable for Lawyers
With significant advancements in automation and artificial intelligence, it’s easier than ever for attorneys to gather and analyze legal data to gain powerful business development and intelligence insights. This article is the second in a three-part series discussing the what, how and why of open legal data.
In the first part, we discussed what open data is and what it can do to improve access to justice and strengthen attorneys’ practices. We also discussed that, while advancements in automation and AI are beginning to turn legal data into open data to make it more useful, ultimately the Supreme Court needs to resolve the unanswered question of whether — and how — myriad forms of legal data will be made openly accessible to spark further changes in legal services delivery models.
Now, we turn to the “how” of open legal data, diving into what it takes to bring legal data into the public sphere in a way that is organized, accessible and useful. We’ll start by exploring why bulk access to legal data is necessary and how automation plays a central role in collecting and aggregating it. Then, we’ll look at how normalization makes legal data more useful and how lawyers can leverage it for business development and intelligence.
Bulk Access to Legal Data
Truly open access to legal data means bulk access. Making legal data meaningful for accurate analysis requires access to far more than a handful of records. It requires access to millions of documents and billions of data points, which is only feasible with automation. To illustrate, let’s take an example familiar to most attorneys: accessing court data.
For an individual attorney in a solo practice or small law firm, it’s virtually impossible to manually download all the court filings and raw data points needed to reach the critical mass for establishing patterns and deriving insights from court data. And while a few court systems provide bulk access to the data in their repositories for a fee, almost none provide bulk access to the documents connected to court cases. As such, the only practical way to gather enough court data for viable analysis is through an automated process that pulls the data systematically.
For a lawyer interested in developing the code to automate gathering the data from just one court portal, it can be a very daunting task, given the need to continually update the code whenever the portal adjusts its search parameters and key fields. But when considering the resources, infrastructure and sophistication needed to obtain data from hundreds of court portals to achieve bulk access, it’s no longer a realistic endeavor for a solo practitioner or small firm to pursue on their own. Luckily, there are now multiple legal technology companies (including, but definitely not limited to, our team at UniCourt) that have automated the process of aggregating legal data in bulk, so attorneys don’t have to.
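To make that maintenance burden concrete, here is a minimal Python sketch of an automated collector for a single, hypothetical court portal. It assumes the caller supplies a `fetch_page(page_number)` function that returns a list of row dictionaries, with an empty list marking the end of the results; the portal field names in `FIELD_MAP` are invented for illustration. Keeping the portal’s field names in one mapping means a renamed field becomes a one-line fix, and the collector stops loudly instead of silently storing incomplete data.

```python
import time
from typing import Callable, Dict, Iterator, List

Row = Dict[str, str]

# Hypothetical mapping from one portal's field names to our own schema.
# If the portal renames a field, this dictionary is the only thing to update.
FIELD_MAP: Dict[str, str] = {
    "CaseNo": "case_number",
    "FileDt": "filed_date",
    "PartyNm": "party_name",
}

def fetch_with_retry(fetch_page: Callable[[int], List[Row]], page: int,
                     attempts: int = 3, delay: float = 1.0) -> List[Row]:
    """Call fetch_page(page), retrying connection errors with a growing pause."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch_page(page)
        except ConnectionError:
            if attempt == attempts:
                raise  # give up after the final attempt
            time.sleep(delay * attempt)
    raise ValueError("attempts must be at least 1")

def collect_cases(fetch_page: Callable[[int], List[Row]],
                  field_map: Dict[str, str] = FIELD_MAP,
                  delay: float = 1.0) -> Iterator[Row]:
    """Walk the portal page by page, yielding rows renamed to our schema."""
    page = 1
    while True:
        rows = fetch_with_retry(fetch_page, page, delay=delay)
        if not rows:  # an empty page marks the end of the results
            return
        for row in rows:
            missing = [field for field in field_map if field not in row]
            if missing:
                # Usually a sign the portal changed its layout; fail loudly.
                raise KeyError(f"portal fields missing: {missing}")
            yield {ours: row[theirs] for theirs, ours in field_map.items()}
        page += 1
```

Multiply this by hundreds of portals, each with its own pagination, field names and failure modes, and the scale of the problem becomes clear.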
However, even once we’ve achieved bulk access, we’re not done yet. Automating the gathering of legal data is only the first step; the data still needs to be cleaned and structured before it’s useful to attorneys.
What Is Normalization, and Why Is It Necessary?
Legal data from public resources like court systems is messy and unstandardized. It needs to undergo a process called “normalization” to clean, organize and refine it before it can provide insights for business development and intelligence.
To help explain what normalization is and why it’s necessary, let’s use the example of an attorney wanting to find all of the litigation involving a particular party: State Farm Mutual Automobile Insurance Company. In Florida state courts alone, there are over 9,400 variations of State Farm’s name across different jurisdictions and courthouses, due to different spellings and misspellings, abbreviations, name changes, the inclusion of various suffixes and more. Here are some of the variations that appear throughout Florida case filings:
- STATE FARM FIRE MUTUAL AUTOMOBILE INSURANCE COMPANY
- STATE FARM M;UTUAL AUTO INS CO
- STATAE FARM MUTUAL AUTO INS CO
- STATE ARM MUTUAL AUTO INS CO
- STATE FAARM MUTUAL AUTO INS C
- STATE FAARM MUTUAL AUTO INSURANCE CO
- STATE FAM MUTUAL AUTOMOBILE INSURANCE COMPANY
- STATE FAMR MUTUAL AUTO INS CO
- STATE FAR MUTUAL AUTO INS CO
- STATE FARAM MUTUAL AUTOMOBILE INSURANCE COMPANY
- STATE FARM AUTO MUTUAL INS CO
- STATE FARM MURUAL AUTOMOBILE INSURANCE COMPANY
- STATE FARM MUTUAL AUTO INSURANCE COMPAN
- STATE FARM MUTUAL AUTO INSURANCE COMPANY A FOREIGN CORPORATION
- STATE FARM MUTUAL AUTO INSURANCE COMPANY A/AN FOREIGN CORPORATION
- STATE FARM MUTUAL AUTO INSURANCE COMPANY, A FOREIG
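A first, rule-based cleanup pass can collapse some of these variations. The Python sketch below is an illustration under stated assumptions, not any provider’s actual method: it uppercases the name, drops stray punctuation, strips a trailing “A FOREIGN CORPORATION” descriptor (including truncated forms) and expands a small, hypothetical abbreviation table.

```python
import re

# Hypothetical abbreviation table; a real system would need a far larger one.
ABBREVIATIONS = {"AUTO": "AUTOMOBILE", "INS": "INSURANCE", "CO": "COMPANY"}

# Matches a trailing "A FOREIGN CORPORATION" or "A AN FOREIGN CORPORATION",
# including truncated forms such as "A FOREIG".
TRAILING_DESCRIPTOR = re.compile(r"\s+A(?:\s+AN)?\s+FOREIG\w*(?:\s+CORP\w*)?$")

def normalize_name(raw: str) -> str:
    name = raw.upper().replace("/", " ")      # "A/AN" -> "A AN"
    name = re.sub(r"[^\w\s]", "", name)       # "M;UTUAL" -> "MUTUAL", drop commas
    name = " ".join(name.split())             # collapse runs of whitespace
    name = TRAILING_DESCRIPTOR.sub("", name)  # drop the corporate descriptor
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in name.split()]
    return " ".join(tokens)
```

With these rules, “STATE FARM M;UTUAL AUTO INS CO” and “STATE FARM MUTUAL AUTO INSURANCE COMPANY, A FOREIG” both become “STATE FARM MUTUAL AUTOMOBILE INSURANCE COMPANY.” Rules like these handle systematic differences such as abbreviations and punctuation, but a typo like “STATAE” still yields a distinct string, which is why similarity-based techniques are also needed.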
To gather accurate information about litigation involving State Farm, the attorney in our example would need a system or process that identifies these various iterations of the name as the same company. But this is just one company, one party, in Florida state cases. What if the attorney wanted to search for one of the myriad other businesses involved in litigation? Or for a particular judge, law firm or attorney? Or to search for them in a different jurisdiction? The same problem of misspellings, abbreviations, name changes and other variations quickly balloons.
This is where normalization comes in. In short, normalization is an AI process of clustering, enriching and connecting name variations across data sets. More specifically, it involves clustering the name variations of different entities such as businesses, attorneys, law firms and judges based on name similarities. Then, those clusters are enriched with other public data sets to map them to real-world entities like State Farm Mutual Automobile Insurance Company, and, finally, those real-world entities are linked together to establish relationships with one another.
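The clustering step can be sketched in Python with the standard library’s `difflib.SequenceMatcher`. This is a deliberately simple stand-in (greedy, single-pass clustering on character similarity with an arbitrary 0.85 threshold), not the method any particular provider uses. It omits the enrichment and linking steps, and a production system would also need “blocking” to avoid comparing every name against every cluster.

```python
from difflib import SequenceMatcher
from typing import List

def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] using difflib's ratio()."""
    return SequenceMatcher(None, a, b).ratio()

def cluster_names(names: List[str], threshold: float = 0.85) -> List[List[str]]:
    """Greedy clustering: each name joins the first cluster whose first
    member is at least `threshold` similar; otherwise it starts a new one."""
    clusters: List[List[str]] = []
    for name in names:
        for cluster in clusters:
            if similarity(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:  # no existing cluster was similar enough
            clusters.append([name])
    return clusters
```

Given the typo variants “STATE FARM MUTUAL AUTO INS CO,” “STATAE FARM MUTUAL AUTO INS CO,” “STATE ARM MUTUAL AUTO INS CO” and “STATE FAMR MUTUAL AUTO INS CO” plus “ALLSTATE INSURANCE COMPANY,” this groups the four State Farm spellings together and leaves Allstate on its own. Because each name is compared only with a cluster’s first member, results depend on input order, one of several reasons real entity-resolution pipelines are more elaborate.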
With normalized legal data, a user who searches for litigation involving State Farm will get results containing all the versions of the company name. Without normalization, however, it would be infeasible to account for all the variations that exist, resulting in missed business opportunities and faulty business decisions based on incomplete information.
Translating Legal Data Into Business Development and Intelligence
Finding new prospective clients is critical for lawyers, now more than ever. Bulk access to normalized legal data provides a powerful tool to find business opportunities and keep on top of litigation trends. For example, access to data that’s already gone through a refinement process can help you locate new cases involving existing and former clients. Similarly, you can use bulk data to uncover other parties who are routinely involved in litigation and pursue them as potential clients. With advancements in automation and AI, any lawyer, whether a solo practitioner or in BigLaw, can develop scalable approaches to finding new leads and staying in touch with clients.
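Once each party in a filing has been resolved to an entity, both ideas in this paragraph reduce to simple queries. In the Python sketch below, the `Filing` record and its `party_ids` field are hypothetical stand-ins for whatever identifiers a normalized data set actually provides.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable, List, Set, Tuple

@dataclass(frozen=True)
class Filing:
    case_number: str
    filed: date
    party_ids: FrozenSet[str]  # hypothetical entity IDs assigned by normalization

def new_client_filings(filings: Iterable[Filing], client_ids: Set[str],
                       since: date) -> List[Filing]:
    """Cases filed on or after `since` that involve at least one known client."""
    return [f for f in filings if f.filed >= since and f.party_ids & client_ids]

def frequent_litigants(filings: Iterable[Filing], exclude: Set[str],
                       min_cases: int) -> List[Tuple[str, int]]:
    """Entities outside `exclude` appearing in at least `min_cases` filings,
    most frequent first."""
    counts = Counter(pid for f in filings for pid in f.party_ids if pid not in exclude)
    return [(pid, n) for pid, n in counts.most_common() if n >= min_cases]
```

Run on a schedule against newly collected filings, the first function works as a simple client alert and the second as a rough lead list.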
You can also use normalized legal data to study historical and current litigation trends and track changes in the volume of litigation affecting your practice. Positioning your firm for success means not only deciding which growing practice areas to enter but also avoiding those that are shrinking. Beyond developing business intelligence for practice planning, you can use the data to identify gaps in the legal services market and find pro bono opportunities. With the ongoing impacts of COVID-19 and likely increases in legal services needs, open legal data has incredible potential to help lawyers find new ways to serve those in need.
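Tracking volume over time can start as simply as counting filings per year in a practice area and comparing two years. The Python sketch below assumes a hypothetical input of `(year, practice_area)` pairs drawn from normalized case data; percent change between two years is only one crude signal of whether an area is growing or shrinking.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def filings_by_year(cases: Iterable[Tuple[int, str]], area: str) -> Dict[int, int]:
    """Count filings per year for one practice area, ordered by year."""
    counts = Counter(year for year, case_area in cases if case_area == area)
    return dict(sorted(counts.items()))

def percent_change(counts: Dict[int, int], start: int, end: int) -> float:
    """Percent change in filings from `start` year to `end` year."""
    before = counts.get(start, 0)
    if before == 0:
        raise ValueError(f"no filings in {start}; percent change is undefined")
    return 100.0 * (counts.get(end, 0) - before) / before
```

For example, four eviction filings in 2018 and six in 2020 would show a 50% increase; a real analysis would also account for seasonality and changes in court coverage.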
The next part in our series will cover some of the companies, nonprofits and initiatives at the intersection of data and access.
Subscribe to Attorney at Work
Get really good ideas every day for your law practice: Subscribe to the Daily Dispatch (it’s free). Follow us on Twitter @attnyatwork.