With significant advancements in automation and artificial intelligence, it’s easier than ever for attorneys to gather and analyze legal data to gain powerful business development and intelligence insights. This article is the second in a three-part series discussing the what, how and why of open legal data.
In the first part, we discussed what open data is and what it can do to improve access to justice and strengthen attorneys’ practices. We also discussed that, while advancements in automation and AI are beginning to turn legal data into open data to make it more useful, ultimately the Supreme Court needs to resolve the unanswered question of whether — and how — myriad forms of legal data will be made openly accessible to spark further changes in legal services delivery models.
Now, we turn to the “how” of open legal data, diving into what it takes to bring legal data into the public sphere in a way that is organized, accessible and useful. We’ll start by exploring why bulk access to legal data is necessary and how automation plays a central role in collecting and aggregating it. Then, we’ll look at how normalization helps make legal data more useful, and how lawyers can leverage it for business development and intelligence.
Bulk Access to Legal Data
Real open access to legal data means bulk access. Making legal data meaningful for accurate analysis requires access to more than just a handful of records. It requires access to millions of documents and billions of data points, which can only feasibly be achieved with automation. To illustrate, let’s take an example familiar to most attorneys: accessing court data.
For an individual attorney in a solo practice or small law firm, it’s virtually impossible to manually download all the court filings and raw data points needed to reach the critical mass for establishing patterns and deriving insights from court data. And while a few court systems provide bulk access to the data in their repositories for a fee, almost none provide bulk access to the documents connected to court cases. As such, the only way to gather enough court data to produce viable analysis is through an automated process that pulls the data in a systematic fashion.
For a lawyer, developing the code to automate data gathering from even one court portal can be a daunting task, given the need to update that code whenever the portal adjusts its search parameters and key fields. Add the resources, infrastructure and sophistication needed to obtain data from hundreds of court portals, and bulk access is no longer a realistic endeavor for a solo practitioner or small firm to pursue on their own. Luckily, there are now multiple legal technology companies (including, but definitely not limited to, our team at UniCourt) that have automated the process of aggregating legal data in bulk, so attorneys don’t have to.
However, even once we’ve achieved bulk access, we’re not done yet. Automating the process of gathering legal data is only the first step, as it still needs to be cleaned and structured before being made useful for attorneys.
What Is Normalization, and Why Is It Necessary?
Legal data from public resources like court systems is messy and unstandardized. It needs to undergo a process called “normalization” to clean, organize and refine it before it can provide insights for business development and intelligence.
To help explain what normalization is and why it’s necessary, let’s use the example of an attorney wanting to find all of the litigation involving a particular party: State Farm Mutual Automobile Insurance Company. In Florida state courts alone, there are over 9,400 variations of State Farm’s name across different jurisdictions and courthouses, due to different spellings and misspellings, abbreviations, name changes, the inclusion of various suffixes and more. Here are some of the variations that appear throughout Florida case filings:
- STATE FARM FIRE MUTUAL AUTOMOBILE INSURANCE COMPANY
- STATE FARM M;UTUAL AUTO INS CO
- STATAE FARM MUTUAL AUTO INS CO
- STATE ARM MUTUAL AUTO INS CO
- STATE FAARM MUTUAL AUTO INS C
- STATE FAARM MUTUAL AUTO INSURANCE CO
- STATE FAM MUTUAL AUTOMOBILE INSURANCE COMPANY
- STATE FAMR MUTUAL AUTO INS CO
- STATE FAR MUTUAL AUTO INS CO
- STATE FARAM MUTUAL AUTOMOBILE INSURANCE COMPANY
- STATE FARM AUTO MUTUAL INS CO
- STATE FARM MURUAL AUTOMOBILE INSURANCE COMPANY
- STATE FARM MUTUAL AUTO INSURANCE COMPAN
- STATE FARM MUTUAL AUTO INSURANCE COMPANY A FOREIGN CORPORATION
- STATE FARM MUTUAL AUTO INSURANCE COMPANY A/AN FOREIGN CORPORATION
- STATE FARM MUTUAL AUTO INSURANCE COMPANY, A FOREIG
To gather accurate information about litigation involving State Farm, the attorney in our example would need a system or process that identifies these various iterations of the name as the same company. But this is just one company, a single party, in Florida state cases. What if the attorney wanted to search for one of the myriad other businesses involved in litigation? Or for a particular judge, law firm or attorney? Or to search for them in a different jurisdiction? The same problem of misspellings, abbreviations, name changes and other variations then begins to balloon exponentially.
This is where normalization comes in. In short, normalization is an AI process of clustering, enriching and connecting name variations across data sets. More specifically, it starts by clustering the name variations of different entities, such as businesses, attorneys, law firms and judges, based on name similarities. Then, those clusters are enriched with other public data sets to map them to real-world entities like State Farm Mutual Automobile Insurance Company. Finally, those real-world entities are linked together to establish the relationships among them.
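To make the clustering step more concrete, here is a minimal Python sketch that groups a few of the spellings listed above by string similarity. It is only an illustration, not how any particular provider does it: the abbreviation map, the 0.85 threshold, the function names and the GEICO contrast entry are all assumptions made for this example, and Python's standard difflib module stands in for the AI techniques a production system would use. The enriching and connecting steps are not shown.

```python
import re
from difflib import SequenceMatcher

# Illustrative abbreviation map (an assumption); a real system would need far more.
ABBREVIATIONS = {"AUTO": "AUTOMOBILE", "INS": "INSURANCE", "CO": "COMPANY"}


def clean(name):
    """Uppercase, drop punctuation and expand a few common abbreviations."""
    letters_only = re.sub(r"[^A-Z ]", "", name.upper())
    return " ".join(ABBREVIATIONS.get(word, word) for word in letters_only.split())


def similarity(a, b):
    """Return a 0-to-1 similarity score between two cleaned names."""
    return SequenceMatcher(None, clean(a), clean(b)).ratio()


def cluster(names, threshold=0.85):
    """Greedily add each name to the first cluster whose first member is similar enough."""
    clusters = []
    for name in names:
        for group in clusters:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            # No existing cluster matched, so this name starts a new one.
            clusters.append([name])
    return clusters


names = [
    "STATE FARM MUTUAL AUTOMOBILE INSURANCE COMPANY",
    "STATE FARM M;UTUAL AUTO INS CO",
    "STATAE FARM MUTUAL AUTO INS CO",
    "STATE FAR MUTUAL AUTO INS CO",
    "GEICO GENERAL INSURANCE COMPANY",  # contrast entry added for illustration
]
clusters = cluster(names)
```

With these inputs, the four State Farm spellings end up in one cluster and the contrast entry in a cluster of its own. String similarity alone cannot tell whether two similar-looking names belong to the same real-world company, which is why the enrichment step matters.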
With normalized legal data, a user who searches for litigation involving State Farm will get results containing all the versions of the company name. Without normalization, however, it would be infeasible to account for all the variations that exist, resulting in missed business opportunities and faulty business decisions based on incomplete information.
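The difference between searching raw and normalized data can be sketched in a few lines of Python. Everything here is hypothetical: the case numbers, the ACME party, the mapping table and the function names are invented for illustration, and the mapping is assumed to be the output of a normalization process rather than something an attorney would build by hand.

```python
from collections import defaultdict

STATE_FARM = "State Farm Mutual Automobile Insurance Company"

# Hypothetical output of normalization: party name as filed -> real-world entity.
CANONICAL = {
    "STATE FARM M;UTUAL AUTO INS CO": STATE_FARM,
    "STATAE FARM MUTUAL AUTO INS CO": STATE_FARM,
    "STATE FARM MUTUAL AUTO INSURANCE COMPAN": STATE_FARM,
}

# Made-up case records: (case number, party name as filed).
CASES = [
    ("2020-CA-0001", "STATE FARM M;UTUAL AUTO INS CO"),
    ("2020-CA-0002", "STATAE FARM MUTUAL AUTO INS CO"),
    ("2020-CA-0003", "ACME ROOFING LLC"),
    ("2020-CA-0004", "STATE FARM MUTUAL AUTO INSURANCE COMPAN"),
]


def build_index(cases, canonical):
    """Group case numbers under the normalized entity for each party."""
    index = defaultdict(list)
    for case_number, party in cases:
        # Parties without a known mapping are indexed under their filed name.
        index[canonical.get(party, party)].append(case_number)
    return index


def exact_search(cases, query):
    """Search the raw data the naive way: exact match on the name as filed."""
    return [case_number for case_number, party in cases if party == query]


index = build_index(CASES, CANONICAL)
normalized_hits = index[STATE_FARM]
raw_hits = exact_search(CASES, STATE_FARM.upper())
```

In this toy data set, the normalized search returns all three State Farm cases, while the exact-match search on the correctly spelled name returns none, because no filing happens to use that exact spelling.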
Translating Legal Data Into Business Development and Intelligence
Finding new prospective clients is critical for lawyers, now more than ever. Bulk access to normalized legal data provides a powerful tool to find business opportunities and keep on top of litigation trends. For example, access to data that’s already gone through a refinement process can help you locate new cases involving existing and previous clients. Similarly, you can use bulk data to uncover other parties who are routinely involved in litigation and pursue them as potential clients. With advancements in automation and AI, any lawyer, whether a solo practitioner or in BigLaw, can develop scalable approaches to finding new leads and opportunities to stay in touch with clients.
You can also use normalized legal data to study historical and current litigation trends and track changes in the volume of litigation affecting your practice. Positioning your firm for success means not only determining which thriving practice areas to enter, but also avoiding areas that are diminishing. In addition to developing business intelligence for law practice planning, you can use the data to identify gaps in the legal services market and pro bono opportunities. With the ongoing impacts of COVID-19 and likely increases in legal services needs, open legal data has incredible potential to empower lawyers to find new outlets to serve those in need.
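As a rough illustration of tracking litigation volume, the Python sketch below counts filings by year and practice area and reports the change between two years. The filing records, practice area labels and function names are made up for this example; a real analysis would run over bulk, normalized court data rather than a handful of rows.

```python
from collections import Counter

# Made-up filings: (year filed, practice area).
FILINGS = [
    (2019, "Insurance"), (2019, "Insurance"), (2019, "Employment"),
    (2020, "Insurance"), (2020, "Employment"), (2020, "Employment"),
    (2020, "Employment"),
]


def yearly_counts(filings):
    """Count filings for each (year, practice area) pair."""
    return Counter(filings)


def change(filings, area, earlier, later):
    """Return the change in filing volume for one practice area between two years."""
    counts = yearly_counts(filings)
    return counts[(later, area)] - counts[(earlier, area)]


employment_change = change(FILINGS, "Employment", 2019, 2020)
insurance_change = change(FILINGS, "Insurance", 2019, 2020)
```

Here, employment filings rise by two and insurance filings fall by one between 2019 and 2020. On real data, the same kind of comparison can point to practice areas that are growing or shrinking.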
The next part in our series will cover some of the companies, nonprofits and initiatives at the intersection of data and access.