Last October, I reported here on the launch of an innovative project, Learned Hands, that uses a game to train a machine-learning algorithm to better identify the legal issues in the words that ordinary people use to describe their problems. The goal was to use artificial intelligence to help legal services providers better match a consumer to the appropriate attorney or legal resource.

Now, The Pew Charitable Trusts has awarded a grant to one of the sponsors of that project, Suffolk Law School’s Legal Innovation and Technology Lab, to move the project from proof of concept to production.

The grant runs through December 2020 and will be used to create issue spotters, in the form of both an application programming interface (API) and a Python programming library, that will be free for public-interest groups to use, the LIT Lab’s director, David Colarusso, told me. The Legal Services Corporation is also expected to use the issue spotters in its project to develop state legal portals.

The idea behind Learned Hands was to create a game to incentivize players to crowdsource the task of spotting legal issues in real people’s stories about their legal problems. Players earn points and rankings based on how many questions they mark and the extent to which their marks are deemed correct.

The goal is to train a machine-learning algorithm to spot legal issues. That required both people-power to do the tagging and a collection of actual questions against which to train. For that, the project obtained a collection of some 75,000 questions posted in the Reddit forum r/legaladvice.

That project went well, Colarusso said. It drew nearly 600 participants, who created 54,000 labels, resulting in finalized labels for about 2,000 of the questions. The labeling is being used to create a taxonomy of legal issues, based on the National Subject Matter Index developed by the Legal Services National Technology Assistance Project, that better matches the words and phrases that regular people use.

Building An Infrastructure

Through this new grant, LIT Lab students will build out the engineering infrastructure to deliver the API and documentation, with the goal of having it available for use by the end of 2020. The API will be free to use for non-profits and legal services organizations.

How could legal services organizations use this issue spotter? Colarusso outlined four potential use cases:

  • In portals and court service centers, where a consumer could come to the site, ask a question in plain English, and be directed to the appropriate resource.
  • For ask-a-lawyer or limited-scope representation projects, to ensure that consumer inquiries and messages get routed to the right lawyer.
  • As what Colarusso calls a “cognitive exoskeleton” to help in situations where a consumer is connected to a live person or paraprofessional. The issue spotter can help triage the issues so the person taking the call does not start from scratch.
  • To look at incoming inquiries in the aggregate, so that an organization can better understand on a broad level the types of queries that are coming in and where to allocate its resources.
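To make the first and last of those use cases concrete, here is a minimal Python sketch of routing a plain-English question and tallying inquiries in the aggregate. It is an illustration only, not the LIT Lab's API or model: the keyword lookup stands in for the trained issue spotter, and the labels, keywords, resource names and function names (`spot_issues`, `route`, `summarize`) are all hypothetical.

```python
from collections import Counter

# Hypothetical stand-in for the issue spotter: a keyword lookup, NOT the
# LIT Lab's model or API. Labels and keywords are invented for illustration.
KEYWORDS = {
    "housing": ["landlord", "evict", "rent", "lease"],
    "family": ["custody", "divorce", "child support"],
    "employment": ["fired", "wages", "overtime", "boss"],
}

# Hypothetical routing table from issue label to a resource.
RESOURCES = {
    "housing": "Tenant rights clinic",
    "family": "Family law help desk",
    "employment": "Workers' rights hotline",
}


def spot_issues(text):
    """Return the sorted list of issue labels whose keywords appear in text."""
    lowered = text.lower()
    return sorted(
        label
        for label, words in KEYWORDS.items()
        if any(word in lowered for word in words)
    )


def route(text):
    """Map a plain-English question to resources, or fall back to general intake."""
    issues = spot_issues(text)
    if not issues:
        return ["General intake"]
    return [RESOURCES[label] for label in issues]


def summarize(questions):
    """Count how often each issue label appears across many inquiries."""
    counts = Counter()
    for question in questions:
        counts.update(spot_issues(question))
    return counts
```

Simple substring matching like this is crude (the keyword "rent" would also match "parent", for instance), which is part of why a spotter trained on real people's language is the point of the project.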

Other Project Goals

In addition to building the API, Colarusso has other goals for the project. One is to continue to train the algorithm against new data. While the Reddit corpus has proved useful, Colarusso said, the source suggests that it is likely skewed towards a demographic that is younger, male, and better off than the general population. So more representative data is needed.

In addition, the algorithm needs to adapt to semantic shifts over time, that is, changes in the language people use to talk about their problems.

“At the end of the day, you can have a fancy algorithm, but the thing that is the real killer is the data,” Colarusso said. “So the more data we have, the better.”

One way to address this need for further and continual training would be through some sort of feedback loop incorporated within the API, so that the system could learn from people’s actual behavior in using it. But a feedback loop could be problematic, as some organizations or consumers may not want their data feeding back to the LIT Lab, Colarusso said. There may also be legal constraints, such as under California’s new privacy law.

Recognizing this, another of Colarusso’s goals is to develop a set of standards and best practices around the responsible use of the algorithm and API. Such standards might address issues such as consumer opt-in or disclosures around the use of the API. Colarusso said that if people have ideas about formulating these standards, he welcomes their input.
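As one way to picture a consumer opt-in, the following Python sketch keeps a correction for retraining only when the user has consented. It is an assumption-laden illustration, not anything the LIT Lab has specified: the `FeedbackLog` class, its `record` method and the field names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FeedbackLog:
    """Collects corrections only from users who have opted in (hypothetical)."""

    records: List[Dict[str, object]] = field(default_factory=list)

    def record(self, text, predicted, corrected, opted_in):
        """Store one correction if consent was given; return whether it was stored."""
        if not opted_in:
            # No consent: keep nothing about the inquiry.
            return False
        self.records.append(
            {"text": text, "predicted": list(predicted), "corrected": list(corrected)}
        )
        return True
```

Under this design the default is to discard, so data flows back for training only when an organization or consumer has affirmatively agreed.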

One final goal for the project is to come up with a plan to make it sustainable beyond the period of the grant. Perhaps students in the LIT Lab will be able to shoulder some of the ongoing work, Colarusso said, but he also does not rule out the possibility of licensing the API to for-profit entities or seeking other forms of outside support.

The one point on which he is certain is that access to the technology will always be free for non-profits. “We don’t want non-profits ever to have to pay for it,” he said.

As for the original Learned Hands project, although the Pew grant that funded it has now run out, the site will continue to operate under the auspices of both the LIT Lab and the project’s other partner, Stanford Law School’s Legal Design Lab.