We originally imagined that AI might reach its current state perhaps 80 years from now. Instead, here we are, with the next great technology revolution developing right before our eyes. We suspect it will be every bit as transformative, and as disruptive, as the rise of the Internet. Although there are numerous philosophical musings to be made on the topic, those discussions will have to wait for later posts, as there is simply too much to build right now. Here we want to describe our current work in the area and set forth an ambitious agenda for future development.
Even most non-technical people are now familiar with ChatGPT and its prodigious ability to generate text, answer questions, and summarize documents. However, OpenAI has also made available “application programming interfaces,” or APIs, for developers, which provide a wide array of features that can be embedded in client software or server-side systems. These APIs allow for more specialized, varied, and fine-tuned use of OpenAI’s models. Although some of these features can be easily leveraged through publicly available resources used in conjunction with proprietary data (such as the document search widget recently released as one of OpenAI’s widgets), other tools, perhaps the ones most valuable to law firms, will need to be built internally.
One of the innovative capabilities of the OpenAI APIs is the ability to train and build one’s own “classifiers.” A “classifier,” once trained, takes an input, such as a word, a longer passage of text, or an image, and outputs a designation of what that input is (i.e., the classifier “classifies” the input). For example, in the data privacy context, a classifier could flag certain data in web logs as likely to be personal information. Similarly, a classifier could be fed network traffic and label certain remote hosts as likely serving session replay, fingerprinting, or real-time bidding, or as likely malicious.
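To make the train-then-classify contract concrete, here is a minimal sketch of the web log example. It is not OpenAI’s API: it swaps in a plain multinomial Naive Bayes model written in standard-library Python, and the labels (`personal_info`, `other`) and training lines are hypothetical. An OpenAI-based classifier would replace the model itself, but the shape stays the same: train on labeled examples, then map a new input to a label.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on anything that is not a letter or digit."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    A stand-in for illustration only; not how OpenAI-trained classifiers work.
    """

    def __init__(self) -> None:
        self.label_counts: Counter = Counter()  # documents seen per label
        self.token_counts: dict = defaultdict(Counter)  # token frequencies per label
        self.vocab: set = set()

    def train(self, examples: list[tuple[str, str]]) -> None:
        """Accumulate counts from (text, label) pairs."""
        for text, label in examples:
            self.label_counts[label] += 1
            for tok in tokenize(text):
                self.token_counts[label][tok] += 1
                self.vocab.add(tok)

    def classify(self, text: str) -> str:
        """Return the label with the highest log posterior for the text."""
        if not self.label_counts:
            raise ValueError("classifier has not been trained")
        total_docs = sum(self.label_counts.values())
        best_label, best_score = "", -math.inf
        for label, doc_count in self.label_counts.items():
            # log P(label) + sum over tokens of log P(token | label)
            score = math.log(doc_count / total_docs)
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                score += math.log((self.token_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled web log lines; a real classifier needs far more data.
TRAINING_DATA = [
    ("GET /login?email=jane.doe@example.com", "personal_info"),
    ("POST /checkout name=John Smith phone=5551234567", "personal_info"),
    ("GET /profile?email=bob@example.org&phone=5559876543", "personal_info"),
    ("GET /static/css/main.css", "other"),
    ("GET /images/logo.png", "other"),
    ("GET /static/js/app.js", "other"),
]

if __name__ == "__main__":
    clf = NaiveBayesClassifier()
    clf.train(TRAINING_DATA)
    for line in ["GET /account?email=alice@example.net", "GET /static/css/theme.css"]:
        print(line, "->", clf.classify(line))
```

With this toy training set, the first sample line is classified as `personal_info` and the second as `other`. The point is the workflow, not the model: a handful of labeled lines is far too little for real web log review.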
Another capability of the OpenAI APIs is “semantic search.” A developer submits a large body of text, split into groupings such as documents or passages, to OpenAI’s “embeddings” API, which returns for each grouping a corresponding vector (i.e., a long list of numbers) representing its meaning in vector space. A query can then be embedded the same way and compared against the stored document vectors: the documents whose vectors lie the shortest distance from the query vector are those closest in meaning. This allows powerful search to be deployed across various internal document collections. Gone are the days of having to rely on keyword matching.
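Because an embeddings endpoint returns vectors rather than rankings, the comparison step is typically done in application code or a vector database. Here is a minimal sketch of that step, assuming the document and query embeddings have already been retrieved. The document IDs (named after hypothetical CCPA regulation topics) and the 3-dimensional vectors are made up for illustration; they are not real embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs, top_k=3):
    """Return the top_k (doc_id, similarity) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical 3-dimensional toy vectors. Real embeddings have far more
# dimensions and would come from the embeddings API, e.g. (openai>=1.0):
#   client.embeddings.create(model=..., input=texts).data[i].embedding
DOC_VECTORS = {
    "opt-out-requests": [0.9, 0.1, 0.0],
    "data-retention": [0.1, 0.9, 0.1],
    "service-provider-contracts": [0.0, 0.2, 0.9],
}
QUERY_VECTOR = [0.8, 0.2, 0.1]

if __name__ == "__main__":
    for doc_id, score in rank_documents(QUERY_VECTOR, DOC_VECTORS):
        print(f"{doc_id}: {score:.3f}")
```

With these toy vectors, `opt-out-requests` ranks first, followed by `data-retention` and `service-provider-contracts`. Cosine similarity is used here as the measure of closeness; for unit-length vectors, ranking by highest cosine similarity gives the same order as ranking by shortest Euclidean distance, since the squared distance equals 2 minus twice the cosine.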
ChatGPT, OpenAI’s chatbot, is already famous for open-ended text generation. With additional tuning and customization through the OpenAI APIs, however, text generation becomes a powerful tool for many legal functions, in turnkey ways that extend far beyond what ChatGPT’s question-and-answer interface offers.
So far we’ve used OpenAI to prototype:
- A Contract Digester
- A Contract Improver
- A Semantic Search Tool for the new CCPA regulations
- A Web Log Classifier
This is, of course, just the beginning; the upper bound of OpenAI’s utility is constrained only by one’s imagination. The practice of law, for better or worse (and we suspect for the better), has now become a substrate for engineering.