Building software that 'thinks' like a lawyer

Tuesday, 11 January 2022

Codifying real estate legal expertise using data science and software engineering

This is a repost from Orbital Witness's Tech Blog, which I recently set up with the engineering and data science teams both to recognise the work colleagues were doing internally and to promote that work externally to attract potential candidates when hiring. At Orbital Witness we're building Natural Language Processing (NLP) systems for the real estate legal industry.

Presentation Video (9min)

On October 5th, 2021, Andrew Thompson (CTO) and Brian Kennedy (Head of Legal Engineering & Innovation) gave a presentation at the LegalGeek conference in Brick Lane, London, titled Building software that ‘thinks’ like a lawyer. In this short 9-minute talk we highlighted five hurdles that arise when building such a system.

Lawyers around the UK and abroad conduct due diligence for their clients every day so that land and property can be bought, sold and developed. One of the main limiting aspects of this diligence process is the time taken to read through all the legal documents to identify potential risks that could delay or halt a transaction. At Orbital Witness we’re solving this challenging problem by combining real estate legal knowledge, data science (specifically Optical Character Recognition (OCR) and Natural Language Processing (NLP)) and software engineering to automate real estate due diligence.

Document OCR

To pick up on just one of the hurdles from the presentation: Hurdle #1 concerns how unpredictable OCR can be on the real estate legal documents that Orbital Witness consumes. As can be seen in the image below, leases come in varying degrees of quality and often contain a variety of interesting artefacts, such as:

Both handwritten and typed textual content
Stamps that obscure textual content
Redacted sections of textual content
Struck out sections of textual content

Handling these types of artefacts is one challenging step in a larger data pipeline. That pipeline starts with transporting and processing a collection of photocopied real estate documents and ends with real estate risks that have been accurately identified and classified according to the context of a transaction. This is no small feat from a data science and software engineering perspective, but we’re building a very capable team of lawyers, product managers, data scientists and software engineers to own this and other challenges and deliver solutions to them.

We’re hiring

If you are interested in the above challenges and are curious to know more about how we currently solve them, please see our open roles and get in touch with us via our Careers Page. If nothing quite matches your experience then still feel free to connect and message me, Andrew Thompson, directly on LinkedIn and I’d be happy to have a casual chat via video conference or over coffee ☕️.
