Data collection and extraction are often described as taking 80% to 90% of the time in an end-to-end analysis, which runs from planning through to drawing conclusions from the results. Historically, this extraction has been done by manual data entry, which works for any document but is error-prone and significantly slower than automated extraction tools.
Some of the tools that we use at Bates White for data extraction include:
- Google’s Tesseract
- AWS Textract
- Apache Tika
- Adobe Acrobat
Document search starts with a collection of files of any type that contain text in some form. The files are processed and stored in a central location where the text within them can be searched. The original documents can be in any format: during processing, their text is extracted, and that extracted text is what the end user searches.
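As a rough illustration of that idea, the sketch below builds a toy in-memory inverted index in Python: text that has already been extracted from each document is tokenized and stored, and a query returns the documents containing every query term. The class name `DocumentIndex`, the helper `tokenize`, and the sample file names and text are all hypothetical, chosen only for this example. Production search platforms add ranking, stemming, persistence, and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class DocumentIndex:
    """Toy inverted index: maps each token to the IDs of documents containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, extracted_text):
        """Index the text extracted from one document under its ID."""
        for token in tokenize(extracted_text):
            self._postings[token].add(doc_id)

    def search(self, query):
        """Return the sorted IDs of documents that contain every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        matches = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            matches &= self._postings.get(term, set())
        return sorted(matches)


# Hypothetical extracted text, standing in for the output of an extraction tool.
index = DocumentIndex()
index.add("contract.pdf", "Supply agreement between the buyer and the seller")
index.add("memo.docx", "Internal memo on the supply chain pricing")
index.add("scan_001.tiff", "Invoice for the seller, net 30 days")

print(index.search("supply"))      # ['contract.pdf', 'memo.docx']
print(index.search("the seller"))  # ['contract.pdf', 'scan_001.tiff']
```

Note that the index never looks at the original PDF, Word, or image file. It only sees the extracted text, which is why the quality of the extraction step directly limits what a search can find.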
Some of the tools that we use at Bates White for document search include:
- Apache Solr
- Relativity
- Everlaw
- AWS Kendra
- Elasticsearch/OpenSearch