Algorithms for extracting lines, paragraphs with their properties in PDF documents
2023
Martsinkevich Viacheslav | Berezhkov Andrei | Tereshchenko Vladislav | Gorlushkina Natalia | Tretjakova Violetta
The article discusses algorithms for detecting and extracting lines and paragraphs, together with their properties and attributes, from PDF documents, and analyses the structure of a PDF file and its objects. Special operators in these objects store the document's content as individual symbols or groups of symbols, and the position of each group on the page is also preserved. The main challenge we face when extracting paragraphs from a PDF document is the complexity of the format, which can hold various types of information and can be produced in several ways.
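As a rough illustration of the kind of grouping the abstract describes, the sketch below clusters positioned text fragments into lines by baseline and then into paragraphs by vertical gap. It is not the authors' algorithm: the `Fragment` type, the function names, and the `y_tolerance` and `gap_factor` thresholds are all assumptions made for this example, and it presumes the fragments have already been read from the page with their coordinates.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A positioned run of text (hypothetical type, not from the article)."""
    text: str
    x: float          # left edge, in PDF user-space units
    y: float          # baseline; the PDF y axis points up from the page bottom
    font_size: float


def group_into_lines(fragments, y_tolerance=2.0):
    """Cluster fragments whose baselines lie within y_tolerance of a line's first fragment."""
    lines = []
    # Visit fragments from the top of the page down (largest y first).
    for frag in sorted(fragments, key=lambda f: (-f.y, f.x)):
        if lines and abs(lines[-1][0].y - frag.y) <= y_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    # Restore left-to-right reading order inside each line.
    return [sorted(line, key=lambda f: f.x) for line in lines]


def group_into_paragraphs(lines, gap_factor=1.5):
    """Start a new paragraph when the baseline gap exceeds gap_factor * font size."""
    paragraphs = []
    prev = None
    for line in lines:
        first = line[0]
        if prev is not None and (prev.y - first.y) <= gap_factor * first.font_size:
            paragraphs[-1].append(line)
        else:
            paragraphs.append([line])
        prev = first
    return paragraphs


def line_text(line):
    """Join the fragments of one line with single spaces."""
    return " ".join(f.text for f in line)
```

A fixed tolerance and a single gap threshold are the simplest possible choices; real documents with multiple columns, mixed font sizes, or rotated text would need more than this.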
This bibliographic record has been provided by Directory of Open Access Journals