Data extraction by creating a new Python module

Project detail

I would like to have a similar package for table, image and text extraction, with tables, images and text as its sub-packages. Say we create a package named BLA; we should then be able to extract the tables from a particular PDF with a single line of code, for example:
import BLA.tables.txt

Here page_number will be a function in that module which extracts the tables from a particular page, or from all pages if left at its default. Something like this:

random_name = BLA.tables.txt.page_number("name_of_the_pdf_file.pdf")
print(random_name)

This should then give me the tables present on that particular page. The same should work for images and text.
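The brief does not say how BLA should be laid out on disk. A minimal sketch, assuming the usual convention that every package and sub-package is a folder containing an `__init__.py`: the script writes that skeleton to a temporary directory and then imports `BLA.tables.txt` to check that the import path resolves. The `page_number` body is only a stub, and its `page` parameter is a hypothetical way to express "a particular page, or all pages by default".

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: every (sub-)package is a folder containing __init__.py,
# and the tables sub-package holds a txt.py module with page_number in it.
SKELETON = {
    "BLA/__init__.py": "",
    "BLA/tables/__init__.py": "",
    "BLA/tables/txt.py": (
        "def page_number(pdf_path, page=None):\n"
        '    """Return the tables on one page (1-based), or on all pages if page is None."""\n'
        "    raise NotImplementedError('plug in a backend such as Camelot or Tabula-py')\n"
    ),
    "BLA/images/__init__.py": "",
    "BLA/text/__init__.py": "",
}


def build_skeleton(root: Path) -> None:
    """Write the package skeleton under `root`."""
    for relative_path, source in SKELETON.items():
        target = root / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(source)


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    build_skeleton(root)
    sys.path.insert(0, str(root))  # make the freshly written package importable
    importlib.invalidate_caches()  # recommended after creating modules at runtime
    txt = importlib.import_module("BLA.tables.txt")
    print(txt.page_number.__doc__)
```

Because `tables`, `images` and `text` are separate sub-packages, each one can depend on a different extraction library without the others having to import it.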
There are many open-source packages to build on: Tabula-py and Camelot for table extraction, PyPDF2 for text and, I think, images, and pdfminer for text extraction only. I hope this makes clear what is to be done in the project.
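Each sub-package could be a thin wrapper over one of the libraries listed above. This is a minimal sketch under stated assumptions: Camelot for tables, pdfminer.six for text, and PyPDF2 3.x (whose pages expose an `images` list) for images. Tabula-py's `tabula.read_pdf(path, pages="all")` could stand in for Camelot. The helpers `camelot_pages`, `page_indices`, `extract_text` and `extract_images`, and the `page` parameter, are hypothetical names for illustration, not part of the brief.

```python
"""Hypothetical bodies for the three BLA sub-packages (a sketch, not a finished API).

Assumed backends, each imported lazily so this file loads without them:
Camelot for tables, pdfminer.six for text, PyPDF2 3.x for images.
Pages are numbered from 1, and page=None means "all pages".
"""
from typing import Dict, List, Optional


def camelot_pages(page: Optional[int]) -> str:
    """Turn page=None into the 'all' string that Camelot's pages= option accepts."""
    if page is None:
        return "all"
    if page < 1:
        raise ValueError("page numbers start at 1")
    return str(page)


def page_indices(total_pages: int, page: Optional[int] = None) -> List[int]:
    """Turn a 1-based page (or None for every page) into 0-based indices."""
    if page is None:
        return list(range(total_pages))
    if not 1 <= page <= total_pages:
        raise ValueError(f"page must be between 1 and {total_pages}, got {page}")
    return [page - 1]


def page_number(pdf_path: str, page: Optional[int] = None) -> list:
    """BLA.tables.txt.page_number: one pandas DataFrame per table found."""
    pages = camelot_pages(page)  # validate before importing the backend
    import camelot

    return [table.df for table in camelot.read_pdf(pdf_path, pages=pages)]


def extract_text(pdf_path: str, page: Optional[int] = None) -> str:
    """BLA.text: the text of one page, or of the whole document."""
    if page is not None and page < 1:
        raise ValueError("page numbers start at 1")
    from pdfminer.high_level import extract_text as pdfminer_extract_text

    # pdfminer's page_numbers are 0-based; None extracts every page
    page_numbers = None if page is None else [page - 1]
    return pdfminer_extract_text(pdf_path, page_numbers=page_numbers)


def extract_images(pdf_path: str, page: Optional[int] = None) -> Dict[str, bytes]:
    """BLA.images: raw image bytes keyed by '<page>_<image name>'."""
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    images: Dict[str, bytes] = {}
    for index in page_indices(len(reader.pages), page):
        for image in reader.pages[index].images:
            images[f"{index + 1}_{image.name}"] = image.data
    return images
```

Importing each backend inside its function means users only need the library for the kind of extraction they actually call (Tabula-py, for instance, also requires Java).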

Skills Required

Python

Industry Categories

Websites, IT & Software

Freelancer type required for this project

Agency Freelancers

Cost

$140.00

Expiry Date

Expired

0 Proposals received till November 23, 2024

Project ID: 00004903
