Welcome, folks! In this blog post we will extract tables from a PDF document using the camelot library in Python. The full source code of the application is shown below.
Get Started
To get started, install the library with the pip command shown below:
pip install camelot-py
Now create an app.py file and paste in the following code:
app.py
import camelot

# Extract the tables from the PDF
tables = camelot.read_pdf('table.pdf')
As you can see, we import the camelot library and then read the PDF document with the read_pdf() method, passing the path of the PDF file as the argument. The call returns a list-like collection of the tables it found.
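By default read_pdf() only looks at the first page and uses the "lattice" flavor, which expects tables with ruled cell borders. If your tables sit on other pages or are separated only by whitespace, the pages and flavor arguments may help. The sketch below is one possible way to wrap this; the helper names page_spec and extract_tables are made up for illustration and are not part of camelot.

```python
def page_spec(pages):
    """Turn 1-based page numbers into camelot's comma-separated page string.

    Hypothetical helper (not part of camelot): [1, 3, 4] -> "1,3,4".
    """
    return ",".join(str(p) for p in pages)


def extract_tables(pdf_path, pages=(1,), flavor="lattice"):
    """Read tables from the given pages of pdf_path.

    flavor="lattice" targets tables drawn with ruling lines;
    flavor="stream" targets tables separated by whitespace.
    """
    # Imported here so the helpers can be defined even if camelot is missing.
    import camelot

    return camelot.read_pdf(pdf_path, pages=page_spec(pages), flavor=flavor)
```

For example, extract_tables('table.pdf', pages=[1, 2], flavor='stream') would try the first two pages with the stream parser. Which flavor works better depends on how the PDF was produced, so it is worth trying both.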
# Print the number of tables extracted
print(f'Number of tables: {len(tables)}')

# Print the first table as a pandas DataFrame
print(tables[0].df)
Here we print how many tables were found in the PDF document, and then print the first table, tables[0], as a pandas DataFrame in the terminal, as shown above.
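Since each extracted table exposes a pandas DataFrame through its .df attribute, you can also loop over every table and save it to disk instead of only printing the first one. Here is a minimal sketch of that idea; the function name save_tables, the output folder, and the table_<n>.csv naming scheme are assumptions made for this example.

```python
from pathlib import Path


def save_tables(tables, out_dir="output"):
    """Write each table's DataFrame to out_dir/table_<n>.csv and return the paths.

    Hypothetical helper: works with any sequence of objects that have a .df
    attribute, such as the result of camelot.read_pdf().
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    paths = []
    for i, table in enumerate(tables, start=1):
        path = out / f"table_{i}.csv"
        # Skip the index and column labels; camelot DataFrames normally use
        # plain integer labels, which add nothing to the CSV.
        table.df.to_csv(path, index=False, header=False)
        paths.append(path)
    return paths
```

After tables = camelot.read_pdf('table.pdf'), calling save_tables(tables) would write one CSV file per table into the output folder. Camelot also ships its own export helpers, so treat this as just one way to do it.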