Welcome folks! In this blog post we will extract tables as DataFrames (df) from the URL of an online PDF document, running everything from the command line. The full source code of the application is shown below.
Get Started
To get started, install the libraries below using the pip command. The tabula module imported in the code comes from the tabula-py package, which also needs Java installed on your machine.
pip install pandas tabula-py
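Before running the script, it can help to confirm that everything it depends on is in place. The snippet below is a minimal sketch, assuming the tabula module is provided by tabula-py and that a java executable has to be on your PATH. The check_environment name is a hypothetical helper, not part of any library.

```python
import importlib.util
import shutil


def check_environment(modules=("pandas", "tabula"), executables=("java",)):
    """Report which modules and executables can be found on this machine."""
    report = {}
    for name in modules:
        # find_spec returns None when a top-level module is not installed
        report[name] = importlib.util.find_spec(name) is not None
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH
        report[exe] = shutil.which(exe) is not None
    return report


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If any entry is reported as missing, install it before moving on.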
After this, create an app.py file and copy and paste the following code:
app.py
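from tabula import read_pdf
import ssl  # imported but not used in this snippet

# Replace this with the URL of your PDF file
url = "URL OF THE PDF FILE"

try:
    # Download the PDF and extract its tables
    df = read_pdf(url)
    print(df)
except Exception as e:
    # Print the error if the download or extraction fails
    print(e)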
As you can see, we import the read_pdf() method from the tabula library. Replace the url value with the URL of your PDF file. We then call read_pdf() with that URL and print the result. If anything goes wrong, the exception is printed instead.
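Since the goal is to run this from the command line, the URL could also be passed as an argument instead of being edited into the file. The sketch below is one way to do that, not the only one. The parse_args and main functions and the --pages option are hypothetical names introduced here; pages is an existing read_pdf() parameter. It also assumes that read_pdf() returns a list of DataFrames, one per detected table, which is the behavior in recent tabula-py versions.

```python
import argparse


def parse_args(argv=None):
    """Read the PDF URL and the pages to scan from the command line."""
    parser = argparse.ArgumentParser(description="Print tables found in an online PDF.")
    parser.add_argument("url", help="URL of the PDF file")
    parser.add_argument("--pages", default="1",
                        help='pages to scan, e.g. "1", "1-3" or "all"')
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Imported here so argument errors are reported even if tabula-py is missing
    from tabula import read_pdf

    try:
        tables = read_pdf(args.url, pages=args.pages)
    except Exception as e:
        print(e)
        return 1

    # Assumption: tables is a list of DataFrames, one per detected table
    for i, table in enumerate(tables, start=1):
        print(f"Table {i}")
        print(table)
    return 0
```

To use this as app.py, add a final line `raise SystemExit(main())` and run `python app.py "URL OF THE PDF FILE" --pages all`.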