Welcome, folks! In this blog post we will convert a PDF document to a CSV file from the command line using Python. The full source code of the example is shown below.
Get Started
To get started, you need to install the tabula-py library using the command shown below.
pip install tabula-py
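tabula-py is a Python wrapper around the Java library tabula-java, so a Java runtime must also be installed on your machine. In this example we first read a table from the PDF document into a pandas DataFrame and then convert the PDF to a CSV file.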
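Since the library relies on Java behind the scenes, it can help to confirm that a Java runtime is reachable before running the conversion. The following is only a minimal sketch using the Python standard library; the helper name java_available() is made up for this post and is not part of tabula-py.

```python
import shutil
import subprocess


def java_available() -> bool:
    """Return True if a `java` executable is on the PATH and runs.

    Hypothetical helper for illustration; not part of tabula-py.
    """
    java_path = shutil.which("java")
    if java_path is None:
        return False
    try:
        # `java -version` exits with status 0 when the runtime is usable.
        result = subprocess.run(
            [java_path, "-version"],
            capture_output=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if java_available():
        print("Java found; tabula-py should be able to start tabula-java.")
    else:
        print("Java not found; install a Java runtime before using tabula-py.")
```

If the check reports that Java is missing, install a Java runtime first and then run the script again.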
app.py
# Import the required Module
import tabula

# Read a PDF File
df = tabula.read_pdf("IPLmatch.pdf", pages='all')[0]

# convert PDF into CSV
tabula.convert_into("IPLmatch.pdf", "iplmatch.csv", output_format="csv", pages='all')

print(df)
Here we import the tabula module and call the read_pdf() method. The first argument is the name of the input PDF file, and pages='all' tells it to scan every page. read_pdf() returns a list of DataFrames, one per table it finds, so [0] selects the first table, which we print at the end. We then call the convert_into() method to convert the PDF to a CSV file, passing the input file name, the output file name, output_format set to "csv" and pages set to 'all'.
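The script above has the file names hard-coded. Since the goal is to run the conversion from the command line, one possible variation is to read the file names from the arguments with argparse. This is only a rough sketch: the function names build_parser() and main() and the argument names pdf, csv and --pages are choices made for this post, and only tabula.convert_into() comes from the library.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Describe the command-line arguments (names chosen for this sketch)."""
    parser = argparse.ArgumentParser(
        description="Convert the tables in a PDF document to a CSV file."
    )
    parser.add_argument("pdf", help="path to the input PDF file")
    parser.add_argument("csv", help="path of the CSV file to write")
    parser.add_argument(
        "--pages",
        default="all",
        help="pages to scan, for example 'all' or '1-3' (default: all)",
    )
    return parser


def main(argv=None) -> None:
    """Parse the arguments and run the PDF to CSV conversion."""
    args = build_parser().parse_args(argv)

    # Imported here so that --help still works without tabula-py installed.
    import tabula

    tabula.convert_into(
        args.pdf, args.csv, output_format="csv", pages=args.pages
    )
    print(f"Wrote {args.csv}")
```

To use it as a script, call main() at the bottom of app.py and then run something like python app.py IPLmatch.pdf iplmatch.csv in the terminal.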