Welcome folks! In this blog post we will use the pandas library and the pyarrow library to convert a CSV file to a Parquet file from the command line. The full source code of the application is shown below.
Get Started
To get started, install the libraries below using the pip command:
pip install pandas
pip install pyarrow
Now create an app.py file and copy-paste the following code:
app.py
import pandas as pd

# Read the CSV file
df = pd.read_csv('output.csv')

# Convert the CSV file to Parquet
df.to_parquet('file.parquet', index=False)
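As you can see, we import the pandas library, use the read_csv() method to load the CSV file into a DataFrame, and then use the to_parquet() method to write that DataFrame out as a Parquet file. Passing index=False keeps the DataFrame index out of the output file. Note that to_parquet() needs a Parquet engine such as pyarrow to be installed, which is why we installed it alongside pandas.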
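Now we will use the pyarrow library to convert the CSV file to a Parquet file. Here pandas still reads the CSV, while pyarrow turns the DataFrame into a Table and writes it to disk, as shown below: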
import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd

# Read the CSV file
df = pd.read_csv('output.csv')

# Convert the DataFrame to a Table
table = pa.Table.from_pandas(df)

# Write the Table to a Parquet file
pq.write_table(table, 'file.parquet')