Welcome folks, today in this blog post we will be scraping HTML tables from a webpage and saving them as a CSV file in Python using the beautifulsoup4 library. The full source code of the application is shown below.
Get Started
In order to get started you need to install the below libraries using the pip command as shown below. The script also uses the requests library to download the webpage, so install it alongside bs4.

pip install bs4 requests
And after that you need to make an app.py file and copy paste the following code.
app.py
import requests
from bs4 import BeautifulSoup
import csv

# Make a GET request to the URL that contains the table
url = "https://wpdatatables.com/documentation/column-features/url-link-columns/"
response = requests.get(url)

# Parse the HTML content of the page
soup = BeautifulSoup(response.text, "html.parser")

# Find the table in the HTML content
table = soup.find("table")

# Write the table data to a CSV file
with open("table.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile)

    # Write the header row
    header = [th.text.strip() for th in table.find("thead").find_all("th")]
    writer.writerow(header)

    # Write the data rows
    for row in table.find("tbody").find_all("tr"):
        data = [td.text.strip() for td in row.find_all("td")]
        writer.writerow(data)
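If you want to check that the scrape worked, here is a minimal sketch (assuming the script above already ran and wrote table.csv to the current directory) that reads the file back and prints each row.

import csv

# Read the generated CSV back and print each row to confirm the scrape worked
with open("table.csv", newline="") as csvfile:
    for row in csv.reader(csvfile):
        print(row)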
As you can see, we import the requests, bs4, and csv libraries at the top and then provide the URL of a webpage that contains an HTML table. The script downloads the whole webpage, parses it with BeautifulSoup, finds the first table tag, reads the header cells from its thead and the data cells from each row of its tbody, and saves all of that data into a table.csv file.
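Note that the code above only grabs the first table on the page because it uses find. If the page happens to contain several tables, a variation like the sketch below loops over every table with find_all and writes each one to its own CSV file (the table_0.csv, table_1.csv naming is just an illustrative choice). It takes whichever th or td cells each row contains, so tables without a thead or tbody still work.

import csv

import requests
from bs4 import BeautifulSoup

url = "https://wpdatatables.com/documentation/column-features/url-link-columns/"
soup = BeautifulSoup(requests.get(url).text, "html.parser")

# Loop over every <table> on the page and save each one to its own CSV file
for index, table in enumerate(soup.find_all("table")):
    with open(f"table_{index}.csv", "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        # Some tables have no <thead>/<tbody>, so iterate over all rows
        # and take whichever cells (th or td) each row contains
        for row in table.find_all("tr"):
            cells = [cell.text.strip() for cell in row.find_all(["th", "td"])]
            writer.writerow(cells)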