Welcome, folks! In this blog post we will generate an XML sitemap for a given website URL and download it as an attachment in the browser using Flask. The full source code of the application is shown below.
Get Started
To get started, install the libraries below using the pip command:
pip install flask
pip install bs4
pip install requests
After that, create an app.py file and paste in the following code:
app.py
from flask import Flask, render_template, request, make_response
import requests
from bs4 import BeautifulSoup
import xml.etree.ElementTree as ET

app = Flask(__name__)

@app.route('/')
def index():
    return render_template('index.html')

if __name__ == '__main__':
    app.run(debug=True)
As you can see, we import the required libraries at the top and render the index.html template when the / route is requested. We then start the Flask app in debug mode; by default the development server listens on port 5000.
Now create a templates folder and, inside it, an index.html file. It contains a simple HTML5 form where the user submits the website URL for which the XML sitemap will be generated and downloaded.
templates/index.html
<!DOCTYPE html>
<html>
<head>
    <title>Sitemap Generator</title>
</head>
<body>
    <h1>Generate Sitemap</h1>
    <form action="/sitemap" method="post">
        <label for="url">Enter website URL:</label>
        <input type="url" id="url" name="url" required>
        <input type="submit" value="Generate">
    </form>
</body>
</html>
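The type="url" attribute on the input is checked only by the browser, so the server still receives whatever gets posted to /sitemap. Here is a minimal sketch of a server-side check that could run before the page is fetched. The helper name is_valid_url is hypothetical and not part of the original app; it relies only on urlparse from the standard library.

```python
from urllib.parse import urlparse


def is_valid_url(value):
    """Return True if value looks like an absolute http(s) URL.

    Hypothetical helper for illustration; not part of the original app.
    """
    if not value:
        return False
    parts = urlparse(value.strip())
    # Require an http/https scheme and a non-empty host part
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    for candidate in ["https://example.com", "ftp://example.com", "example.com", ""]:
        print(repr(candidate), is_valid_url(candidate))
```

In the route, a check like this could be applied to request.form['url'], returning an error response when it fails instead of calling requests.get.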
Now we need to handle the POST request that is sent when the HTML form is submitted. In app.py, add the following route above the if __name__ == '__main__': block, so that it is registered before the server starts.
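# Add this import next to the others at the top of app.py
from urllib.parse import urljoin

@app.route('/sitemap', methods=['POST'])
def sitemap():
    url = request.form['url']
    # Fetch the submitted page and parse its HTML
    page = requests.get(url, timeout=10)
    soup = BeautifulSoup(page.content, 'html.parser')

    # Collect unique links; skip <a> tags without an href and
    # resolve relative links against the submitted URL
    links = set()
    for link in soup.find_all('a'):
        href = link.get('href')
        if href:
            links.add(urljoin(url, href))

    # Build the <urlset> document with one <url><loc> entry per link
    root = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for link in sorted(links):
        child = ET.SubElement(root, "url")
        loc = ET.SubElement(child, "loc")
        loc.text = link

    # Serialize the XML and send it back as a file download
    sitemap_xml = ET.tostring(root, encoding="utf-8", xml_declaration=True)
    response = make_response(sitemap_xml)
    response.headers["Content-Type"] = "application/xml"
    response.headers["Content-Disposition"] = "attachment; filename=sitemap.xml"
    return response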
As you can see, we take the URL the user submitted, fetch that page with requests, and parse it with the bs4 library using Python's built-in html.parser. Note that only the submitted page is scraped, not every page of the website. A for loop then collects the href attribute of every <a> tag into a set, so duplicate links are dropped. Finally, we generate the XML sitemap with one <url> entry per link and return it with a Content-Disposition header, so the browser downloads it as sitemap.xml.
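To try the sitemap-building step on its own, without Flask or a network request, here is a minimal sketch using only the standard library. The function name build_sitemap, the same-host filter, and the fragment stripping are assumptions added for illustration; the route in this post simply writes out every href it finds.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin, urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, hrefs):
    """Return sitemap XML (bytes) for the same-host links in hrefs.

    Hypothetical helper for illustration; not part of the original app.
    """
    base_host = urlparse(base_url).netloc
    urls = set()
    for href in hrefs:
        if not href:
            continue  # <a> tags without an href yield None
        absolute = urljoin(base_url, href)  # resolve relative links
        parts = urlparse(absolute)
        # Keep only http(s) links that stay on the submitted site's host
        if parts.scheme in ("http", "https") and parts.netloc == base_host:
            urls.add(absolute.split("#", 1)[0])  # drop #fragments

    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in sorted(urls):
        loc = ET.SubElement(ET.SubElement(root, "url"), "loc")
        loc.text = url
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    sample = ["/about", "contact.html", None, "mailto:me@example.com",
              "https://other.example.org/", "/about#team"]
    print(build_sitemap("https://example.com/", sample).decode("utf-8"))
```

Filtering to the submitted host keeps mailto: links and external sites out of the sitemap, which is usually what you want since a sitemap describes a single site. The xml_declaration argument of ET.tostring requires Python 3.8 or newer.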