
WebNinjaDeveloper.com

Flask BeautifulSoup4 Project to Generate XML Sitemap of Given Website URL and Download it as Attachment in Browser

Posted on January 12, 2023

 

 

Welcome, folks. In this blog post we will generate an XML sitemap for a given website URL and download it as an attachment in the browser using Flask and BeautifulSoup4. The full source code of the application is shown below.

 

 

Get Started

 

 

To get started, install the following libraries with pip:

 

 

pip install flask

 

 

pip install beautifulsoup4

 

 

pip install requests

 

 

After that, create an app.py file and paste in the following code:

 

 

app.py

 

 

from flask import Flask, render_template, request, make_response
import requests
from bs4 import BeautifulSoup
import xml.etree.ElementTree as ET
 
app = Flask(__name__)
 
@app.route('/')
def index():
    return render_template('index.html')
 
 
if __name__ == '__main__':
    app.run(debug=True)

 

 

As you can see, we import the required libraries at the top and render the index.html template when the / route is requested. The app.run(debug=True) call at the bottom starts Flask's development server, which listens on port 5000 by default.

 

 

Next, create a templates folder and, inside it, an index.html file. It contains a simple HTML5 form where the user submits the website URL for which the XML sitemap will be generated and downloaded.

 

 

templates/index.html

 

 

<!DOCTYPE html>
<html>
<head>
    <title>Sitemap Generator</title>
</head>
<body>
    <h1>Generate Sitemap</h1>
    <form action="/sitemap" method="post">
        <label for="url">Enter website URL:</label>
        <input type="url" id="url" name="url" required>
        <input type="submit" value="Generate">
    </form>
</body>
</html>
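
The type="url" attribute on the input only asks the browser to check the value, so a request sent by any other client can still carry an arbitrary string. A minimal sketch of a server-side check follows, assuming a hypothetical helper named is_http_url that is not part of the original code. It uses only the standard library and accepts absolute http and https URLs:

```python
from urllib.parse import urlparse


def is_http_url(value):
    """Return True if value looks like an absolute http(s) URL.

    Hypothetical helper for illustration; not part of the original app.py.
    """
    if not value:
        return False
    try:
        parts = urlparse(value.strip())
    except ValueError:
        # urlparse raises ValueError for some malformed inputs
        return False
    # Require an http/https scheme and a host part
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

In the Flask view that receives this form, a check like this could be run on request.form.get('url') before any outgoing request is made, returning a 400 response when it fails.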

 

 

 

 

 

Now we need to handle the POST request that is sent when the HTML form is submitted. Add the following code to app.py, above the if __name__ == '__main__': block so that the route is registered before the server starts:

 

 

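from urllib.parse import urljoin  # can also be moved next to the other imports at the top of app.py

@app.route('/sitemap', methods=['POST'])
def sitemap():
    url = request.form['url']
    # Fetch the submitted page; the timeout keeps a slow site from hanging the request
    page = requests.get(url, timeout=10)
    soup = BeautifulSoup(page.content, 'html.parser')
    links = set()
    # Only consider <a> tags that actually have an href attribute
    for link in soup.find_all('a', href=True):
        # Resolve relative links (e.g. /about) against the submitted URL
        absolute = urljoin(url, link['href'])
        # Skip mailto:, javascript: and other non-HTTP links
        if absolute.startswith(('http://', 'https://')):
            links.add(absolute)
    # Build the <urlset> root with the sitemaps.org namespace
    root = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for link in sorted(links):
        child = ET.SubElement(root, "url")
        loc = ET.SubElement(child, "loc")
        loc.text = link
    # xml_declaration requires Python 3.8 or newer
    sitemap_xml = ET.tostring(root, encoding="utf-8", xml_declaration=True)
    response = make_response(sitemap_xml)
    response.headers["Content-Type"] = "application/xml"
    response.headers["Content-Disposition"] = "attachment; filename=sitemap.xml"
    return response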

 

 

As you can see, we take the URL that the user submitted, fetch that page with requests, and parse it with BeautifulSoup using Python's built-in html.parser. A for loop then collects the href value of every <a> tag on the page into a set, which removes duplicates. Note that only the links found on the submitted page are collected; the code does not crawl the rest of the site. Finally, we build the XML sitemap with ElementTree and return it with a Content-Disposition header so that the browser downloads it as sitemap.xml.
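
The XML-building step described above can also be tried on its own, without Flask or a network request. The following is a minimal sketch under stated assumptions: build_sitemap and SITEMAP_NS are hypothetical names that do not appear in the original code, and the function takes a plain list of href values such as the ones link.get('href') would return for each <a> tag.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, hrefs):
    """Return sitemap XML as bytes for the given href values.

    Hypothetical helper for illustration. hrefs may contain None,
    relative paths, duplicates, and non-HTTP links.
    """
    absolute = set()
    for href in hrefs:
        if not href:
            continue  # <a> tag without an href
        full = urljoin(base_url, href)  # resolve relative links
        if full.startswith(("http://", "https://")):
            absolute.add(full)

    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for link in sorted(absolute):
        url_el = ET.SubElement(root, "url")
        ET.SubElement(url_el, "loc").text = link
    # xml_declaration requires Python 3.8 or newer
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    sample = ["/about", "contact.html", None, "mailto:me@example.com", "/about"]
    print(build_sitemap("https://example.com/", sample).decode("utf-8"))
```

Keeping this logic in a separate function is one possible design choice: the Flask view would then only fetch the page, pass the collected href values in, and wrap the returned bytes in a response.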

 

 

 

 
