In this small tutorial we will install Beautiful Soup on a Raspberry Pi running Raspberry Pi OS Bookworm and test it with a short web-scraping script.
1.) Make sure that your Pi OS is up to date via:
sudo apt-get update
sudo apt-get upgrade
2.) Install Python 3 via:
sudo apt install python3
Note: this is often pre-installed
3.) Install Beautiful Soup via:
sudo apt install python3-bs4
4.) For web scraping we also need the requests library, which we install via:
sudo apt install python3-requests
Note: this is often pre-installed
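Before writing the scraper, it can help to confirm that both libraries are visible to Python 3. The following is a minimal sketch, assuming you run it with the same python3 interpreter that will later run the scraper; the helper name check_modules and any file name you save it under are illustrative, not part of the tutorial.

```python
# Check whether the modules installed in steps 3 and 4 can be found by python3.
import importlib.util


def check_modules(names):
    """Return a dict mapping each module name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # bs4 is the import name of Beautiful Soup, requests is the HTTP library
    for name, found in check_modules(["bs4", "requests"]).items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

If a module is reported as missing, repeat the matching apt install step above.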
5.) Now we can test whether everything works. For the test we create a small Python script via:
nano test-script.py
6.) Once nano is open, paste the following into the file:
# imports
import json

import requests
from bs4 import BeautifulSoup

# fetch the page we want to scrape (timeout in seconds, so the script cannot hang forever)
page = requests.get("https://www.google.com", timeout=10)

# check that we got the website and not an error
if page.status_code == 200:
    content = page.content
    # parse the content of the site with BeautifulSoup
    DOMdocument = BeautifulSoup(content, 'html.parser')
    # extract the title
    title = DOMdocument.title.string
    # save the data
    data = {
        "title": title
    }
    # dump the data to a file
    with open('google_title.json', 'w') as json_file:
        json.dump(data, json_file, indent=4)
    print("HTML title from the Google main page was exported into a JSON file.")
else:
    print("Request failed with status code", page.status_code)
Then save the file and close nano (Ctrl+O and Enter to save, Ctrl+X to exit).
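If the request to Google fails (for example because the Pi is offline), you can still check that Beautiful Soup itself works by parsing a hard-coded HTML string instead of a downloaded page. This is a sketch, not part of the original script: SAMPLE_HTML and extract_title are illustrative names, and the function also returns None for a page without a title tag, a case the script above does not handle.

```python
# Offline test: parse a hard-coded HTML snippet with BeautifulSoup,
# so the parser can be checked without a network connection.
from bs4 import BeautifulSoup

# illustrative sample page (assumption: any small HTML document works here)
SAMPLE_HTML = "<html><head><title>Sample page</title></head><body></body></html>"


def extract_title(html):
    """Return the text of the <title> tag, or None if the page has no title."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None:
        return None
    return soup.title.get_text(strip=True)


if __name__ == "__main__":
    print(extract_title(SAMPLE_HTML))
```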
7.) Now we will run the script via:
python3 test-script.py
8.) Once the script has finished (this can take a moment depending on your device and network connection), we can check the result via:
nano google_title.json
The file should contain the text of the title tag from the Google main page.
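Instead of opening the file in nano, you can also read it back with Python. The sketch below assumes google_title.json sits in the current directory, as written by test-script.py; the helper name read_title is illustrative.

```python
# Read google_title.json back and print the stored title.
import json
import os


def read_title(path="google_title.json"):
    """Load the JSON file written by test-script.py and return its "title" value."""
    with open(path) as json_file:
        data = json.load(json_file)
    return data.get("title")


if __name__ == "__main__":
    if os.path.exists("google_title.json"):
        print(read_title())
    else:
        print("google_title.json not found, run test-script.py first.")
```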