Get started with data visualisation in Apache Superset

05.09.2023 14:00 by Daniel Keil

From setup to Proof of Concept (PoC) in just one day

Apache Superset, an open source project of the Apache Software Foundation (ASF), is a modern business intelligence application with comprehensive data analysis and visualisation capabilities. The ASF made it a top-level project in January 2021 [1]. Its "no-code visualisation builder" makes it possible to create modern dashboards very quickly.

Installation is easiest with Docker, Kubernetes or PyPI. However, the PyPI package only contains the already compiled frontend assets, so later changes or further development of the frontend are difficult or impossible with it.

We therefore decided to install Superset from scratch, which works on various Unix/Linux distributions and on macOS. The installation uses the Python package manager pip and Node.js. However, building the application involves a number of unexpected obstacles, for which the documentation [2] unfortunately lacks the crucial hints.

After some research, we managed to set it up successfully on RHEL 8.7 as well as on Debian/Ubuntu. This blog post is intended to help you overcome the setup hurdles of Apache Superset and get off to a better start.

Before the installation - preparation with Python and Node.js

These instructions refer to Apache Superset version 2.1.0 on Red Hat Enterprise Linux (RHEL) 8.7. In general, it is advisable to follow the official setup documentation [2], which is updated continuously, and to install the software packages listed there. For Python, however, it is essential to install version 3.8:

sudo yum install python38
sudo yum install python38-devel
sudo yum install python38-pip
sudo yum install python38-wheel

Node.js version 16 is required. It can be installed, for example, as an AppStream module with the following command:

sudo dnf module install nodejs:16

Then create a Python virtual environment as described in the documentation and activate it. Deviating from the documentation, a few additional modules are required, one of them in a specific version:

pip install sqlparse=='0.4.3'
pip install pillow
pip install marshmallow-enum
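
Before starting the installation, it can help to check that the active interpreter and the pinned module really match the requirements above. The following is a minimal sketch, not part of the official setup; the helper names `is_supported_python`, `in_virtualenv` and `pin_matches` are made up for illustration, and only the standard library is used.

```python
import sys
from importlib import metadata  # available from Python 3.8


def is_supported_python(version_info=sys.version_info):
    """Return True for Python 3.8.x, the version this guide relies on."""
    return tuple(version_info[:2]) == (3, 8)


def in_virtualenv():
    """Return True if the interpreter runs inside a virtual environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def pin_matches(package, expected, get_version=metadata.version):
    """Return True if `package` is installed in exactly version `expected`."""
    try:
        return get_version(package) == expected
    except metadata.PackageNotFoundError:
        return False


if __name__ == "__main__":
    print("Python 3.8:", is_supported_python())
    print("virtual environment:", in_virtualenv())
    print("sqlparse 0.4.3:", pin_matches("sqlparse", "0.4.3"))
```

If any of the three lines prints False, it is worth fixing the environment first, because the later `pip install -e .` step depends on it.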

Now the preparations are finished and the actual installation of Apache Superset can begin.

Installing Superset with Python

First, download the current version, in our case 2.1.0, into the Python virtual environment created above and unpack it. Note that the archive unpacks into a directory named apache-superset-2.1.0rc3:

cd venv
wget https://dist.apache.org/repos/dist/release/superset/2.1.0/apache-superset-2.1.0-source.tar.gz
tar -xvzf apache-superset-2.1.0-source.tar.gz
cd apache-superset-2.1.0rc3

Now the installation can be started:

pip install -e .

With the additional modules from above, Python 3.8 and an up-to-date pip in the virtual environment, this should run successfully.

Configuration of Superset

The next step is to edit the Superset configuration file:

cd ../bin
nano superset_config.py

Here you set the URL of the provided SQLite demo database, the secret key used for the session cookies and for encryption in the database, and optionally the available languages:

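# Placeholder: replace with your own long random secret
SECRET_KEY = 'your-supersecret-password-123'
# Absolute path to the SQLite file: 'sqlite:///' followed by '/home/...'
SQLALCHEMY_DATABASE_URI = 'sqlite:////home/username/venv/superset.db'
# Languages offered in the user interface
LANGUAGES = {
    'de': {'flag': 'de', 'name': 'Deutsch'},
    'en': {'flag': 'us', 'name': 'English'},
}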
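
The placeholder values above should be replaced before use. As a minimal sketch, the following standard-library snippet generates a random secret key and builds a SQLite URI from an absolute path. The helper names `generate_secret_key` and `sqlite_uri` are hypothetical, and the key length of 42 bytes is an assumption rather than a Superset requirement.

```python
import secrets
from pathlib import PurePosixPath


def generate_secret_key(num_bytes=42):
    """Return a random URL-safe string suitable as a SECRET_KEY value."""
    return secrets.token_urlsafe(num_bytes)


def sqlite_uri(db_path):
    """Build an SQLAlchemy SQLite URI for an absolute POSIX path."""
    path = PurePosixPath(db_path)
    if not path.is_absolute():
        raise ValueError("db_path must be absolute: %r" % (db_path,))
    # 'sqlite:///' followed by an absolute path yields four slashes in total.
    return "sqlite:///" + str(path)


if __name__ == "__main__":
    print("SECRET_KEY = %r" % generate_secret_key())
    print("SQLALCHEMY_DATABASE_URI = %r" % sqlite_uri("/home/username/venv/superset.db"))
```

The two printed lines can be pasted into superset_config.py. The key should be generated once and then kept, not regenerated on every start.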

Setting up and configuring the Superset demo database

For the supplied demo database, initialisation and the creation of an admin user are necessary before the sample data can be loaded:

cd ../apache-superset-2.1.0rc3/
export FLASK_APP=superset
superset db upgrade
superset fab create-admin

In our experience, loading the sample data can abort because the number of records written per batch (the chunksize) is set too high in some files. The chunksize in these files must therefore be changed to 1. Adjust the paths in the following commands to your installation directory:

sed -i 's/chunksize=500/chunksize=1/g' $(find ~/superset/apache-superset-2.1.0rc3/superset/examples/. -maxdepth 1 -type f)
sed -i 's/chunksize=50/chunksize=1/g' $(find ~/superset/apache-superset-2.1.0rc3/superset/examples/. -maxdepth 1 -type f)
sed -i 's/CHUNKSIZE = 512/CHUNKSIZE=1/g' $(find ~/superset/apache-superset-2.1.0rc3/superset/datasets/commands/importers/v1/. -maxdepth 1 -type f)
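
To illustrate what these substitutions do, here is a minimal Python sketch that applies the same three replacements to a string. The function name `patch_chunksize` is made up for illustration; unlike the sed commands, it does not touch any files and applies all three patterns to the same text. It also shows why the order matters: `chunksize=500` has to be replaced before `chunksize=50`, otherwise it would end up as `chunksize=10`.

```python
# The same substitutions as in the sed commands, in the same order.
REPLACEMENTS = [
    ("chunksize=500", "chunksize=1"),
    ("chunksize=50", "chunksize=1"),
    ("CHUNKSIZE = 512", "CHUNKSIZE=1"),
]


def patch_chunksize(source, replacements=REPLACEMENTS):
    """Apply the literal replacements to `source` one after another."""
    for old, new in replacements:
        source = source.replace(old, new)
    return source


if __name__ == "__main__":
    print(patch_chunksize("df.to_sql(tbl_name, chunksize=500)"))
    print(patch_chunksize("CHUNKSIZE = 512"))
```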

The sample data can now be loaded:

superset init
superset load_examples

Build the Superset Frontend

Now the frontend must be built. In our case, the download triggered by the Node.js module "puppeteer" failed and stopped the npm build process. The following environment variable skips this download, so it must be set before the build:

# Build javascript assets
cd superset-frontend
# skip faulty puppeteer download
# https://devdocs.io/puppeteer/#environment-variables
export PUPPETEER_SKIP_DOWNLOAD=true

Now the frontend can be built:

npm ci
npx update-browserslist-db@latest
# try to install currencyformatter.js into plugin\plugin-chart-handlebars
npm install currencyformatter.js --save
npm run build

Done! - Starting the Superset Server

After the frontend has been built, the Superset server can be started. The port (8088 in the example) and the IP address (127.0.0.1 in the example) can be adjusted. To make Superset accessible from outside the host, the server must not be bound to localhost (127.0.0.1); bind it to an address of the host that is reachable from the network instead, or to 0.0.0.0 to listen on all interfaces.

cd ..
superset run -p 8088 -h 127.0.0.1 --with-threads --reload --debugger
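
Once the server is running, a quick way to confirm that it responds is to request its health endpoint. The sketch below uses only the Python standard library and assumes that Superset answers on the `/health` path with HTTP status 200; the helper names `health_url` and `is_up` are hypothetical, and host and port have to match the values passed to `superset run`.

```python
import http.client
import urllib.error
import urllib.request


def health_url(host="127.0.0.1", port=8088):
    """Return the URL of the assumed health endpoint."""
    return "http://%s:%d/health" % (host, port)


def is_up(url, timeout=3.0):
    """Return True if `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


if __name__ == "__main__":
    url = health_url()
    print(url, "is up" if is_up(url) else "is not reachable")
```

If the check succeeds on the host itself but fails from another machine, the bind address or the firewall is the most likely cause.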

It can happen that firewall rules prevent access to the Superset instance, especially with newly set up virtual machines. Therefore, it makes sense to define appropriate firewall rules:

sudo firewall-cmd --zone=public --permanent --add-service=http
sudo firewall-cmd --zone=public --permanent --add-port=8088/tcp
sudo firewall-cmd --reload

By installing Apache Superset using these instructions, you get a modern open source business intelligence environment with comprehensive features for data analysis and visualisation.

We are also happy to help you with the initial setup on site or remotely. Benefit from many years of experience with analytical applications and data integration. Contact us for a non-binding discussion or let us talk about the implementation of your technical requirements for an initial business case.

References:

[1] https://news.apache.org/foundation/entry/the-apache-software-foundation-announces70
[2] https://superset.apache.org/docs/installation/installing-superset-from-scratch
