More than 3 years have passed since last update.

Hands-on Apache Superset, Amazon S3, and Amazon Athena

Posted at 2020-08-31

What is Apache Superset?

"Apache Superset (incubating) is a modern, enterprise-ready business intelligence web application". (Apache Software Foundation)

Comparable tools you might've heard of are Tableau and Power BI, but both are commercially licensed software, whereas Superset is open source.

What about Amazon S3 and Athena?

S3: "Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance." (Amazon Web Services)

Athena: "Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run." (Amazon Web Services)

What You'll Need Beforehand

  1. An AWS account (and cash, duh).
  2. AWS credentials configured.
  3. An Ubuntu 18.04+ environment.
  4. A Mapbox account.
  5. pip installed.

Installation

PyAthena

  • Apache Superset needs a database driver to talk to Amazon Athena. PyAthena provides one.
pip install "PyAthena>1.2.0"

Apache Superset

  • Install superset
pip install apache-superset
  • Initialize the database
superset db upgrade
  • Create an admin user (you will be prompted to set a username, first and last name before setting a password)
export FLASK_APP=superset
superset fab create-admin
  • Load some data to play with
superset load_examples
  • Create default roles and permissions
superset init

Workflow

  • Start a development web server on port 8088 (use -p to bind to another port)
superset run -p 8088 --with-threads --reload --debugger
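Before switching to the browser, it can be handy to confirm the server is actually answering. The following is a minimal sketch, assuming Superset exposes a `/health` endpoint on the port used above; the function name `superset_is_up` is made up for this example.

```python
import urllib.request


def superset_is_up(base_url="http://127.0.0.1:8088", timeout=3.0):
    """Return True if the server answers /health with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses.
        return False
```

Calling `superset_is_up()` from a Python shell while `superset run` is active should return True; if it returns False, check that the port matches the one passed to -p.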
  • Switch to your browser and go to http://127.0.0.1:8088/. You should now see something resembling the following

image.png

  • Log in with the admin account you have just created. You'll see that some examples have been loaded if you followed the tutorial. Play with them if you want to, but we'll be using some other data for demonstration purposes.

image.png

  • Upload the data we'll be using (AB_NYC_2019.csv here) to your S3 bucket.
aws s3 cp ~/PATH/TO/AB_NYC_2019.csv s3://YOUR-BUCKET
  • Now, come back to your Apache Superset UI and click on Databases, then the + button in the top right-hand corner.

  • image.png
    image.png
    image.png

  • Fill in the placeholders in the following template and paste the result into the SQLAlchemy URI field.

awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}
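AWS secret keys often contain characters such as `/` and `+`, and the staging directory is itself a URL, so pasting them into the template as-is tends to produce a URI that cannot be parsed. Below is a small sketch that fills in the template with those parts URL-encoded. The helper name `athena_uri` and all the values passed to it are placeholders for illustration, not real credentials.

```python
from urllib.parse import quote_plus


def athena_uri(access_key_id, secret_access_key, region_name, schema_name, s3_staging_dir):
    """Fill in the awsathena+rest template, URL-encoding the risky parts."""
    return (
        "awsathena+rest://"
        f"{quote_plus(access_key_id)}:{quote_plus(secret_access_key)}"
        f"@athena.{region_name}.amazonaws.com/{schema_name}"
        f"?s3_staging_dir={quote_plus(s3_staging_dir)}"
    )


# Placeholder values only; substitute your own credentials and bucket.
print(athena_uri("AKIAEXAMPLE", "abc/def+ghi", "us-east-1", "default",
                 "s3://YOUR-BUCKET/athena-results/"))
```

The printed string is what goes into the SQLAlchemy URI field. Keeping the staging directory under its own prefix (here the assumed `athena-results/`) helps keep query results apart from the uploaded data.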
  • Log into the Amazon Athena console and create a table over the uploaded file, defining the columns you need. I won't be using all the columns, for simplicity. (Make sure that your Athena region and the region of the S3 bucket you made are the same.) If you're familiar enough with Athena, you can also run the exact same query from Apache Superset's UI.

image.png
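For readers who prefer to write the table definition by hand, here is a rough sketch that turns a CSV header line into a `CREATE EXTERNAL TABLE` statement. It is not the query from the screenshot. It assumes the OpenCSVSerde (so quoted fields with commas survive), declares every column as `string`, and skips the header row. The header, the table name `airbnb_nyc`, and the helper name `athena_ddl` are hypothetical.

```python
import csv


def athena_ddl(header_line, table_name, s3_location):
    """Build a CREATE EXTERNAL TABLE statement from a CSV header line."""
    columns = next(csv.reader([header_line]))
    column_defs = ",\n".join(f"  `{name.strip()}` string" for name in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table_name} (\n"
        f"{column_defs}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'\n"
        f"LOCATION '{s3_location}'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


# Hypothetical header and names, for illustration only.
print(athena_ddl("id,latitude,longitude,price", "airbnb_nyc", "s3://YOUR-BUCKET/"))
```

Note that `LOCATION` points at a prefix rather than a single file, and Athena reads every object under it, so it is safer to keep the CSV in a prefix that holds nothing else. Since all columns come back as strings here, numeric ones need a cast at query time.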

  • Going back to your Apache Superset UI, you should see the following

image.png

  • Run a query of your preference and click on Explore.

image.png
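As one possible "query of your preference", the sketch below builds a statement that returns coordinates and price as numbers, which is the shape a map chart needs. The column names `latitude`, `longitude` and `price` and the table name `airbnb_nyc` are assumptions about how the table was defined. `run_on_athena` shows how the same SQL could be run through PyAthena outside Superset, and needs AWS credentials in the environment.

```python
def grid_query(table_name, limit=10000):
    """Return SQL selecting numeric coordinates and price for a map chart."""
    return (
        "SELECT TRY_CAST(latitude AS double) AS latitude, "
        "TRY_CAST(longitude AS double) AS longitude, "
        "TRY_CAST(price AS double) AS price "
        f"FROM {table_name} "
        f"LIMIT {int(limit)}"
    )


def run_on_athena(sql, s3_staging_dir, region_name):
    """Execute sql through PyAthena and return all rows."""
    # Imported lazily so grid_query works even where PyAthena is not installed.
    from pyathena import connect

    cursor = connect(s3_staging_dir=s3_staging_dir, region_name=region_name).cursor()
    cursor.execute(sql)
    return cursor.fetchall()


# Hypothetical table name; paste the printed SQL into SQL Lab.
print(grid_query("airbnb_nyc", limit=5))
```

`TRY_CAST` is used so that a malformed value becomes NULL instead of failing the whole query.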

  • Running a deck.gl visualization gives us ...

image.png

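  • deck.gl charts need a Mapbox token to draw the base map. Export the access token from your Mapbox account as an environment variable.
export MAPBOX_API_KEY=your-token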
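An exported variable only lives as long as the shell session. If you would rather keep the setting in a file, Superset also picks up overrides from a `superset_config.py` module on your PYTHONPATH. The fragment below is a minimal sketch under that assumption, and it still reads the token from the environment so the secret stays out of the file.

```python
# superset_config.py (must be importable, e.g. via PYTHONPATH)
import os

# Read the token from the environment; an empty string leaves the base map blank.
MAPBOX_API_KEY = os.environ.get("MAPBOX_API_KEY", "")
```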
  • Restart your server and you should now see ...

image.png

Darker grid cells are where the average Airbnb price is higher.

image.png

Here's a place where some of the more expensive Airbnb rooms are clustered, and the reasons might be apparent.

Conclusions

There is a lot left to talk about with Apache Superset, Amazon S3, and Amazon Athena, but the general idea here is to demonstrate a data analysis workflow that combines various tools. Of course, one can achieve the same without using any of the above, for instance with a combination of Tableau and Google BigQuery.

Reference

  1. Apache Software Foundation, "Apache Superset (incubating)", Apache Software Foundation. https://superset.incubator.apache.org/#apache-superset-incubating. Accessed 22 August 2020.
  2. Amazon Web Services, "Amazon S3", Amazon Web Services. https://aws.amazon.com/s3/. Accessed 22 August 2020.
  3. Amazon Web Services, "Amazon Athena", Amazon Web Services. https://aws.amazon.com/athena/. Accessed 22 August 2020.