Examining Flights in the U.S. with AWS and Power BI | by Aashish Nair | Jul, 2023

Data Warehousing with AWS Redshift

With AWS Glue, the data that was initially in a flat model can now be represented with a more fitting star schema in a data warehouse.

The cloud data warehouse for this data will be created with AWS Redshift Serverless. This entails creating a namespace named flights-namespace as well as a database named dev. In addition, it requires a workgroup named flights-workgroup, which will be used to write SQL queries.

Note: The workgroup has been configured to allow devices outside of the VPC to access the database. This will be useful when creating the visualization with Power BI

Workgroup (Created by Author)

Now, we can open the query editor in Redshift and start creating the fact and dimension tables in the dev database.

Query Editor (Created by Author)

First, the 4 tables in the schema need to be created in the warehouse using the following commands:

Created Tables (Created by Author)

The four tables are now in the data warehouse, but they are all empty since the data is still in the flights-data-processed bucket.

The data can be copied into this data warehouse using the COPY command.

For instance, the data in flights.csv can be copied into the flights table using the following command syntax:

Note: the iam_role variable should be assigned whatever iam role is was selected when creating the workgroup.

By executing the COPY command for each of the csv files in the flights-data-processed bucket, the 4 tables should be filled with the necessary data.

As an example, here is a preview of the airport table:

Query Output (Created by Author)

Source link

This post originally appeared on TechToday.

Leave a Reply

Your email address will not be published. Required fields are marked *