Data Warehousing with AWS Redshift
With AWS Glue, the data that was initially in a flat model can now be represented with a more fitting star schema in a data warehouse.
The cloud data warehouse for this data will be created with AWS Redshift Serverless. This entails creating a namespace named
flights-namespace as well as a database named
dev. In addition, it requires a workgroup named
flights-workgroup, which will be used to write SQL queries.
Note: The workgroup has been configured to allow devices outside of the VPC to access the database. This will be useful when creating the visualization with Power BI
Now, we can open the query editor in Redshift and start creating the fact and dimension tables in the
First, the 4 tables in the schema need to be created in the warehouse using the following commands:
The four tables are now in the data warehouse, but they are all empty since the data is still in the
The data can be copied into this data warehouse using the
For instance, the data in
flights.csv can be copied into the
flights table using the following command syntax:
iam_rolevariable should be assigned whatever iam role is was selected when creating the workgroup.
By executing the
COPY command for each of the csv files in the
flights-data-processed bucket, the 4 tables should be filled with the necessary data.
As an example, here is a preview of the airport table:
This post originally appeared on TechToday.