Redshift Architecture Performance Tips

John Ryan has a few hints to help us build speedy Redshift clusters:

The Need to Vacuum

As Redshift does not reclaim free space automatically, updates and delete operations can frequently lead to table growth. Equally, it’s important as new entries are added, that the data is maintained in a sorted sequence.

The VACUUM command is used to re-sequence data, and reclaim disk space as a result of DELETE and UPDATE operations. Although it won’t block other processes, it can be a resource-intensive operation, especially for data stored using interleaved sort keys.

It should be run periodically to ensure consistent performance and to reduce disk usage.

Some of this is good Postgres advice; some of it is good MPP advice (and serves well, for example, when dealing with Azure SQL Data Warehouse); the rest is Redshift-specific.

Related Posts

Running Spark MLlib to Feed Power BI

Brad Llewellyn shows how you can take Spark MLlib results and feed them into Power BI: MLlib is one of the primary extensions of Spark, along with Spark SQL, Spark Streaming and GraphX.  It is a machine learning framework built from the ground up to be massively scalable and operate within Spark.  This makes it […]

Read More

Power Query Container Size and Performance

Chris Web looks into what changing the Power BI Dataflow container size does for us: Currently there is no way to change this 256MB in Power BI Desktop or Excel although someone has already posted a suggestion on the Ideas site to allow us to change it. How much of an impact does this actually […]

Read More

Categories

October 2018
MTWTFSS
« Sep Nov »
1234567
891011121314
15161718192021
22232425262728
293031