As mentioned in my previous two articles, a table may be spread across multiple Compute Nodes, and this is, in part, one of the things that lets Redshift return results lightning quick.

In other data warehouses you can speed up table reads by defining an index on a table. Just like an index in a book, if you know what page a certain subject is on, you can jump right to it, saving you from scanning through the whole book. Redshift takes a more physical approach. Rather than an index at the beginning of a book outlining where each subject starts and ends, each subject is in its own book.

You achieve this when building tables by assigning a Distribution Style and Key. This tells Redshift how to spread your data across its physical nodes. So if you are creating tables as part of a piece of analysis, you can have a hand in distributing the table in a way that will aid your future querying efforts.

However, if you are querying a pre-existing table, its distribution style may actually work against the query you are trying to run, leaving you with two options: (1) change your query to work with how the table is distributed, if possible, or (2) suck it up and deal.

Diagram of a Redshift Cluster (to remind you)

Running a query that works against a specific table's distribution doesn't necessarily mean the query will be slow. After all, you still have a heap of processors crunching data.
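
To make the "each subject is in its own book" idea a bit more concrete, here is a toy Python sketch of what distributing by a key means compared with spreading rows evenly. It is only an illustration under assumptions: the four-node cluster, the `orders` rows, the column names, and the helper functions are all made up for the example, and `zlib.crc32` is a stand-in for whatever hashing Redshift really uses internally.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size, chosen only for illustration


def distribute_by_key(rows, key, num_nodes=NUM_NODES):
    """Place each row on a node chosen by hashing its distribution key."""
    nodes = defaultdict(list)
    for row in rows:
        # crc32 is a stand-in hash; Redshift's real hash function is internal.
        node_id = zlib.crc32(str(row[key]).encode("utf-8")) % num_nodes
        nodes[node_id].append(row)
    return dict(nodes)


def distribute_evenly(rows, num_nodes=NUM_NODES):
    """Place rows round-robin, ignoring what is in them."""
    nodes = defaultdict(list)
    for i, row in enumerate(rows):
        nodes[i % num_nodes].append(row)
    return dict(nodes)


def nodes_holding(nodes, key, value):
    """Return the sorted ids of nodes holding at least one row where key == value."""
    return sorted(
        node_id
        for node_id, rows in nodes.items()
        if any(row[key] == value for row in rows)
    )


# Made-up sample data: 12 orders spread over 3 customers.
orders = [{"order_id": i, "customer_id": i % 3} for i in range(12)]

by_key = distribute_by_key(orders, "customer_id")
even = distribute_evenly(orders)

# Key distribution: every row for customer 1 sits on one node
# (which node depends on the hash).
print(nodes_holding(by_key, "customer_id", 1))

# Even distribution: customer 1's rows are scattered over all four nodes.
print(nodes_holding(even, "customer_id", 1))  # [0, 1, 2, 3]
```

With the key-based layout, a query that filters or joins on `customer_id` finds everything it needs for a given customer in one place. With the even layout, the same query has to pull rows together from every node, which is the "working against the distribution" situation described above. In Redshift itself the choice is declared when the table is created, for example with `DISTSTYLE KEY` and `DISTKEY (customer_id)` on the `CREATE TABLE` statement, rather than in application code like this.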