Snowflake and Tableau are currently popular business intelligence tools. Although they serve different purposes, they are often used together. Both the tools offer seamless integration with each other. In this article, we discuss Tableau performance on Snowflake – how to troubleshoot and improve performance of Tableau dashboards and reports that use Snowflake.
In a recent project, we used Tableau to develop a dashboard using data from a Snowflake data warehouse. As part of the project, we built 8 reports on Tableau. Seven of these reports worked without problems; however, there was one report that was taking more than 60 seconds to load in Tableau. Unlike other reports which were mostly charts and graphs with summarized data, this report displayed data in a table with around 1 billion rows and 97 columns.
Troubleshooting Tableau Performance
As a first step, we decided to use the performance recording feature in Tableau. It’s a very useful tool to measure performance against each interaction made in a Tableau workbook. We analyzed the performance on events such as query execution, data source connections, layout computations, extract generation and server rendering. We found everything to be normal on the Tableau side, so we decided to direct our troubleshooting efforts towards Snowflake.
Troubleshooting Snowflake Performance
We ran the report again on Tableau and noted down the time. The report took over a minute to open. Whenever a Tableau report is opened for the first time, or any filters are selected, a group of SQL queries is executed at the backend in Snowflake. To determine how much time these underlying SQL queries were taking, we used the query history feature in Snowflake. This feature displays the list of queries executed in Snowflake along with start and end times, total duration and bytes scanned. It also allows you to filter the results based on user, session, and warehouse. We could see queries being run by Tableau with their execution times. In total all queries finished under 40 seconds which was still considerable, considering we had a dedicated extra-large warehouse for Tableau.
Steps to improve Tableau Performance on Snowflake
At this point, we were convinced that Snowflake performance was the problem. We solved for this by fixing several things as outlined below:
Removing Excess Data in Snowflake table
Tableau performance on Snowflake issues mostly arise when we have large data sets. We had over 1 billion rows in our underlying Snowflake table. It was obvious that this volume of data, displayed in a table would considerably increase execution time. The problem was how to remove some data to reduce the table size. The table contained both weekly and monthly data so after consulting with the users, we decided to remove the weekly data as the users only needed monthly data for their analysis. This dropped the record count significantly from 1 billion to 400 million records.
Adding Clustering Keys to Snowflake Table
In Snowflake, a Clustering key is one or more columns in a table that is used to enhance query performance. It can be any column but as a best practice, the column which is used as a filter is a good candidate for a clustering key. In our case, we had some pre-selected filters in the dashboard and decided to convert those into clustering keys. In this process, we made sure that we don’t overdo it, so we created three clustering keys as per Snowflake’s recommendation. Creating more clustering keys could increase the cost and affect performance - the opposite of what are trying to achieve!
Removing Unused Columns in Snowflake table
While doing our analysis, we found that there were quite a few columns that were present in the Snowflake table but were not used in the report as display fields or filters. Again, before removing anything from the table, we asked the users first if they would need these columns now or in future. It is important not just to solve for current Tableau performance problems, but also to look ahead to ensure that we can meet new requirements from users. The users confirmed that they wouldn’t need these fields. We dropped the unused columns from the table. We removed 62 unused columns which further reduced the size of the table.
While making these improvements one by one, we noted down the report execution times after each step. The table below provides a summary of the results achieved:
Weekly Data Removed
Clustering Keys Added
Unused Columns Removed
Select Filter 1
Select Filter 2
1 Min 29 Sec
1 Min 15 Sec
1 Min 10 Sec
1 Min 7 Sec
We evaluated the performance based on three test cases: Initial load, Select Filter 1 and Select Filter 2.
The Initial load test was the time it took the report to load the first time. Filter 1 and Filter 2 were dropdown filters in the report. As you can see, we achieved around 50% improvement after removing weekly data from the table. The creation of clustering keys also brought down the time, especially when Filter 1 was selected. The removing of unused columns had a little impact on performance, however, when we selected the Filter 2, we didn’t see any major gain in performance against all three test cases. To overcome this, we decided to make Filter 2 as a pre-selected filter, meaning when the report opens for the first time, a value is already selected in Filter 2.
It is important to optimize Tableau performance on Snowflake by looking at the performance of both Tableau and Snowflake; but also adjusting the Tableau reports to compensate for any slow Snowflake query response. Overall, this was a great exercise for us to understand the process of troubleshooting and optimization of Tableau performance on Snowflake.