Snowflake Pattern-based Query Clustering

Unlike traditional clustering methods that focus on grouping objects with similar values on a set of dimensions, pattern-based clustering finds objects that exhibit coherent patterns in subspaces. Pattern-based clustering extends the concept of traditional clustering and benefits a wide range of applications. 

Query Clusters

A query cluster is a group of queries that have the same or similar logical structure. The queries have enough in common that they can be treated as the "same" query. For example:

  • select customer_id from table_jan
  • select customer_id from table_feb

Even when queries are not identical, they might overlap significantly. For Bluesky's purposes, sameness is measured by how much overlap there is in the use of compute, logic and data resources, with an eye to finding ways to make queries run faster and cheaper. Advanced pattern matching uncovers significant but non-obvious trends across the complete volume of a customer's queries so that optimization can be targeted.
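To make the idea of a query cluster concrete, here is a minimal Python sketch that groups queries by a normalized "fingerprint". It is not Bluesky's actual algorithm, which is not described here. The function names `fingerprint` and `cluster_queries`, and the rule that a month suffix on a table name (`_jan`, `_feb`, ...) marks the same logical table, are assumptions made purely for illustration.

```python
import re
from collections import defaultdict

# Assumed naming convention: tables partitioned by month carry a suffix like _jan.
MONTH_SUFFIX = re.compile(r"_(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\b")


def fingerprint(sql: str) -> str:
    """Reduce a query to a rough structural fingerprint (illustrative only)."""
    text = sql.strip().rstrip(";").lower()
    text = re.sub(r"\s+", " ", text)              # collapse whitespace
    text = re.sub(r"'[^']*'", "?", text)          # mask string literals
    text = re.sub(r"\b\d+(\.\d+)?\b", "?", text)  # mask numeric literals
    text = MONTH_SUFFIX.sub("_<month>", text)     # mask the assumed month suffix
    return text


def cluster_queries(queries):
    """Group queries that share the same fingerprint."""
    clusters = defaultdict(list)
    for query in queries:
        clusters[fingerprint(query)].append(query)
    return dict(clusters)


if __name__ == "__main__":
    sample = [
        "select customer_id from table_jan",
        "SELECT customer_id  FROM table_feb;",
        "select order_id from orders where amount > 100",
    ]
    for pattern, members in cluster_queries(sample).items():
        print(pattern, len(members))
```

With this sketch, the two example queries above land in one cluster because they differ only in the month suffix. Exact fingerprint matching is the simplest possible notion of sameness; measuring partial overlap in compute, logic and data would need a richer similarity measure than the one shown.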

Bluesky Query Patterns  

Bluesky’s SaaS product continuously analyzes Snowflake workloads to surface insight into what is costing the most and why, using an innovative technology called query patterns. Bluesky surfaces not only what customers' costs are today but also how those costs have been trending and what they are projected to be over a given time horizon. Bluesky also provides granular insights by warehouse, team, user, and individual data pipeline and query.

By intelligently watching for similar query patterns, Bluesky can detect complex situations that simplistic visibility tools miss, and so identify workloads that provide no value, such as long-running queries that fail repeatedly. For the remaining high-cost queries whose business value has been validated, it also recommends ways to tune them so they run faster and cost less.
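As a rough illustration of how grouping by pattern can expose workloads that provide no value, the Python sketch below flags patterns whose runs are long, fail repeatedly, and never succeed. The `QueryRun` record, the `wasteful_patterns` function, and the thresholds (three failures, 600 seconds) are hypothetical choices for this example, not part of Bluesky's product or of Snowflake's API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QueryRun:
    pattern: str      # hypothetical cluster/fingerprint label for the query
    seconds: float    # elapsed run time in seconds
    succeeded: bool   # whether the run completed successfully


def wasteful_patterns(runs, min_failures=3, min_seconds=600.0):
    """Map each flagged pattern to the total seconds spent on its long failed runs.

    A pattern is flagged when it never succeeded and has at least
    `min_failures` failed runs lasting `min_seconds` or longer.
    Both thresholds are illustrative assumptions.
    """
    by_pattern = defaultdict(list)
    for run in runs:
        by_pattern[run.pattern].append(run)

    flagged = {}
    for pattern, group in by_pattern.items():
        if any(run.succeeded for run in group):
            continue  # the pattern delivers results at least sometimes
        long_failures = [r for r in group if r.seconds >= min_seconds]
        if len(long_failures) >= min_failures:
            flagged[pattern] = sum(r.seconds for r in long_failures)
    return flagged
```

Looking at a single failed run says little; the signal appears only once runs are grouped under the same pattern, which is why the grouping step comes first.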

Bluesky uses profile-driven Query Cost Attribution and pattern-based Query Clustering to understand the implications of how customers are using data and to recommend ways to continuously optimize performance more cost-effectively. Bluesky provides AI-powered, actionable recommendations that customers can implement immediately to drive measurable business impact. These insights and recommendations are delivered via a dashboard with interactive visualizations covering warehouses, clusters, users and queries.