Snowflake Questions

1. Explain Snowflake architecture

Snowflake is a cloud data warehouse built on top of public cloud infrastructure (AWS, Azure, or GCP) and is a true SaaS offering. There is no software or hardware to install, and no ongoing maintenance or tuning is needed to work with Snowflake.

Three main layers make up the Snowflake architecture: database storage, query processing, and cloud services.

  • Data storage – When data is loaded, Snowflake reorganizes it into its internal optimized, compressed, columnar format.
  • Query processing – Virtual warehouses process the queries in Snowflake.
  • Cloud services – This layer coordinates and handles all activities across Snowflake, including authentication, metadata management, infrastructure management, access control, and query parsing and optimization.

2. Describe Snowflake computing. 

The Snowflake cloud data warehouse platform provides instant, secure, and governed access to an organization's entire data network, along with a core architecture that enables many types of data workloads, including a single platform for developing modern data applications. Snowflake brings together the power of data warehouses, the scalability of big data platforms, the elasticity of the cloud, and real-time data sharing, at a fraction of the cost of traditional solutions.

3. What is the use of the Cloud Services layer in Snowflake?

The services layer acts as the brain of Snowflake. It authenticates user sessions, enforces security functions, provides management capabilities, performs query optimization, and coordinates all transactions.

4. What type of database is Snowflake?

Snowflake is built entirely around SQL. It's a columnar-stored relational database that works well with Excel, Tableau, and many other tools. Snowflake has its own query tool and supports multi-statement transactions, role-based security, and the other features expected of a SQL database.

5. How is data stored in Snowflake?

Snowflake stores data in multiple micro-partitions, which are internally optimized and compressed. The data is stored in a columnar format in Snowflake's cloud storage. The stored data objects are neither directly accessible nor visible to users; they can be accessed only by running SQL queries through Snowflake.

6. How many editions of Snowflake are available?

Snowflake offers four editions depending on your usage requirements.

  • Standard edition – The introductory-level offering, providing full access to Snowflake's standard features.
  • Enterprise edition – Offers all Standard edition features and services, plus additional features required by large-scale enterprises.
  • Business Critical edition – Also called Enterprise for Sensitive Data (ESD). It offers an even higher level of data protection for organizations handling sensitive data.
  • Virtual Private Snowflake (VPS) – Provides the highest level of security, for organizations such as those dealing with financial activities.

7. Explain Virtual warehouse

In Snowflake, a virtual warehouse, often known simply as a “warehouse,” is a cluster of compute resources. A virtual warehouse provides the resources users need, such as CPU, memory, and temporary storage, to perform Snowflake operations such as:

  • Executing SQL SELECT statements that require compute resources (e.g., retrieving rows from tables and views).
  • Performing DML operations, including:
      • Updating rows in tables (DELETE, INSERT, UPDATE).
      • Loading data into tables (COPY INTO <table>).
      • Unloading data from tables (COPY INTO <location>).
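
A minimal sketch of provisioning and using a warehouse (the name analytics_wh and all settings are illustrative):

```sql
-- Create a small warehouse; the values shown are illustrative.
CREATE WAREHOUSE IF NOT EXISTS analytics_wh
  WAREHOUSE_SIZE      = 'XSMALL'  -- compute capacity per cluster
  AUTO_SUSPEND        = 300       -- suspend after 300 s of inactivity to save credits
  AUTO_RESUME         = TRUE      -- resume automatically when a query arrives
  INITIALLY_SUSPENDED = TRUE;

-- Point the current session at the warehouse.
USE WAREHOUSE analytics_wh;
```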

8. Is Snowflake OLTP or OLAP?

An OLTP (Online Transaction Processing) database contains detailed, up-to-date data and handles a large volume of typically small transactions. OLAP (Online Analytical Processing), in contrast, typically involves complex, aggregated queries over a small number of transactions. Snowflake's database schema is built around online analytical processing, making Snowflake an OLAP system.

9. Explain Columnar database

A columnar database is the opposite of a conventional row-oriented database: it stores data in columns instead of rows. This simplifies analytical query processing and delivers greater performance for analytical workloads. Columnar storage eases analytics and is widely regarded as the future of business intelligence.

10. What is the use of a database storage layer?

Whenever we load data into Snowflake, it organizes the data into a compressed, columnar, optimized format. Snowflake manages every aspect of storing this data: compression, organization, statistics, file sizes, and other properties associated with data storage. The data objects stored in Snowflake are neither directly accessible nor visible to users; they can be accessed only by executing SQL queries through Snowflake.

11. What is the use of the Compute layer in Snowflake?

In Snowflake, virtual warehouses, which are clusters of compute resources, perform all the data-handling tasks. When executing a query, a virtual warehouse retrieves the minimum data needed from the storage layer to satisfy the request.

12. What are the different ways to access the Snowflake Cloud data warehouse?

We can access the Snowflake data warehouse through:

  • A web-based user interface from which all aspects of Snowflake management and usage can be accessed.
  • Command-line clients (such as SnowSQL) that can access all parts of Snowflake management and use.
  • Snowflake has ODBC and JDBC drivers, which allow other applications (like Tableau) to connect to it.
  • Native connectors (e.g., Python, Spark) for developing applications that connect to Snowflake.
  • Third-party connectors can be used to link applications such as ETL tools (e.g., Informatica) and BI tools (e.g., ThoughtSpot) to Snowflake.

13. Why is Snowflake highly successful?

Snowflake is highly successful because of the following reasons:

  • It supports a wide variety of technology areas, like data integration, business intelligence, advanced analytics, security, and governance.
  • It runs on cloud infrastructure and supports advanced design architectures, making it ideal for dynamic and rapidly evolving usage.
  • Snowflake supports out-of-the-box features like data cloning, data sharing, separation of compute and storage, and directly scalable compute.
  • Snowflake eases data processing.
  • Snowflake provides extendable computing power.
  • Snowflake suits various applications, such as an ODS with staged data, data lakes with a data warehouse, raw marts, and data marts with consistent, modelled data.

14. How do we secure the data in the Snowflake?

Data security plays a prominent role in all enterprises. Snowflake adopts best-in-class security standards for encrypting and securing the customer accounts and data stored in Snowflake. It provides industry-leading key-management features at no extra cost.

15. What are Micro Partitions?

Snowflake comes with a robust and unique kind of data partitioning known as micro-partitioning. Data in Snowflake tables is automatically and transparently divided into micro-partitions; this happens on every Snowflake table.

16. How is Snowflake different from Redshift?

Both Redshift and Snowflake provide on-demand pricing but vary in package features. Snowflake prices compute separately from storage, whereas Redshift bundles the two.

  • Maintenance – Snowflake is a comprehensive SaaS solution that requires no maintenance, whereas AWS Redshift clusters necessitate some manual maintenance.
  • Pricing – Snowflake separates compute from storage, allowing flexible pricing and configuration; Redshift offers cost optimization through reserved/spot instance pricing.
  • Scaling – Snowflake auto-scales in real time; Redshift scales by adding or removing nodes.
  • Customization – Snowflake provides fewer data-customization options; Redshift offers more data flexibility, with features such as partitioning and distribution.
  • Security – Snowflake provides always-on encryption with strict security checks; Redshift offers a more flexible, customizable security strategy.

17. Explain Snowpipe in Snowflake

Snowpipe is Snowflake’s continuous data ingestion service. Snowpipe loads data within minutes of files being uploaded to a stage and submitted for ingestion. With Snowpipe’s serverless compute model, Snowflake manages load capacity, ensuring appropriate compute resources to meet demand. In a nutshell, Snowpipe provides a “pipeline” for loading new data in micro-batches as soon as it becomes available.

The data is loaded using the COPY command defined in an associated pipe. A pipe is a named, first-class Snowflake object containing a COPY statement. The COPY statement specifies the location of the data files (i.e., a stage) as well as the target table. All data types are supported, including semi-structured types such as JSON and Avro.
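
A minimal pipe definition, assuming a stage my_stage (with event notifications configured) and a target table raw_events with a single VARIANT column already exist:

```sql
-- Continuously load JSON files staged in @my_stage into raw_events.
CREATE PIPE IF NOT EXISTS my_pipe
  AUTO_INGEST = TRUE  -- trigger loads from cloud storage event notifications
AS
  COPY INTO raw_events
  FROM @my_stage
  FILE_FORMAT = (TYPE = 'JSON');
```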

18. What are the ways to detect staged files in Snowpipe?

  • Automating Snowpipe using cloud messaging
  • Calling Snowpipe REST endpoints

19. What are the benefits of Snowpipe?

  • Real-time insights
  • User-friendly
  • Cost-efficient
  • Resilience

20. Explain Snowflake Time Travel

The Snowflake Time Travel tool allows us to access historical data at any point within a defined period, including data that has since been changed or deleted. With this tool, we can carry out the following tasks (see the sketch after this list):

  • Restore data-related objects that may have been deleted unintentionally.
  • Examine data usage and the changes made to data over a specific time period.
  • Duplicate and back up data from key points in the past.
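
A few illustrative Time Travel statements (the table name orders and the timestamp are placeholders):

```sql
-- Query the table as it looked 30 minutes ago (offset is in seconds).
SELECT * FROM orders AT (OFFSET => -60 * 30);

-- Query the table as of a specific point in time.
SELECT * FROM orders AT (TIMESTAMP => '2024-01-15 08:00:00'::TIMESTAMP_LTZ);

-- Restore an accidentally dropped table within the retention period.
UNDROP TABLE orders;
```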

21. Differentiate Fail-safe and Time Travel in Snowflake

  • Time Travel – Depending on the Snowflake edition and the account- or object-specific Time Travel settings, users themselves can retrieve and restore data, reverting to its historical state, within the configured retention period.
  • Fail-safe – Users have no control over data recovery; it applies only after the Time Travel period ends, and only Snowflake Support can recover the data, for a fixed period of 7 days. So if Time Travel is set to six days, database objects can be recovered by the user for the transaction time + 6 days, plus a further 7 days of Fail-safe through Snowflake Support.

22. What is zero-copy Cloning in Snowflake?

Zero-copy cloning is a Snowflake feature in which the simple keyword CLONE lets you create a clone of your tables, schemas, and databases without replicating the actual data. As a result, you can have practically real-time production data cloned into your dev and staging environments to carry out various activities.
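
A minimal sketch, using hypothetical prod_db and dev_db databases:

```sql
-- Clone one production table into dev; no data is physically copied.
CREATE TABLE dev_db.public.orders CLONE prod_db.public.orders;

-- Whole schemas and databases can be cloned the same way.
CREATE DATABASE dev_db_copy CLONE prod_db;
```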

Advantages:

  1. There are no additional storage costs, because no data is replicated.
  2. There is no waiting for data to be copied from production to non-production environments.
  3. There is no administrative effort, since cloning is as simple as a click of a button.
  4. No copy, only clone: data exists in only one place.
  5. Corrected/fixed data can be promoted to production instantly.

23. What is Data Retention Period in Snowflake?

The data retention period is an important aspect of Snowflake Time Travel.

When data in a table is modified, for example when rows are deleted or an object containing data is dropped, Snowflake saves the data’s previous state. The data retention period determines the number of days this historical data is kept and, as a result, how long Time Travel operations (SELECT, CREATE … CLONE, UNDROP) can be performed on it.

The standard retention period is one day (24 hours) and is enabled by default for all Snowflake accounts.
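
The retention period can be changed per object through the DATA_RETENTION_TIME_IN_DAYS parameter; a short sketch with illustrative object names:

```sql
-- Extend Time Travel for one table to 7 days (Enterprise edition allows up to 90).
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 7;

-- Or set a default for all new tables in a database.
ALTER DATABASE analytics SET DATA_RETENTION_TIME_IN_DAYS = 7;
```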

24. What is the use of Snowflake Connectors?

A Snowflake connector is a piece of software that allows us to connect to the Snowflake data warehouse platform and perform activities such as read/write operations, metadata import, and bulk data loading.

25. What tasks can the Snowflake connector execute?

  • Read data from, or publish data to, tables in the Snowflake data warehouse.
  • Load data in bulk into a Snowflake data warehouse table.
  • Insert or bulk-load data into multiple tables at the same time, using the multiple input connections functionality.
  • Look up records from a table in the Snowflake data warehouse.

26. What are the types of Snowflake Connectors?

  • Snowflake Connector for Kafka
  • Snowflake Connector for Spark
  • Snowflake Connector for Python

27. Describe Snowflake Clustering

In Snowflake, data partitioning is called clustering, and it is defined by specifying cluster keys on a table. The method by which you manage clustered data in a table is called re-clustering.

A clustering key is a subset of columns in a table (or expressions on a table) that are deliberately chosen to co-locate the table’s data in the same micro-partitions. This is beneficial for very large tables where the ordering was not ideal (at the time the data was inserted/loaded) or where extensive DML has weakened the table’s natural clustering.

Some general indicators that can help determine whether a clustering key should be defined for a table are as follows:

  • Table queries are running slower than expected or have degraded noticeably over time.
  • The table’s clustering depth is large.
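
A short sketch of defining a clustering key and inspecting clustering quality (table and column names are illustrative):

```sql
-- Co-locate rows with similar dates and customers in the same micro-partitions.
ALTER TABLE events CLUSTER BY (event_date, customer_id);

-- Report clustering depth and overlap statistics for those columns.
SELECT SYSTEM$CLUSTERING_INFORMATION('events', '(event_date, customer_id)');
```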

28. Explain Data Shares

Snowflake data sharing allows organizations to securely and immediately share their data. Secure data sharing enables data to be shared between accounts through Snowflake database tables and secure views.
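
A minimal provider-side sketch, with placeholder object and account names:

```sql
-- Create the share and expose one database, schema, and table through it.
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;

-- Make the share visible to a consumer account (placeholder identifier).
ALTER SHARE sales_share ADD ACCOUNTS = consumer_account;
```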

29. What is “Stage” in the Snowflake?

In Snowflake, stages are data storage locations. If the data to be loaded into Snowflake is stored in another cloud location, such as AWS S3, Azure, or GCP, the stages are referred to as external stages; if the data is stored within Snowflake, they are referred to as internal stages.

Internal stages are further divided as below:

  • Table Stage
  • User Stage
  • Internal Named Stage
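
Each kind of internal stage is referenced with its own prefix; a short sketch (names and file paths are illustrative, and PUT is run from a client such as SnowSQL, not the web UI):

```sql
-- Table stage: created automatically, one per table, referenced with @%.
PUT file:///tmp/data.csv @%mytable;

-- User stage: created automatically, one per user, referenced with @~.
PUT file:///tmp/data.csv @~/staged;

-- Internal named stage: created explicitly, referenced with @.
CREATE STAGE my_internal_stage;
PUT file:///tmp/data.csv @my_internal_stage;
```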

30. Does Snowflake maintain stored procedures?

Yes, Snowflake supports stored procedures. A stored procedure is similar to a function: it is created once and used many times. It is created with the CREATE PROCEDURE command and executed with the CALL command. Snowflake stored procedures are written in JavaScript, whose API enables them to execute database operations such as SELECT, UPDATE, and CREATE.
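
A minimal sketch of a JavaScript stored procedure (the procedure name, table name, and logic are illustrative):

```sql
CREATE OR REPLACE PROCEDURE row_count(table_name STRING)
  RETURNS FLOAT
  LANGUAGE JAVASCRIPT
AS
$$
  // Arguments are exposed to JavaScript as upper-case variables.
  var stmt = snowflake.createStatement({
    sqlText: "SELECT COUNT(*) FROM IDENTIFIER(:1)",
    binds: [TABLE_NAME]
  });
  var rs = stmt.execute();
  rs.next();               // advance to the single result row
  return rs.getColumnValue(1);
$$;

-- Execute the procedure.
CALL row_count('events');
```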

31. How do we execute the Snowflake procedure?

Stored procedures allow us to create modular code comprising complicated business logic by combining multiple SQL statements with procedural logic. To execute a Snowflake procedure, carry out the steps below:

  • Run a SQL statement
  • Extract the query results
  • Extract the result set metadata

32. Explain Snowflake Compression

All data entered into Snowflake is compressed automatically. Snowflake uses modern data compression algorithms to compress and store the data. Customers pay for the compressed data, not the raw data.

Following are the advantages of the Snowflake Compression:

  • Storage costs are lower than raw cloud storage because of compression.
  • No storage cost for on-disk caches.
  • Approximately zero storage cost for data sharing or data cloning.

33. How to create a Snowflake task?

To create a Snowflake task, we use the “CREATE TASK” command. Creating a task requires (see the sketch after this list):

  • The CREATE TASK privilege on the schema.
  • The USAGE privilege on the warehouse referenced in the task definition.
  • The privileges needed to run the SQL statement or stored procedure in the task definition.
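
A minimal sketch (the task name, warehouse, schedule, and statement are illustrative):

```sql
-- Run a nightly aggregation at 02:00 UTC.
CREATE TASK IF NOT EXISTS nightly_rollup
  WAREHOUSE = analytics_wh
  SCHEDULE  = 'USING CRON 0 2 * * * UTC'
AS
  INSERT INTO daily_totals
  SELECT CURRENT_DATE, SUM(amount) FROM orders;

-- Tasks are created suspended; resume to start the schedule.
ALTER TASK nightly_rollup RESUME;
```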

34. How do we create temporary tables?

To create temporary tables, we have to use the following syntax:

CREATE TEMPORARY TABLE mytable (id NUMBER, creation_date DATE);

35. Can you tell me how to access the Snowflake Cloud data warehouse?

Snowflake’s data warehouse can be accessed using the following ways:  

  • ODBC Drivers (a driver for connecting to Snowflake).
  • JDBC Drivers (a driver enabling a Java application to interact with a database).
  • Python Libraries (for creating Python applications that connect to Snowflake and perform standard operations).
  • Web User Interface (can be used for almost any task you can accomplish with SQL and the command line, such as: Creating and managing users and other account-level objects).
  • SnowSQL command-line client (a Python-based command-line interface for connecting to Snowflake from Windows, Linux, and macOS).

36. Name the types of caches in Snowflake?

  • Query Results Caching
  • Metadata Cache
  • Virtual Warehouse Local Disk Caching

37. What is Materialized view in Snowflake?

A materialized view in Snowflake is a pre-computed data set derived from a query specification. Because the data is pre-computed, querying a materialized view is far faster than running the underlying query against the view’s base table.

In simple terms, materialized views are designed to enhance query performance for common and repetitive query patterns. Materialized views are first-class database objects that speed up projection, expensive aggregation, and selection operations for queries running on large data sets.
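
A short sketch (table and column names are illustrative; materialized views require Enterprise edition or higher):

```sql
-- Pre-compute an expensive aggregation once; Snowflake keeps it up to date.
CREATE MATERIALIZED VIEW daily_revenue AS
  SELECT order_date, SUM(amount) AS revenue
  FROM orders
  GROUP BY order_date;
```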

38. Name a few Snowflake database objects that can be shared using Secure data sharing?

  • Tables
  • Secure views
  • External tables
  • Secure UDFs
  • Secure materialized views

39. What are the internal and external stages in Snowflake?

Following are the two stages supported by Snowflake to store data files and to load/unload data:

Internal stage: Here the files are stored within the Snowflake account.

External stage: Here the files are stored in an external location, for instance AWS S3.
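
A minimal external-stage sketch with placeholder URL and credentials:

```sql
-- Point a stage at an S3 bucket (URL and keys are placeholders).
CREATE STAGE my_s3_stage
  URL = 's3://my-bucket/data/'
  CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...');

-- Load the staged files into a table.
COPY INTO my_table FROM @my_s3_stage FILE_FORMAT = (TYPE = 'CSV');
```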

40. What is “Continuous Data Protection” (CDP) in Snowflake?

Continuous Data Protection (CDP) is an essential feature offered by Snowflake to protect data stored in Snowflake from events such as malicious attacks, human error, and software or hardware failures. The CDP feature keeps your data accessible and recoverable at all stages of the data life cycle, even if you lose it accidentally.

41. What is Snowflake Caching, and why should you care?

When you submit a new query, Snowflake checks for previously executed queries; if a matching one exists and its results are still cached, Snowflake uses the cached result set instead of running the query. The Snowflake result cache is global and can be used by anyone.

42. What are the advantages of Snowflake Query Caching?

  • The results cache is enabled by default; you do not need to do anything special to use it.
  • All results are cached for 24 hours.
  • There are no space restrictions: the result cache is not limited by warehouse storage, whichever cloud (AWS/GCP/Azure) hosts the account.
  • The cache is global and accessible to all warehouses and users.
  • Because of caching, your BI dashboards load faster.
  • Because of caching, compute costs are reduced.

43. When the underlying data changes, what happens to Cache results?

Snowflake cache results are invalidated when the data in the underlying micro-partitions changes. The Snowflake documentation contains more detail, but tests have shown that the result cache is reused unless the underlying data (or the SQL query itself) has changed. Inserts, updates, and deletes that do not actually change the underlying data are ignored: as a series of additional tests has demonstrated, the result cache is still used if the data in the micro-partitions remains unchanged.

Finally, results are typically kept for 24 hours, though the clock is reset each time the query is re-executed, up to a maximum of 30 days, after which the query must be re-executed against the underlying data.
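
Result-cache reuse can also be switched off per session, e.g. for benchmarking, through the USE_CACHED_RESULT parameter; a short sketch:

```sql
-- Force queries in this session to re-execute rather than reuse cached results.
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

-- Restore the default behaviour.
ALTER SESSION SET USE_CACHED_RESULT = TRUE;
```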

44. Can you explain the difference between Snowflake and AWS (Amazon Web Service)?

Snowflake and Amazon Redshift are both cloud-based data warehouse platforms that offer excellent performance, scalability, and business intelligence tooling. Both platforms offer similar core functionality, such as relational management, security, scalability, and cost efficiency. The differences lie in pricing, user experience, and deployment options.

  • Snowflake is a complete SaaS (Software as a Service) offering, so no maintenance is required. AWS Redshift clusters, on the other hand, necessitate manual maintenance.
  • Snowflake’s security model employs always-on encryption to enforce stringent security checks, whereas Redshift uses a more flexible, customizable approach.
  • Snowflake’s storage and compute are entirely independent, which means that storage costs are comparable to those of S3. AWS, on the other hand, offers Redshift Spectrum to work around this problem, allowing you to query data directly in S3; even so, it isn’t as seamless as Snowflake.