Snowflake Tasks

11/4/2022

In this article we will describe the overall data transformation landscape on Snowflake, explain the steps and the options available, and finally summarise the best practices learned from over 50 engagements with Snowflake customers.

Most often the terms ETL and ELT (Extract, Transform and Load) are used interchangeably as shorthand for data engineering. For the purposes of this article, Data Engineering is the process of transforming raw data into useful information to facilitate data-driven business decisions. The steps involved include data acquisition, ingestion of the raw data history, followed by cleaning, restructuring and enriching the data by combining additional attributes, and finally preparing it for consumption by end users.

The diagram below illustrates the overall system architecture on Snowflake used to build complex data engineering pipelines. There are a number of components, and you may not use all of them on your project, but they are based upon my experience with Snowflake customers over the past three years.

[Diagram: overall Snowflake data engineering system architecture and data providers]

The diagram above shows the main categories of data provider, which include:

Data Lakes: Some Snowflake customers already have an existing cloud-based data lake which acts as an enterprise-wide store of historical raw data used to feed both the data warehouse and machine learning initiatives. Typically, data is stored in S3, Azure or GCP cloud storage in CSV, JSON or Parquet format.

On-Premises Databases: These include both operational databases which generate data and existing on-premises data warehouses which are in the process of being migrated to Snowflake. They can (for example) include billing systems and ERP systems used to manage business operations.

Streaming Sources: Unlike on-premises databases where the data is relatively static, streaming data sources are constantly feeding in new data. These can include data from Internet of Things (IoT) devices or web logs, in addition to social media sources.

Data Sharing: Refers to the ability of Snowflake to seamlessly expose read-only access to data on other Snowflake accounts. Using either the Snowflake Data Exchange or Marketplace provides instant access to data across all major cloud platforms (Google, AWS or Microsoft) and global regions. This can be used to enrich existing transactions with additional externally sourced data without physically copying the data.

SaaS and Data Applications: These include existing Software as a Service (SaaS) systems, for example ServiceNow and Salesforce, which have Snowflake connectors, in addition to other cloud-based applications.

Data Files: Include data provided from either cloud or on-premises systems in a variety of file formats, including CSV, JSON, Parquet, Avro and ORC, which Snowflake can store and query natively.

In common with all analytics platforms, the data engineering phases include:

Data Acquisition: Involves capturing the raw data files and storing them on cloud storage, including Amazon S3, Azure Blob or GCP storage.

Ingestion & Landing: Involves loading the data into a Snowflake table, from which point it can be cleaned and transformed. It's good practice to initially load data to a transient table, which balances the need for speed, resilience and simplicity with reduced storage cost.

The following properties are applicable to a Snowflake Task object.
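As a minimal sketch of a Snowflake Task, the following creates a scheduled job; the warehouse name, schedule and SQL statement are illustrative assumptions, not taken from the article:

```sql
-- A task runs a single SQL statement on a schedule.
CREATE OR REPLACE TASK refresh_daily_sales
    WAREHOUSE = transform_wh                  -- hypothetical warehouse name
    SCHEDULE  = 'USING CRON 0 2 * * * UTC'    -- every day at 02:00 UTC
AS
    INSERT INTO daily_sales
    SELECT sale_date, SUM(amount)
    FROM raw_sales
    GROUP BY sale_date;

-- Tasks are created in a suspended state; resume to start the schedule.
ALTER TASK refresh_daily_sales RESUME;
```

Tasks can also be chained into a dependency graph with the `AFTER` clause, so a cleaning step runs only once the landing step has completed.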
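The transient-table landing pattern described above can be sketched as follows; the table, stage and column names are hypothetical placeholders:

```sql
-- Transient tables skip Fail-safe and can disable Time Travel,
-- reducing storage cost for raw data that is cheap to re-load.
CREATE TRANSIENT TABLE raw_sales_landing (
    raw_record  VARIANT,        -- semi-structured payload (e.g. JSON)
    source_file VARCHAR,
    load_time   TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
) DATA_RETENTION_TIME_IN_DAYS = 0;

-- Load raw files from an external stage over S3, Azure Blob or GCP storage.
COPY INTO raw_sales_landing (raw_record, source_file)
FROM (
    SELECT $1, METADATA$FILENAME
    FROM @sales_stage
)
FILE_FORMAT = (TYPE = 'JSON')
ON_ERROR = 'CONTINUE';
```

From this landing table the data can then be cleaned and transformed into permanent tables for consumption.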
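The zero-copy data sharing described above can be consumed with a couple of statements; the provider account, share and table names below are hypothetical:

```sql
-- On the consumer account: mount a provider's share as a read-only database.
-- No data is physically copied; queries read the provider's storage directly.
CREATE DATABASE partner_data
    FROM SHARE provider_account.sales_share;

-- Enrich local transactions with the shared reference data.
SELECT t.order_id, t.amount, r.region_name
FROM my_db.public.transactions t
JOIN partner_data.public.regions r
  ON t.region_id = r.region_id;
```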