May 2022: This post was reviewed and updated to include additional resources for the predictive analytics section.

Onboarding new data or building new analytics pipelines in traditional analytics architectures typically requires extensive coordination across business, data engineering, and data science and analytics teams to first negotiate requirements, schema, infrastructure capacity needs, and workload management. For a large number of use cases today, however, business users, data scientists, and analysts are demanding easy, frictionless, self-service options to build end-to-end data pipelines, because it's hard and inefficient to predefine constantly changing schemas and spend time negotiating capacity slots on shared infrastructure. The exploratory nature of machine learning (ML) and many analytics tasks means you need to rapidly ingest new datasets and clean, normalize, and feature engineer them without worrying about the operational overhead of the infrastructure that runs data pipelines.

A serverless data lake architecture enables agile and self-service data onboarding and analytics for all data consumer roles across a company. By using AWS serverless technologies as building blocks, you can rapidly and interactively build data lakes and data processing pipelines to ingest, store, transform, and analyze petabytes of structured and unstructured data from batch and streaming sources, all without needing to manage any storage or compute infrastructure.

In this post, we first discuss a layered, component-oriented logical architecture of modern analytics platforms and then present a reference architecture for building a serverless data platform that includes a data lake, data processing pipelines, and a consumption layer that enables several ways to analyze the data in the data lake without moving it, including business intelligence (BI) dashboarding, exploratory interactive SQL, big data processing, predictive analytics, and ML.

Logical architecture of modern data lake centric analytics platforms

You can envision a data lake centric analytics architecture as a stack of six logical layers, where each layer is composed of multiple components. A layered, component-oriented architecture promotes separation of concerns, decoupling of tasks, and flexibility. These in turn provide the agility needed to quickly integrate new data sources, support new analytics methods, and add tools required to keep up with the accelerating pace of changes in the analytics landscape. The following diagram illustrates the architecture of a data lake centric analytics platform.

In the following sections, we look at the key responsibilities, capabilities, and integrations of each logical layer. The ingestion layer is responsible for bringing data into the data lake. It provides the ability to connect to internal and external data sources over a variety of protocols. It can ingest batch and streaming data into the storage layer. The ingestion layer is also responsible for delivering ingested data to a diverse set of targets in the data storage layer (including the object store, databases, and warehouses).
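As a minimal sketch of how the ingestion layer might push streaming records toward the data lake's object store, the snippet below sends a record to an Amazon Kinesis Data Firehose delivery stream via boto3. The stream name (`example-clickstream`), the record shape, and the helper function names are illustrative assumptions, not part of the original post; the Firehose-to-S3 delivery configuration is assumed to exist outside this code.

```python
import json


def encode_record(event: dict) -> bytes:
    # Firehose records are opaque bytes; newline-delimited JSON is a
    # common convention so downstream tools can parse objects in S3.
    return (json.dumps(event, separators=(",", ":")) + "\n").encode("utf-8")


def deliver(firehose_client, stream_name: str, event: dict) -> str:
    # Hypothetical helper: push one event into a Firehose delivery stream
    # that is configured (outside this sketch) to land objects in S3.
    response = firehose_client.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": encode_record(event)},
    )
    return response["RecordId"]


if __name__ == "__main__":
    import boto3  # requires AWS credentials; the stream name is an assumption

    client = boto3.client("firehose")
    deliver(client, "example-clickstream", {"user": "u1", "action": "view"})
```

Because Firehose buffers and batches records before writing them to the object store, the producer never touches storage or compute infrastructure directly, which is the serverless property the post emphasizes.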