There are no prizes for guessing that the fundamental idea behind this approach is to take the best concepts from both the data warehouse and data lake models and put them together, while trying to eliminate the worst of each.

Just like a data lake, a data lakehouse is built to house both structured and unstructured data. This means that businesses that can benefit from working with unstructured data (which is pretty much any business) only need one data repository rather than requiring both warehouse and lake infrastructure. Where organizations do use both, data in the warehouse generally feeds BI analytics, while data in the lake is used for data science, which could include artificial intelligence (AI) such as machine learning, and as storage for future, as-yet-undefined use cases.

Data lakehouses enable structure and schema like those used in a data warehouse to be applied to unstructured data of the type that would typically be stored in a data lake. This means that data users can access the information more quickly and start putting it to work. And those data users might be data scientists or, increasingly, workers in any number of other roles who are seeing the benefits of augmenting themselves with analytics capabilities.

These data lakehouses might make use of intelligent metadata layers that act as a sort of "middle man" between the unstructured data and the data user in order to categorize and classify the data. By identifying and extracting features from the data, these layers can effectively give it structure, allowing it to be cataloged and indexed just as if it were nice, tidy structured data. For example, part of this metadata extraction might include using computer vision or natural language processing algorithms to understand the content of picture, text, or voice files that are dumped as raw, unlabelled data into the lakehouse.

So who is the data lakehouse architecture for? One key group of users is very likely to be organizations that are looking to take the next step in their analytics journey by graduating from BI to AI. Increasingly, businesses are looking to unstructured data to inform their data-driven operations and decision-making, simply because of the richness of the insights that can be extracted from it.

Here's a very simple example: if you count the number of customers coming into your shop each day and store that data as a simple number, those data points are only ever going to tell you one thing.
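To make the metadata layer idea a little more concrete, here is a minimal Python sketch of a catalog that sits between raw, unlabelled text objects and the people querying them. Everything in it is an assumption made for illustration: the `MetadataCatalog` and `CatalogEntry` names, the `TAG_KEYWORDS` lookup, and the object keys are all hypothetical. The keyword lookup is only a toy stand-in for the natural language processing or computer vision models a real lakehouse might use.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical keyword-to-tag map; a toy stand-in for a real NLP or vision model.
TAG_KEYWORDS = {
    "invoice": "finance",
    "refund": "finance",
    "delivery": "logistics",
    "late": "complaint",
}


@dataclass
class CatalogEntry:
    object_key: str                          # where the raw object sits in storage
    media_type: str                          # e.g. "text", "image", "audio"
    tags: set = field(default_factory=set)   # features extracted from the content
    word_count: int = 0


class MetadataCatalog:
    """Toy metadata layer: extracts features from raw text and indexes them."""

    def __init__(self):
        self.entries = {}                # object_key -> CatalogEntry
        self.by_tag = defaultdict(set)   # tag -> set of object keys

    def ingest_text(self, object_key, raw_text):
        # "Feature extraction": normalise the words and map known keywords to tags.
        words = [w.strip(".,!?").lower() for w in raw_text.split()]
        tags = {TAG_KEYWORDS[w] for w in words if w in TAG_KEYWORDS}
        entry = CatalogEntry(object_key, "text", tags, len(words))
        self.entries[object_key] = entry
        for tag in tags:
            self.by_tag[tag].add(object_key)
        return entry

    def find(self, tag):
        # Query the index as if the raw files were tidy, structured rows.
        return sorted(self.by_tag.get(tag, set()))


catalog = MetadataCatalog()
catalog.ingest_text("raw/email-001.txt", "My delivery was late, please issue a refund.")
catalog.ingest_text("raw/email-002.txt", "Attached is the invoice for March.")
print(catalog.find("finance"))
```

The raw files themselves are never modified. Only the extracted tags and counts are indexed, which is what lets unstructured objects be looked up by attribute in the same way as structured records.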