A large storage repository that holds data in its original format until it is parsed and analyzed. The term is often associated with Hadoop, which was designed to hold huge amounts of data; for example, a data lake may hold all the data in an organization.
Lake and Warehouse
A data lake contains both structured and unstructured data. In contrast, a "data warehouse" contains structured data that has been examined and cleansed (for example, deduped) and is available for analytics. Storing data in a lake is faster than in a warehouse because the data is written as is, without being transformed first.
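The difference can be illustrated with a minimal Python sketch. The records, field names and the two helper functions (`store_in_lake`, `load_into_warehouse`) are hypothetical and exist only for illustration; a real lake or warehouse would use a storage system and a database rather than in-memory lists.

```python
import json

# Hypothetical raw events as they might arrive from different sources.
RAW_EVENTS = [
    '{"id": 1, "email": "Ann@Example.com", "amount": "19.99"}',
    '{"id": 1, "email": "Ann@Example.com", "amount": "19.99"}',  # exact duplicate
    '{"id": 2, "email": "bob@example.com", "amount": "5"}',
    'free-text note: customer called about invoice',              # unstructured
]


def store_in_lake(lake, raw):
    """Lake-style ingest: append the record untouched, with no parsing or checks."""
    lake.append(raw)


def load_into_warehouse(lake):
    """Warehouse-style load: parse, cleanse, dedupe and type each record."""
    rows = {}
    for raw in lake:
        try:
            rec = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unstructured data does not fit the warehouse schema
        # Keying by id removes duplicates; fields are normalized and typed.
        rows[rec["id"]] = {
            "id": int(rec["id"]),
            "email": rec["email"].lower(),
            "amount": float(rec["amount"]),
        }
    return sorted(rows.values(), key=lambda r: r["id"])


lake = []
for event in RAW_EVENTS:
    store_in_lake(lake, event)

warehouse = load_into_warehouse(lake)
print(len(lake))       # every raw record is kept
print(len(warehouse))  # only cleansed, deduplicated rows remain
```

In this sketch the lake keeps all four records, including the duplicate and the free-text note, while the warehouse ends up with two typed rows. The extra work in `load_into_warehouse` is what makes warehouse loading slower than lake storage.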
Lakehouse = Lake + Warehouse
A data "lakehouse" combines the unstructured data of a data lake and the structured data of a data warehouse with the warehouse's analytical tools. For example, the high-speed SQL queries used in warehouses are also available in lakehouses. The lakehouse term was coined as cloud providers began to include warehouse functions in data lakes. See Hadoop, data warehouse and deduplication.
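As a rough illustration of the lakehouse idea, the following Python sketch runs a SQL query over records read from raw lake files. The file names, fields and the `query_lake` helper are hypothetical, and the in-memory SQLite database is only a stand-in for a lakehouse query engine, which would typically query the stored files in place rather than copy them into a table.

```python
import json
import sqlite3

# Hypothetical raw lake contents: JSON-lines files mixed with free text.
LAKE_FILES = {
    "sales/2024-01.jsonl": '{"region": "EU", "amount": 120.0}\n'
                           '{"region": "US", "amount": 80.0}\n',
    "sales/2024-02.jsonl": '{"region": "EU", "amount": 30.0}\n',
    "notes/call.txt": "customer asked about refund\n",
}


def query_lake(sql):
    """Apply a schema while reading the raw files, then run SQL on the result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    for path, text in LAKE_FILES.items():
        if not path.endswith(".jsonl"):
            continue  # unstructured files stay in the lake, outside this table
        for line in text.splitlines():
            rec = json.loads(line)
            conn.execute(
                "INSERT INTO sales VALUES (?, ?)",
                (rec["region"], rec["amount"]),
            )
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


totals = query_lake(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
)
print(totals)
```

The raw files are never modified. The structure is imposed only when the query runs, and the free-text note remains in the lake alongside the structured sales data.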