(plural data lakes)
- A massive, easily accessible data repository built on (relatively) inexpensive computer hardware for storing "big data". Unlike data marts, which are optimized for data analysis by storing only some attributes and dropping data below the level aggregation, a data lake is designed to retain all attributes, especially so when you do not yet know what the scope of data or its use will be.
Pentaho CTO James Dixon is credited with coining the term "data lake". As he described it in his blog entry, "If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples."