A large storage repository that holds data in its original format until it is parsed and analyzed. The term is often associated with Hadoop, which was designed to hold huge amounts of data. See Hadoop.
A massive, easily accessible data repository built on (relatively) inexpensive computer hardware for storing "big data". Data marts are optimized for analysis: they store only some attributes and drop data below the level of aggregation. A data lake, by contrast, is designed to retain all attributes, which is especially valuable when the scope of the data or its eventual use is not yet known.
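The contrast between the two approaches can be illustrated with a toy Python sketch. The sales records and the function names `build_mart` and `query_lake` are hypothetical, and in-memory lists stand in for real storage. This shows only the idea of keeping versus discarding detail, not how any actual data lake or data mart is implemented.

```python
from collections import defaultdict

# Hypothetical raw sales records, as they might land in a data lake:
# every attribute is kept, including ones nobody has asked about yet.
raw_records = [
    {"region": "North", "product": "tea", "channel": "web", "amount": 12.0},
    {"region": "North", "product": "coffee", "channel": "store", "amount": 8.0},
    {"region": "South", "product": "tea", "channel": "store", "amount": 5.0},
    {"region": "South", "product": "coffee", "channel": "web", "amount": 7.0},
]


def build_mart(records):
    """Data-mart style (hypothetical): keep only 'region' and the summed
    'amount' per region. 'product', 'channel', and the individual rows
    are discarded once the aggregate is built."""
    totals = defaultdict(float)
    for r in records:
        totals[r["region"]] += r["amount"]
    return dict(totals)


def query_lake(records, attribute):
    """Lake style (hypothetical): the raw records are retained, so they can
    be aggregated by any attribute when a new question comes up."""
    totals = defaultdict(float)
    for r in records:
        totals[r[attribute]] += r["amount"]
    return dict(totals)


mart = build_mart(raw_records)
print(mart)                                # {'North': 20.0, 'South': 12.0}
print(query_lake(raw_records, "channel"))  # {'web': 19.0, 'store': 13.0}
```

The mart answers the per-region question it was built for, but a question that arrives later, such as sales by channel, cannot be answered from `mart` because that attribute was dropped. The retained raw records can still answer it.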
Origin of "data lake"
Pentaho CTO James Dixon is credited with coining the term "data lake". As he described it in his blog entry, "If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples."