AWS simplifies data lakes with new formation service
12 August 2019
AWS Lake Formation is a fully managed service for building, securing, and managing data lakes. It simplifies and automates many of the complex manual steps usually required to create a data lake, including collecting, cleaning, and cataloguing data, and securely making that data available for analytics.
Users can easily bring their data into a data lake from a variety of sources using pre-defined templates, automatically classify and prepare the data, and centrally define granular data access policies to govern access by the different groups within an organisation. This facilitates analysis of the data using whichever AWS analytics and machine learning services are required, including Amazon Redshift, Amazon Athena, and AWS Glue; Amazon EMR, Amazon QuickSight, and Amazon SageMaker will follow in the next few months, says AWS. There are no additional charges for using AWS Lake Formation, says the maker, and users pay only for the underlying AWS services used.
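As a rough illustration of what a granular, centrally defined access policy might look like in practice, the Python sketch below builds the arguments for a column-level grant in the shape accepted by the boto3 Lake Formation client's `grant_permissions` call. The helper `build_table_grant`, the role ARN, and the database, table, and column names are all hypothetical, and the call to AWS is left commented out because it needs boto3 and valid credentials.

```python
from typing import Any, Dict, List, Optional


def build_table_grant(
    principal_arn: str,
    database: str,
    table: str,
    columns: Optional[List[str]] = None,
    permissions: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for a Lake Formation table or column grant.

    Hypothetical helper for illustration; it only assembles a dictionary.
    """
    if columns:
        # Column-level grant: access is limited to the listed columns.
        resource = {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        }
    else:
        # Table-level grant: access covers the whole table.
        resource = {"Table": {"DatabaseName": database, "Name": table}}

    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": resource,
        # Default to read-only access if no permissions are given.
        "Permissions": list(permissions or ["SELECT"]),
    }


# Hypothetical analyst role allowed to read three columns of one table.
grant = build_table_grant(
    "arn:aws:iam::123456789012:role/AnalystRole",
    database="sales_db",
    table="orders",
    columns=["order_id", "order_date", "total"],
)

# To apply the grant (requires boto3 and AWS credentials):
# import boto3
# boto3.client("lakeformation").grant_permissions(**grant)
```

Keeping the policy as plain data means the same definition could be reviewed, versioned, and applied for each group in an organisation, which is one way to read the "central" policy model AWS describes.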
AWS acknowledges that while Amazon Simple Storage Service (Amazon S3) is a popular service on which to build a data lake, the process can be unwieldy and time-consuming.
The Lake Formation service, says AWS, significantly simplifies the process and removes the ‘heavy lifting’ from the set-up process. It automates manual, time-consuming steps, such as provisioning and configuring storage, crawling the data to extract schema and metadata tags, automatically optimising the partitioning of the data, and transforming the data into formats like Apache Parquet and ORC that are ideal for analytics.
AWS Lake Formation cleans and deduplicates data using machine learning to improve data consistency and quality. To simplify data access and security, AWS Lake Formation provides a single, centralised place to set up and manage data access policies, governance, and auditing across Amazon S3 and multiple analytics engines. To reduce the time analysts and data scientists spend hunting down the right data set for their needs, AWS Lake Formation provides a central, searchable catalogue which describes the available data sets and their appropriate business use. Users can now easily access data from a single place, says AWS, setting up and using a data lake in days instead of months.
“Our customers tell us that Amazon S3 is the ideal place to house their data lakes, which is why AWS hosts more data lakes than anyone else – with tens of thousands and growing every day. They’ve also told us that they want it to be easier and faster to set up and manage their data lakes,” said Raju Gulabani, vice president, Databases, Analytics, and Machine Learning, AWS. “That’s why we built AWS Lake Formation, so customers can spend more time learning from their data and innovating, rather than wrestling that data into functioning data lakes. AWS Lake Formation is available today and we’re excited to see how customers use it as one of the building blocks for growing and transforming their businesses and customer experiences.”
AWS Lake Formation is available now in US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Tokyo), and Europe (Ireland), with additional regions coming soon.