Press "Enter" to skip to content

Using Azure Purview to Catalog S3 Datasets

Gauri Mahajan crosses the clouds:

Data exists in various types of formats and hosted in equally varied type repositories depending on the format of data. With the advent of the cloud, generally, every public cloud provider like Amazon AWS, Microsoft Azure, and Google Cloud Platform provides a variety of data hosting avenues. A large-scale enterprise is very likely to have a multi-cloud footprint, where data is hosted on more than one cloud. As the data footprint on the cloud becomes larger, the need for a data catalog becomes increasingly important. Each cloud provider provides its format of the cloud-native data catalog. But even after employing a cloud provider-specific catalog, an enterprise may struggle to get a 360-degree view of data on the cloud. The reason being each cloud provider providers the ability to catalog data hosted on its own cloud only, which compels the end-users to reconcile or reference data from multiple catalogs, as the data catalog on the cloud may not integrate with cross-cloud data hosting services. As the client needs evolve, so do the cloud services. Azure Purview is the brand-new data catalog and governance-related service on the Azure cloud. It has introduced features to support cataloging data hosted on AWS Simple Storage Service (S3), which is typically considered the storage layer of data lake on AWS.

Read on to see how.