This post assumes that, for data sovereignty, fiduciary or regulatory reasons in general, the:
– analytics platform will be underpinned by something that is agnostic to both cloud and on-premises infrastructure (in other words, Kubernetes)
– focal points of the Data Lake processing element will be Python and open source tools
– SQL Server 2022 S3 object virtualisation is the preferred technology for querying the Data Lake via a T-SQL surface area
– S3 is the preferred technology for storing the data in our Data Lake.
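To make the last two assumptions a little more concrete, here is a minimal Python sketch that renders the kind of T-SQL involved in pointing SQL Server 2022 at an S3-compatible bucket and querying a Parquet file in it. It only builds strings and does not connect to anything. The endpoint, data source, credential and object path are hypothetical placeholders, and the sketch assumes PolyBase is enabled and that a database scoped credential holding the S3 access key already exists.

```python
def external_data_source_sql(name: str, endpoint: str, credential: str) -> str:
    """Render T-SQL that registers an S3-compatible endpoint as an external data source."""
    return "\n".join([
        f"CREATE EXTERNAL DATA SOURCE {name}",
        "WITH (",
        f"    LOCATION = 's3://{endpoint}/',",
        f"    CREDENTIAL = {credential}",
        ");",
    ])


def parquet_query_sql(data_source: str, object_path: str) -> str:
    """Render an ad hoc T-SQL query over a Parquet object via OPENROWSET."""
    return "\n".join([
        "SELECT TOP 10 *",
        "FROM OPENROWSET(",
        f"    BULK '{object_path}',",
        "    FORMAT = 'PARQUET',",
        f"    DATA_SOURCE = '{data_source}'",
        ") AS lake_rows;",
    ])


if __name__ == "__main__":
    # All names below are illustrative assumptions, not values from this post.
    print(external_data_source_sql("lake_ds", "minio.example.internal:9000", "lake_cred"))
    print()
    print(parquet_query_sql("lake_ds", "/sales/2024/orders.parquet"))
```

Generating the statements from Python keeps the sketch in the same language as the rest of the processing layer; in practice the rendered T-SQL would be run against SQL Server with whatever client you prefer.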
Read on for the high-level solution and stay tuned for more detailed answers.