Let’s start with what services may require you to use a data gateway.
You will need a data gateway when you are using Power BI, Azure Analysis Services, PowerApps, Microsoft Flow, Azure Logic Apps, Azure Data Factory, or Azure ML with a data source/destination that is in a private network that isn’t connected to your Azure subscription with a VPN gateway. Note that a private network includes on-premises data sources and Azure Virtual Machines as well as Azure SQL Databases and Azure SQL Data Warehouses that require use of VNet service endpoints rather than public endpoints.
There are a few of them so check out Meagan’s post and take notes.
Let’s start with Azure Active Directory (AAD). In order to provision the resources in the diagram, your Azure subscription must already be associated with an Active Directory. AAD is Microsoft’s cloud-based identity and access management service. Members of an organization have a user account that can sign in to various services. AAD is used to access Office 365, Power BI, and Dynamics 365, as well as the Azure portal. It can also be used to grant access and permissions to specific Azure resources.
Meagan has several of these, so check it out.
When creating a new PowerApp using the Power BI integration, you get an additional data source – PowerBIIntegration that serves as the connection to the Power BI report. Whenever a filtering action occurs in the Power BI report, this information is available in this property.
During the PowerApps creation action I selected the action to add a new form which in the next step needs to get a connection to the Article table (which holds the additional article details).
Check out the entire series too.
It’s another terminology post! Earlier this week I was having a delightful lunch with Angela Henry, Kevin Feasel, Javier Guillen, and Jason Thomas. We were chatting about various new things. Partway thru our conversation Jason stops me because he thought I was talking about Power BI Dataflows when I was really talking about Azure Data Factory Data Flows. It was kind of a funny moment actually but it did illustrate that we have some overlapping terminology coming into our world.
So, with that inspiration, let’s have a chat about some of the new data flow capabilities in the Microsoft world, shall we?
Melissa clarifies the term “data flow” (or “dataflow” as the case may be) across several products in Microsoft’s BI stack. Worth the read.
Notably, the three Magic Quadrant Leaders except Tableau were ranked near the middle in all use cases. MicroStrategy, Birst, Sisense, TIBCO, YellowFin, Salesforce, SAS and a few other players excelled above the rest with high scores on this report. These results are a bit refreshing to see. Gartner Critical Capabilities scores seem to better align with Forrester’s rankings of Analytics and Business Intelligence Platforms and also my own understanding of several top offerings. I admit that I was surprised by these results. I was rarely – if ever – asked about several of the top scoring vendors over the past three years.
Read the whole thing, and then read the report.
For the first time in 3 years, Gartner dropped a significant amount of vendors off its quadrant. There were 24 vendors in the firm’s quadrant in 2016 and 2017. This year, the Magic Quadrant only lists 20 vendors…that’s a 16% quadrant reduction. Has the market shrunk?!
Not exactly: the market has evolved….and in a pretty predictable way actually. Take a look at our 3-year-movement analysis table below: we see a pretty consistent story, e.g. the big are getting bigger, some of the visionaries got absorbed (or disappeared) and few ‘trend-setters’ graduated up.
Read on for more. The leader quadrant pretty much fits my expectations in terms of the major vendors and their rank ordering.
In previous articles, we mentioned that most reporting performance issues need to be addressed during the data preparation stage, but many scenarios can’t be handled within the data source. For example, parallel data retrieval should be performed outside of the data source because its purpose is to increase I/O performance. To achieve the controllable buffer, the buffer information needs to be written to an external storage device, which can’t be handled within a data source. The asynchronous data buffering and loading data by random page number in building a list report can’t be handled by a data source. Even for an associative query over multiple datasets that a data source can deal with, it would be necessary to get it done outside the data source when multiple databases or a non-database source is involved and when the database load needs to be reduced. Obviously, these scenarios that are not able to be handled within a data source also can’t be handled by a reporting tool.
I would be concerned about implementation details overwhelming the general value of a data computing layer.
Currently, I think there are two main approaches to Data Warehouse Automation
- Data Warehouse Generation: You provide sources, mappings, datatype mappings etc.. The tool generates code (or artifacts).
- Data Warehouse Automation (DWA): The tool not only generates code / artifacts, but also manages the existing Data Warehouse, by offering continuous insight in data flows, actual lineage, row numbers, etc..
The difference might seem small, but IMHO is visible most clearly whenever changes occur in the Data Warehouse – the second class of tools can handle those changes (while preserving history). With the first class of tools provide you with the new structures, but you need to handle the preservation of history yourself (as you would’ve without DWA).
Read on for a contrast of these two approaches.
In order to set up and run an effective Big Data Hadoop project that provides reliable BI, your organization will need to adopt a new mindset that addresses not only the technology, but also the organizational EIM. You will need to conduct a comprehensive analysis of your business with the help of analysts, internal domain experts, and strategists to come up with robust and relevant business use cases. You will also need buy-in from management, and take company politics into consideration.
Your Big Data project needs to work with your existing BI tools, along with your security and monitoring systems. Data security needs to be addressed because standard Hadoop implementations have relatively poor security, and many organizations are wary of keeping all their data in one location.
I do agree with these reasons, though I’m a bit surprised that I didn’t see much about “classic” BI problems like the inability of the company to standardize on terminology or definitions (e.g., what the Kimball method describes as conformed dimensions), the desire to tackle too much of the problem at once, rapidly-changing source systems (and how BI team members tend to be the last to know that something has changed), etc.
The earlier you address data quality the better; the less time your end users spend on data wrangling, and the more they can focus on high value analytics. As your organization’s data infrastructure matures, migrating from spreadsheets to databases and data warehouses, data quality checks should be formally defined, documented, and automated. Exceptions should either be handled automatically during data intake using predefined business rules logic or require immediate user intervention to correct any errors.
Providing clean, centralized, and analytics-ready data to end users should not be a one-way process. By allowing end users to focus on high-value analytics, like data mining, network graphs, clustering, etc., they can uncover certain outliers and anomalies in the data. Effective data management should include a feedback loop to communicate these findings and, if necessary, incorporate any changes in the ETL processes, making centralized data management more dynamic and flexible.
The big question to ask is, “what problem are we trying to solve?” That will help determine the answer to many of the questions, including how you store the data, how you expose the data, and even which data you collect and keep.