The article that I wrote earlier this week about the shared dimension generated a lot of interest, and I’m glad it helped many of you. So I thought it would be good to write more about the basics of modeling. In this article, I will be focusing on a scenario that you have all faced but have taken different approaches to. Is it good to have too many dimension tables? Can you combine some of those tables together to build one flattened dimension table? How much should you flatten it? Should you end up with one huge table including everything? In this article, I’m answering all of these questions and explaining the scenarios of combining dimensions. As usual, I explain the model in Power BI, but the concepts are applicable to any other tools. If you would like to learn more about Power BI, read Power BI from Rookie to Rock Star.
Given how closely the ideal Power BI data model matches the Kimball model, Reza’s advice makes perfect sense.
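To give a rough sense of what "flattening" a dimension looks like in practice, here is a minimal sketch using pandas; the table and column names are made up for illustration and are not from Reza's article.

```python
# A minimal sketch of flattening a snowflaked dimension, assuming pandas is
# available. Table and column names are illustrative only.
import pandas as pd

# Two separate dimension tables: product and its parent category.
dim_product = pd.DataFrame({
    "ProductKey": [1, 2, 3],
    "ProductName": ["Road Bike", "Helmet", "Water Bottle"],
    "CategoryKey": [10, 20, 20],
})
dim_category = pd.DataFrame({
    "CategoryKey": [10, 20],
    "CategoryName": ["Bikes", "Accessories"],
})

# Flatten: fold the category attributes into the product dimension so the
# model needs only one dimension table related to the fact table.
dim_product_flat = (
    dim_product.merge(dim_category, on="CategoryKey", how="left")
               .drop(columns=["CategoryKey"])
)
print(dim_product_flat)
```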
For security purposes, Databricks Apache Spark clusters are deployed in an isolated VPC dedicated to Databricks within the customer’s account. In order to run their data workloads, customers need secure connectivity between the Databricks Spark clusters and the above data sources.
It is straightforward for Databricks clusters located within the Databricks VPC to access data from AWS S3, which is not a VPC-specific service. However, we need a different solution to access data from sources deployed in other VPCs, such as AWS Redshift, RDS databases, and streaming data from Kinesis or Kafka. This blog will walk you through some of the options available for accessing data from these sources securely, along with their cost considerations for deployments on AWS. In order to establish a secure connection to these data sources, we will have to configure the Databricks VPC with one of the following two options:
Read on for those two options.
The service (application) exposes JMX metrics on some port, which are captured by the Jolokia Java agent. Jolokia then exposes those metrics on a port that is easily accessible through a REST endpoint (we call it the Jolokia URL). JMX2Graphite then polls the metrics from the Jolokia URL and pushes them to Graphite. Finally, Grafana reads the Graphite metrics and creates a beautiful dashboard for us, along with the alerts.
So that is how the proposed monitoring solution works. Now let’s discuss its components.
There’s a bit of code/configuration in here as well, so check it out.
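To make the Jolokia piece of the pipeline concrete, here is a minimal sketch of polling a JMX metric through Jolokia's REST API. It assumes the Jolokia agent is listening on its default port 8778 on localhost and that the `requests` package is installed; the MBean shown is the standard JVM heap metric.

```python
# A minimal sketch of reading a JMX metric through Jolokia's REST endpoint,
# assuming the agent listens on localhost:8778 (the default) and that the
# `requests` package is installed.
import requests

JOLOKIA_URL = "http://localhost:8778/jolokia"

def read_heap_usage():
    # Jolokia's /read operation returns the requested MBean attribute as JSON.
    resp = requests.get(f"{JOLOKIA_URL}/read/java.lang:type=Memory/HeapMemoryUsage")
    resp.raise_for_status()
    value = resp.json()["value"]
    return value["used"], value["max"]

if __name__ == "__main__":
    used, maximum = read_heap_usage()
    print(f"Heap used: {used} / {maximum} bytes")
```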
Simply put, microservices are a software development method where applications are structured as loosely coupled services. The services themselves are minimal atomic units which, together, comprise the functionality of the entire app. Whereas in an SOA a single component service may combine one or several functions, a microservice within an MSA does one thing, and only one thing, and does it well.
Microservices can be thought of as minimal units of functionality which can be deployed independently, are reusable, and communicate with each other via various network protocols like HTTP (more on that in a moment).
Read the whole thing. I have a love-hate relationship with these but it’s a pattern worth understanding.
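As a small illustration of the "does one thing over HTTP" idea, here is a minimal sketch of a single-purpose service using only the Python standard library; the endpoint, port, and payload are all illustrative, not from the linked article.

```python
# A minimal sketch of a single-purpose "microservice" that does one thing over
# HTTP, using only the Python standard library. Route and payload are
# illustrative.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class PriceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The service exposes exactly one capability: look up a price by SKU.
        if self.path.startswith("/price/"):
            sku = self.path[len("/price/"):]
            body = json.dumps({"sku": sku, "price": 9.99}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404, "Unknown route")

if __name__ == "__main__":
    # Other services would call this over HTTP, e.g. GET /price/ABC-123.
    HTTPServer(("0.0.0.0", 8080), PriceHandler).serve_forever()
```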
First and most importantly, I updated the Power BI logo in the diagram to the latest version of the logo!
Secondly, I included Power BI Dataflows in the diagram tagged #6. Power BI Dataflows are used to ingest, transform, integrate, and enrich big data by defining data source connections, ETL logic, refresh schedules, and more. Data is stored as entities in the Common Data Model in Azure Data Lake Storage Gen2. Dataflow entities can be consumed as a data source in Power BI and by using Power BI Desktop. Read more about Dataflows here.
Click through for a full changelog and a link to download the architecture diagram and legend.
There is an impedance mismatch between model development using Python and its tool stack, and a scalable, reliable data platform with the low-latency, high-throughput, zero-data-loss, and 24/7-availability requirements needed for data ingestion, preprocessing, model deployment, and monitoring at scale. Python, in practice, is not the best-known technology for these requirements. However, it is a great client for a data platform like Apache Kafka.
Click through for the full argument as well as where Kafka can help mitigate some of the issues.
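As a sketch of the "Python as Kafka client" point, here is roughly what consuming events for model scoring might look like with the kafka-python package; the broker address, topic name, and `predict` function are assumptions for illustration.

```python
# A minimal sketch of Python acting as a Kafka client for model scoring,
# assuming the kafka-python package is installed and a broker is reachable at
# localhost:9092. Topic name and the predict() stand-in are illustrative.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "input-events",                      # hypothetical topic of raw events
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

def predict(features):
    # Stand-in for a real trained model; replace with actual scoring logic.
    return sum(features.values())

for message in consumer:
    score = predict(message.value)
    print(f"offset={message.offset} score={score}")
```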
Database cloning is a key aspect of the SSRS scale-out architecture, with database clones providing each container a complete set of databases. Two or more VMs operated behind a load balancer deliver a highly available and scalable reporting service. This article focuses on Windows SQL Server containers and Windows Virtual Hard Drive (VHD) based cloning, but the same architecture can support SQL Server Linux containers or conventional instances (Windows or Linux). Redgate SQL Clone, for example, would support SQL Server instances. Other options include the use of storage arrays instead of Windows VHD-based clones. The trade-offs between SQL containers and instances, and between VHDs and storage arrays, are covered in separate sections below.
The combination of SSRS containers with database cloning is appealing for simplicity and operational savings. SSRS containers are also drawing interest as part of public cloud strategies, as SSRS containers can be integrated with AWS RDS or SQL Azure databases to provide a horizontally scalable reporting solution.
This is a bit more complex than Reporting Services scale-out with Enterprise Edition, but if you’re on Standard Edition and can’t use scale-out, it’s an interesting alternative.
At the beginning of last year, we started to develop a chat bot demo. The idea was to integrate the chat bot into one of the big applications as a replacement for the FAQ. Users could ask the bot questions, thus avoiding obvious support tickets in the future.
Things went well. The demo was well received and we started moving to production. About halfway through, things started turning south. The demo chat bot application used Bot SDK V3. It had voice recognition enabled, which allowed users to talk to it and get the response back in voice. During the demo, we used the Azure Bing Speech API. But later, before production, we got notice that the service was obsolete and would be retired in mid-2019. Another surprise was the introduction of Bot SDK V4, which is entirely different from Bot SDK V3: something like AngularJS versus Angular.
The major services tend to give you some time to switch over—in this case, they had 10 months to make a move. But when dealing with online services versus locally installed products, there’s always a risk that the service you’re calling won’t be there, and depending upon how critical that service is, it can have a major effect on your ability to function if it disappears one day. That’s definitely not a reason to ignore these services; it’s a reason to have a backup plan in place.
These days I still design new databases from scratch with pen and paper (or iPad and Apple Pencil), where the entity relationship diagram (ERD) is rudimentary and crows’ feet relationships are badly-scrawled. But it got me wondering which database modelling tools are on the market today (commercial and free).
My ideal tool should be able to design a new database from scratch and generate creation scripts in T-SQL without failing on common issues like referential integrity and dependencies. More importantly though, it should be able to reverse-engineer a database (like Microsoft Visio used to be able to). This is extremely useful for consulting engagements when I need to get a picture in my head of the database I’m looking at. This is the one place where I’ve used the Database Designer in SSMS more than I had initially remembered.
Randolph also mentions SQL Database Modeler, which I used on a consulting engagement where I wanted to replicate Visio’s database reverse-engineering functionality.
Let’s start with Azure Active Directory (AAD). In order to provision the resources in the diagram, your Azure subscription must already be associated with an Active Directory. AAD is Microsoft’s cloud-based identity and access management service. Members of an organization have a user account that can sign in to various services. AAD is used to access Office 365, Power BI, and Dynamics 365, as well as the Azure portal. It can also be used to grant access and permissions to specific Azure resources.
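As one concrete illustration of AAD granting access to Azure resources, here is a minimal sketch using the MSAL library to acquire a token for the Azure Resource Manager API via the client-credentials flow; the tenant ID, client ID, and secret are placeholders, and this is an assumption-laden example rather than anything from the linked diagram.

```python
# A minimal sketch of acquiring an Azure AD token for the Azure Resource
# Manager API via the client-credentials flow, assuming the msal package is
# installed. Tenant ID, client ID, and secret are placeholders.
import msal

TENANT_ID = "<your-tenant-id>"
CLIENT_ID = "<your-app-registration-client-id>"
CLIENT_SECRET = "<your-client-secret>"

app = msal.ConfidentialClientApplication(
    CLIENT_ID,
    authority=f"https://login.microsoftonline.com/{TENANT_ID}",
    client_credential=CLIENT_SECRET,
)

# Request a token scoped to Azure Resource Manager; the token can then be sent
# as a Bearer token to manage resources the app has been granted access to.
result = app.acquire_token_for_client(scopes=["https://management.azure.com/.default"])

if "access_token" in result:
    print("Token acquired; expires in", result["expires_in"], "seconds")
else:
    print("Failed:", result.get("error_description"))
```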
Meagan has several of these, so check it out.