The black line in the graph above is the amount of data read (actually the offset values showing where in the file the data is read from, which is the same thing as a running total when Power Query is reading all the data) from the Excel file; the green line is the amount of data read from the CSV file (the same data shown in the first graph above). A few things to mention:
Running Process Monitor while this second query was refreshing had a noticeable impact on its performance – in fact it was almost 20 seconds slower
The initial values of 80 million bytes seem to be where data is read from the end of the Excel file. Maybe this is Power Query reading some file metadata? Anyway, it seems as though it takes 5 seconds before it starts to read the data needed by the query.
There’s a plateau between the 10 and 20 second mark where not much is happening; this didn’t happen consistently and may have been connected to the fact that Process Monitor was running
The results were remarkable; check them out.
In this module you will learn how to use the Small Multiple Line Chart. The Small Multiple Line Chart allows you to display multiple mini line charts within a single visual that you can facet by an attribute value.
Click through for a short video as well as additional explanation and key takeaways.
Also consider the case of customer 19081. Even though it is only displayed in March, their Revenues YTD value is larger than Revenues. This is because the Revenues YTD measure considers the sum of previous months, even though Revenues may be lower than the threshold of 9,999.
Because the filter granularity is Year-Month-Customer, only the filtered combinations are also considered in the year total. This explains another unexpected result. The Revenues YTD computed in December is different from the one computed for the entire year – yet another unexpected behavior for a year-to-date calculation. At the month level, only customers with Revenues higher than 9,999 in December are considered, including all the months in their Revenues YTD calculation. However at the year level, all customers with revenue higher than 9,999 in at least one month are considered; their revenues for the entire year are summed to compute Revenues YTD regardless of the monthly filter applied to the Revenues measure.
Marco goes into detail regarding the nuances of filtering and also provides some good answers to common problems.
In the opening keynote and again in his sessions, Christian demonstrated Power BI reports on the taxi driver activity database with over a trillion rows of raw data. The larger dataset was in a Spark cluster, accessed using DirectQuery. Aggregated tables were stored in the in-memory model using the new composite model feature. As the data was explored in report visuals, the Power BI engine would seamlessly switch from tabular in-memory aggregate tables to DirectQuery source data in order to return low-level details. Composite models will allow mashing-up imported database and file-based data with an DirectQuery.
There are limits and complexities with these new features. You cannot mashup imported tables in a Power BI model based in a direct connection to SSAS, but enterprise-scale features in Power BI arguably may not steer a solution architect to select SSAS over Power BI for serious data modeling. With incremental data refresh, large model support, row-level security and many other “big kid” features, Power BI might be a preferable choice. I’m not ready to rule-out Analysis Services as the better option for most enterprise solutions – at least not in the near future, but Power BI is definitely heading in that direction.
Click through for several other features which help convince Paul that Power BI is threatening Analysis Services for enterprise data analysis solutions.
We can see that there are numerous threads, with a few taking considerable memory over others- The CefSharp.BrowserSubprocess can be a bit misleading- It’s Power BI using Chromium to render the visuals that are part of the Power BI Desktop that’s part of the current run. Chromium (CefSharp.BrowserSubprocess) subprocesses will always come in pairs, one for rendering and one for messaging.
In the Task Manager Details, we can see each of the PIDs that correspond with the processes IDs listed in the logs. By updating our viewable columns, (right click, choose “threads” and click OK) you can now view how many threads are associated with a given PID.
Read on for more.
Using layout images in Power BI has become a popular design trend. When I say layout images, I’m referring to background images with shapes around areas where visuals are placed. This is different from the new wallpaper feature that became available in the July release, which can be used to format the grey area outside your report page and extend the main color of background images.
Layout images can help with spacing and alignment within a report and can help create consistency across reports. They can also help create affordances, using consistent layout and design to make it obvious how users should interact with our reports.
I use layout images in some of my reports, but I don’t think they are necessary on every report. There are a couple of things to consider when using layout images.
Read on for an example of a good layout image versus a bad layout image as well as tips and tricks on how to create good layout images.
Chris Webb has a couple interesting posts on using Process Monitor, a venerable sysinternals tool, to troubleshoot issues with Power Query performance. First, Chris looks at how often Power Query opens and closes a file for data processing:
Troubleshooting Power Query performance issues in Power BI and Excel can be difficult because it’s a bit of a black box: there’s nothing in the UI to tell you what’s going on inside the Power Query engine and the diagnostic logs are very difficult to interpret. With relational data source like SQL Server you can use tools like SQL Server Profiler to see the queries that are being run by Power Query, and I blogged recently about using Fiddler to troubleshoot OData performance issues; but what about file-based data sources, which often present the most challenges regarding performance?
Process Monitor, a free tool from Microsoft, allows you to monitor file system activity in real-time and even having spent a limited amount of time using it I can already tell that it can provide a lot of information to help identify performance issues with file-based data sources. Take, for example, the scenario I described in my recent post on improving the performance of merge operations. In that post (which I suggest you read before you carry on) I mentioned that it looked as though the Power Query engine was reading data from one of the source files multiple times and Process Monitor confirms that this indeed the case.
This post is really just a quick follow-on from my post earlier this week on using Process Monitor to troubleshoot Power Query performance issues with file-based data sources, which I suggest you read before carrying on. I realised, after playing around with Process Monitor some more, that the ReadFile operation actually tells you how much data is being read from a file when a Power Query query is running. For example, here’s a sample of some of the ReadFile operations captured while running the unoptimised version of the query I talked about in my last post:
I get the feeling that really learning a few sysinternal tools can be extremely useful, like how learning Wireshark can make you a better DBA.
A question I often here is this: Which tool should I use, Tableau or Power BI? The truth is: They are not mutually exclusive.
Tableau is great at business mysteries: ill-defined questions where you have to surf the data for results. Power BI is particularly great at modelling and cleaning the data, with clean, crisp data visualisation and the ability to use custom and open-source data visualizations. This blog isn’t aimed at the technical user, but at the analyst who needs to get information out quickly. I will do another post, aimed at the geeks, another time.
I am all about embracing the power of And. The downside is, my corporate budget went bankrupt just thinking about the cost of both of them together…
The July 2018 preview of Power BI Desktop delivers two killer preview features that solidify the Power BI position as the best data modeling tool on the market. First, Microsoft relaxes the Power BI relationship limitations by letting you create M:M joins between two tables. Second, you can now create a composite (hybrid) data model with different storage modes, such as from an SQL Server database configured for DirectQuery and from an imported text file. Without reiterating the documentation, here are some important notes to keep in mind based on my pre-release testing.
But read the whole thing, which includes some limitations around hybrid data models.
Do I really want to go through all this data by hand? BI is a reporting tool, so what if I bring them into Power BI? Let’s start with the first MS Mashup Container log-
In Power BI, click on Get Data –> Text and change the file type to “All Files” in the explorer and go to the directory that contains the trace files:C:\Users\<user>\AppData\Local\Microsoft\Power BI Desktop\Traces\Performance
Remember that you will need to have “hidden items” set to be displayed to browse down to this folder. Choose the files you wish to load in the directory and Power BI and choose a Customer delimiter of a quotes, (“) to separate the file. This will load a file that will have a few columns you’ll need to remove that contain data like colons, nulls and other syntax from the file
I like it.