Press "Enter" to skip to content

April 2017

Power BI Quick Measures

Paul Turley has a post on the new “Quick measures” functionality in Power BI:

I had added the new Quick Measures feature to Power BI Desktop in the Options/Preview page.  This, apparently, disables Quick Calcs and enables Quick Measures.  Although it flustered me for a minute in front of an audience, I found this to be welcome news.  So, what is Quick Measures?  It’s a DAX calculation generator that automatically writes useful measures.  Here’s how it works…  Start by right-clicking or clicking on the ellipsis for a numeric column in the Field list and choose Quick measure…

The Quick measures dialog prompts for the necessary fields, which might be different for each calculation.  The tool generates appropriately formatted DAX calculations.  It even includes conditions to raise errors if used in the wrong context.  There are currently 19 different calculation variations that the tool will generate.  Following are two examples.  Creating a Quick measure from my [Flights] measure and choosing the Airline field for categorization produces this calculation…

This looks interesting.  Read the whole thing.


Moving Files In Azure Data Factory

Meagan Longoria has a workaround for the fact that you cannot move a file using Azure Data Factory:

But at this time ADF doesn’t support that. You can copy a file with a copy activity, but you cannot actually move (i.e., copy and delete).

Luckily, we had a workaround for our situation. If you tell ADF to copy data to a file that already exists in the specified location in the data lake, it will overwrite the existing file. We made sure the file name is always the same for each table in the staging area so there is always only one file per table.

Read on for the full details on this workaround.  Also, vote on this feedback item if you want the ability to move files instead of just copying them.
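If you truly need the source file gone after the copy, one option is to script the delete half of the move outside of ADF.  Here is a minimal sketch, assuming the azure-datalake-store Python package and a service principal; the store name, path, and credentials are all placeholders, not anything from Meagan's post:

```python
# Minimal sketch, not production code: finish a "move" by deleting the staged
# source file once ADF's copy activity has succeeded. Assumes the
# azure-datalake-store package and a service principal; the store name, path,
# and credentials below are placeholders.
from azure.datalake.store import core, lib

token = lib.auth(tenant_id="<tenant-id>",
                 client_id="<application-id>",
                 client_secret="<application-secret>")
adl = core.AzureDLFileSystem(token, store_name="<adls-account-name>")

staged_file = "/staging/SalesOrders/SalesOrders.csv"  # one fixed name per table
if adl.exists(staged_file):
    adl.rm(staged_file)  # the "delete" half of copy-and-delete
```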


Transactional Replication Procedures

Drew Furgiuele offers up warnings when thinking about rolling your own transactional replication stored procedures:

In the above picture, we can see that it did replicate the execute statement, and that it affected 19,972 rows on the replica, and it only took 67ms! Sounds awesome, doesn’t it? Here’s a way to handle large batch updates at your publishers without overwhelming your replication setup. But before you go changing everything, you should probably understand that this has some really, really bad side effects if you’re not careful. Let’s look at three really big ones.

All in all, it’s a fairly risky move but might be worth the performance improvements.


Sharing Power BI Data

Steve Hughes is starting a series on Power BI security:

Another way to compartmentalize or secure data is using Workspaces within Power BI. Every user, including free users, has access to My Workspace, which is the default location for deploying Power BI and other BI assets. However, you also have the option to create additional workspaces as deployment targets. These Group Workspaces usually have functional and security separation associated with them.

This post is a good overview of methods available for data sharing.


Sundry Thoughts On Change

Here are a few takes on the most recent T-SQL Tuesday.

Dave Mason is feeling overwhelmed:

The 2-year release cycle has been tough for some of us. Other outside forces have compounded the burden. DBAs have had to learn about virtualization and cloud computing. We’ve had to dip our toes in the No-SQL pool, and embrace automation like never before. Soon, if not already, we’ll be working with containers and supporting SQL Server on Linux. Yeah, it’s trite to talk about how “change is a constant”. (Is there anyone unaware of this?) But most seem to agree that the traditional role of the DBA is undergoing a drastic transformation. Others predict it will be completely unrecognizable, if not extinct, in a few years. What’s a DBA to do? Double down on SQL Server and stay the course? Or branch out to a different field like analytics, BI, or data science?

Riley Major says to use your noggin:

This makes sense. In business, you don’t want to be viewed as a cost center. You want to be on the revenue side of the equation. Whether IT is a competitive advantage or just plumbing depends on how it’s being used. If you’re just keeping the lights on, then you may be as critically important as the electricity itself, but you’re a commodity which can be replaced with a cheaper option. On the other hand, if you are providing insight which directs the company to profits, or if you are developing features which grow market share, your value is obvious.

So if you’re on the administration side of IT, you’re naturally more vulnerable in the eyes of the company. You make things possible, but you don’t actually do the things. You have to bring something unique to the table so that you can’t be as easily replaced with a service.

Kenneth Fisher says this is more of the same:

Unfortunately as powerful as these machines became they were expensive, aged out quickly, required knowledgeable people to maintain and sometimes our tasks required more computing power than we had on hand. So some smart people got together and created something new. The Cloud. Someone else maintaining the computers, replacing parts as needed, updating software etc. And then renting out storage and computing power. (If at this point you guessed that I’m saying there are some fairly obvious parallels between the old mainframes and the cloud, well, you are correct.)

Andy Galbraith ties this back to April Fools’ jokes re: SQL on Linux:

I quietly ignored it and went about my life and job, putting off the problem until later.
Time passed and Microsoft released a “public preview”/CTP of what they began calling “SQL Server vNext” for Linux, and it became more real.  Then they released another, and another – as of this writing the current CTP is version 1.4 (download it here).
I recently realized I hadn’t progressed past my original query:
WHAT DO I DO NOW?

John Morehouse is at bat:

I work for a fairly slow moving financial institution.  This does not mean we don’t adopt new technology, but the leadership is very careful when deciding to move in a certain direction. Since we service farmers in rural America, these decisions could have a huge impact on the ability of our customers to operate.  The cloud, at least from a database perspective, is not something that I think is even on the radar.  I believe that we will get there eventually, but not in the next year or two, I would imagine.

Of course, this also means that I don’t get the shiny new cloud toys to play with either.  I had the opportunity to work with the cloud some years ago on a side project, but that was very limited.  It was also at a time when Azure was fairly young and not as robust as it is today. Learning new skills around the cloud is on my to-do list, and one of these days I’ll get to it.  I think with the help of MSDN, it’s a lot easier to play around with new technologies.

There are a lot of good posts on this topic this month.


Query All Servers In A CMS Folder

Tracy Boggiano has a PowerShell script for querying each SQL Server instance in a Central Management Server folder:

In this post I’m going to share a function (actually two) I use to run scripts against multiple instances of SQL Server and load the data into a data table. I use this mainly as a replacement for the CMS feature of running against a folder, and to put the data into a DataTable object which I output to a GridView that I can sort and filter any way I want, which you can’t do in CMS.

Click through for the script.
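Tracy’s functions are written in PowerShell.  As a rough sketch of the same idea in Python, you could read the registered servers for one CMS group out of msdb and run a query against each instance with pyodbc; the CMS server name, group name, and ODBC driver below are placeholder assumptions:

```python
# Sketch of the multi-server approach in Python with pyodbc: read the servers
# registered under one CMS group from msdb, then run the same query against
# each instance, tagging each row with its source server. Server names, group
# name, and driver are placeholders.
import pyodbc

CMS = "DRIVER={ODBC Driver 13 for SQL Server};SERVER=MyCMSServer;DATABASE=msdb;Trusted_Connection=yes"

def cms_servers(group_name):
    """Return the instance names registered under one CMS group."""
    sql = """
        SELECT s.server_name
        FROM msdb.dbo.sysmanagement_shared_registered_servers AS s
        JOIN msdb.dbo.sysmanagement_shared_server_groups AS g
            ON s.server_group_id = g.server_group_id
        WHERE g.name = ?"""
    with pyodbc.connect(CMS) as conn:
        return [row.server_name for row in conn.execute(sql, group_name)]

def run_everywhere(group_name, query):
    """Run one query against every instance in the group."""
    results = []
    for server in cms_servers(group_name):
        cs = "DRIVER={ODBC Driver 13 for SQL Server};SERVER=%s;Trusted_Connection=yes" % server
        with pyodbc.connect(cs) as conn:
            for row in conn.execute(query):
                results.append((server,) + tuple(row))
    return results

for r in run_everywhere("Production", "SELECT @@SERVERNAME, @@VERSION"):
    print(r)
```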


Risk Vs Opportunity With Technical Advancement

Rob Farley on this month’s T-SQL Tuesday topic:

Does Automatic Tuning in Azure mean the end of query tuners? Does Self-Service BI in Excel and Power BI mean the end of BI practitioners? Does PaaS mean the end of DBAs?

I think yes. And no.

Yes, because there are tasks that will disappear. For people that only do one very narrow thing, they probably have reason to fear. But they’ve had reason to fear for a lot longer than Azure has been around. If all you do is check that backups have worked, you should have expected to be replaced by a script a very long time ago. The same has applied in many industries, from production lines in factories to ploughing lines in fields. If your contribution is narrow, you are at risk.

But no, because the opportunity here is to use the tools to become a different kind of expert. The person who drove animals to plough fields learned to drive tractors, but could use their skills in ploughing to offer a better service. The person who painted cars in a factory makes an excellent candidate for retouching dent repair, or custom paint jobs. Their expertise sets them apart from those whose careers didn’t have the same background.

Read the whole thing.  Rob is characteristically thoughtful.


Weights In Graphs

Angshuman Talukdar shows how to use Neo4j to solve minimum weighted distance problems:

A sample dataset is created in Neo4j using the CREATE clause in Cypher, as given in Query 1. This loads the data into Neo4j and generates the graph database shown in Figure 2.

Neo4j ships with a number of graph algorithms as a package, but those are accessible only from the Java API. Implementing some of these algorithms in Cypher is quite complex and time consuming. Neo4j 3.x introduced the concept of user-defined procedures, and APOC (Awesome Procedures On Cypher) is a library of them: custom implementations of functionality that can’t be (easily) expressed in Cypher itself. The APOC library consists of many (about 300) procedures to help with tasks in areas like data integration, graph algorithms, and data conversion.

Graph databases aren’t common, but they can be very useful for certain questions like the one Angshuman solves.
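As a hedged sketch of what this looks like in practice, here is a small example using the official Neo4j Python driver and APOC’s Dijkstra procedure to find a minimum-weight path.  The connection details, labels, and property names are illustrative rather than taken from Angshuman’s post, and the APOC plugin must be installed on the server:

```python
# Sketch: build a tiny weighted graph, then ask APOC for the cheapest path.
# Assumes the official neo4j Python driver and the APOC plugin; the URL,
# credentials, labels, and property names are illustrative placeholders.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

create = """
CREATE (a:City {name: 'A'}), (b:City {name: 'B'}), (c:City {name: 'C'})
CREATE (a)-[:ROAD {distance: 5}]->(b),
       (b)-[:ROAD {distance: 5}]->(c),
       (a)-[:ROAD {distance: 20}]->(c)
"""

# apoc.algo.dijkstra walks relationships of the given type using the named
# weight property and yields the path plus its total weight.
shortest = """
MATCH (a:City {name: 'A'}), (c:City {name: 'C'})
CALL apoc.algo.dijkstra(a, c, 'ROAD', 'distance') YIELD path, weight
RETURN [n IN nodes(path) | n.name] AS route, weight
"""

with driver.session() as session:
    session.run(create)
    for record in session.run(shortest):
        print(record["route"], record["weight"])  # expect ['A', 'B', 'C'] 10.0

driver.close()
```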


R Plots In Power BI

Leila Etaati has a three-part series on displaying R visuals in Power BI.  Part 1 shows how to create a scatter plot:

So in the above picture, we can see that we have 3 different fields shown in the chart: highway and city speed on the y and x axes, while the car’s cylinder variable is shown as different circle sizes. However, maybe you need a bigger circle to differentiate cylinders of 8 from 4, and we are able to do that by adding another layer with a function named…

Part 2 shows how to use facet_grid to show multiple plots in one visual:

Now I want to add another layer to this chart by adding year and the car drive option. To do that, first choose year and drv from the data fields in Power BI. As I have mentioned before, the dataset variable will now hold data about speed in city, speed on highway, number of cylinders, year of car, and type of drive.

I am going to use another function in the ggplot package named “facet_grid” that helps me show different facets in my scatter chart. In this function, year and drv (drive) will be shown against each other.

Part 3 shows how to place charts on a map in R:

Now I have to merge the data to get the location information from “sPDF” into “ddf”. To do that, I am going to use the “merge” function. As you can see in the code below, the first argument is our first dataset, “ddf”, and the second one is the data on the lat and lon of each location (sPDF). The third and fourth arguments specify the main variables for joining these two datasets: in “ddf” (x) it is “country” and in “sPDF” it is “Admin”. The result will be stored in the “df” dataset.
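For readers more comfortable in Python than R, the merge described above is an ordinary key join.  A rough pandas analogue, with made-up sample data following the post’s description, might look like this:

```python
# Hedged pandas analogue of the merge() call described in the quote: join a
# per-country data frame to a table of country coordinates. The column names
# and values here are assumptions for illustration, not the post's actual data.
import pandas as pd

ddf = pd.DataFrame({"country": ["France", "Brazil"],
                    "value":   [10, 20]})
spdf = pd.DataFrame({"Admin": ["France", "Brazil"],
                     "LON":   [2.2, -51.9],
                     "LAT":   [46.2, -14.2]})

# Roughly the R call: df <- merge(ddf, sPDF, by.x = "country", by.y = "Admin")
df = ddf.merge(spdf, left_on="country", right_on="Admin")
print(df)
```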

Aside from my strong dislike of bar/pie charts on maps, this is good to know, particularly if there is not a built-in or custom Power BI visual to replicate something you can do in R.


Azure Data Lake Store Best Practices

Ust Oldfield provides recommendations on how to size and lay out files in Azure Data Lake Store:

The format of the file has huge implications for storage and parallelisation. Splittable formats – files which are row oriented, such as CSV – are parallelizable, as data does not span extents. Non-splittable formats, however – files which are not row oriented, where data is often delivered in blocks, such as XML or JSON – cannot be parallelized, as data spans extents and can only be processed by a single vertex.

In addition to the storage of unstructured data, Azure Data Lake Store also stores structured data in the form of row-oriented, distributed clustered index storage, which can also be partitioned. The data itself is held within the “Catalog” folder of the data lake store, but the metadata is contained in the data lake analytics. For many, working with the structured data in the data lake is very similar to working with SQL databases.

This is the type of thing that you can easily forget about, but it makes a huge difference down the line.
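To make the splittable/non-splittable distinction concrete, here is a toy Python sketch of how a parallel reader can realign itself to row boundaries in a row-oriented file.  It runs against a local file rather than the data lake, and the extent sizes are arbitrary; the point is that any worker handed a byte range of a CSV can recover whole rows, while a worker dropped into the middle of one large JSON document has no safe realignment point:

```python
# Toy sketch of why row-oriented files are splittable: a reader handed an
# arbitrary byte range skips its leading partial line (the previous reader
# owns it) and reads one line past its end boundary to compensate, so extents
# can be processed in parallel with no duplicated or dropped rows.
def read_rows(path, start, size):
    with open(path, "rb") as f:
        f.seek(start)
        if start > 0:
            f.readline()  # discard the partial row at the front of this range
        while f.tell() <= start + size:
            line = f.readline()
            if not line:
                break  # end of file
            yield line.rstrip(b"\r\n")

# Two workers covering one file as two "extents":
#   rows = list(read_rows("data.csv", 0, 4096)) + list(read_rows("data.csv", 4096, 4096))
```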
