Press "Enter" to skip to content

Curated SQL Posts

Finding Relational Data Sources In SSAS

Jens Vestergaard builds a Powershell script to figure out which relational database servers feed data to which Analysis Services cubes:

Whenever you are introduced to a new environment, either because you visit a new client or take over a new position from someone else, it’s always crucial to get on top of what’s going on. More often than not, any documentation (if you are lucky to even get hands on that) is out of date or not properly maintained. So going through that may even end up making you even more confused – or in worst case; misinformed.

In a previous engagement of mine came a request from the Data Architecture team. I was asked to produce a list of all servers and cubes running in a specific environment. They provided the list of servers and wanted to know which servers were hit by running solutions. Along with this information the team also needed all sorts of information on the connection strings from the Data Source Views, as well as which credentials were used, if possible.

If you’re dealing with a large number of cubes, this becomes even more useful.

Comments closed

Don’t Nest Views

Denny Cherry recommends against nested views:

Now there are plenty of reasons to use views in applications, however views shouldn’t be the default way of building applications because they do have this potential problems.

While working with a client the other week we had to unwind some massive nest views. Several of these views were nested 5 and 6 levels deep with multiple views being referenced by each view. When queries would run they would take minutes to execute instead of the milliseconds that they should be running in. The problems that needed to be fixed were all indexed based, but because of the massive number of views that needed to be reviewed it took almost a day to tune the single query.

Nested views is usually an indicator of somebody trying to perform OOP on a relational database, taking advantage of encapsulation.  One big performance problem with nested views is that at some point, the query optimizer gives up trying to optimize and simply pulls in all of the tables as many times as they appear.  Make the optimizer’s life easier and it will make your life easier.

Comments closed

Controlling Power BI Interactions

Reza Rad shows how to control what happens when you interact with a Power BI visual:

This behavior also can be set to None. For example let’s say you want to have a total of sales amount regardless of gender selection, and then a total of sales amount for the selected gender in slicer. To do this copy the SalesAmount Card Visual, and then click on Gender Slicer. click on Edit Interaction, and set one of the card visuals to None, the other one as default with Filter.

Count me among the people who did not know about this.

Comments closed

Downloading Power BI Reports

Ginger Grant notes that it is now possible to download Power BI reports as PBIX files:

Now that anyone can use a report which has been uploaded to the service as a starting point for a new report, there may be a decreased use of the template feature. Any report created on the desktop can be saved as a template, by selecting save on a desktop report file and changing the file type to a template. Unfortunately, templates do not contain links to the datasource used. The person creating the report must determine what data to use and if there was a dataset presently used which is refreshing the data, or create a dataset and it’s respective refresh features as part of creating a report.  Content packs provide the connections to datasets, but since the reports cannot be saved as a file for versioning, this feature is not often used instead of templates. Downloading the file and then modifying it is does resolve the issue as the starting point is then a working report with a connection to an existing dataset.

Read the whole thing.

Comments closed

LOB In Columnstore

Niko Neugebauer discusses LOB support in SQL Server vNext:

In the upcoming version of SQL Server (for the moment known as SQL Server vNext), Microsoft has finally announced the upcoming support for the LOBs within Columnstore Indexes – thus enabling the usage of the NVARCHAR(MAX), VARCHAR(MAX), and VARBINARY(MAX) data types on the tables with Columnstore Indexes that include those columns.

For the tests, I have decided to spin a Virtual Machine in Azure with an installation of the currently available CTP1 of the SQL Server vNext, which has a version 14.0.1.126.

Read the whole thing.  It’s hard to tell at this point if these are bugs, incomplete functionality, or what, so it’ll be interesting to track changes over the CTPs.

Comments closed

Displaying SSRS Indicators Horizontally

Derik Hammer shows how to display a set of indicators horizontally instead of vertically in a Reporting Services report:

If you are looking for a tabular report you can stop here. This does not look very nice on a dashboard, however. Now I will show you how to convert this into a grouping of colorful indicators.

Remove the [Rating] field from the bottom row and create an indicator by right-clicking on the placeholder and selecting Insert > Indicator. Then select the type you would like.

There are several steps, but Derik lays it all out nicely with screenshots.

Comments closed

The Data Science Delusion

Anand Ramanathan has a strong critique of “data science” as it stands today:

Illustration: Consider the sentiment-tagging task again. A Q1 resource uses an off-the-shelf model for movie reviews, and applies it to a new task (say, tweets about a customer service organization). Business is so blinded by spectacular charts [14] and anecdotal correlations (“Look at that spiteful tweet from a celebrity … so that’s why the sentiment is negative!”), that even questions about predictive accuracy are rarely asked until a few months down the road when the model is obviously floundering. Then too, there is rarely anyone to challenge the assumptions, biases and confidence intervals (Does the language in the tweets match the movie reviews? Do we have enough training data? Does the importance of tweets change over time?).

Overheard“Survival analysis? Never heard of it … Wait … There is an R package for that!”

This is a really interesting article and I recommend reading it.

Comments closed

Building A Hadoop Cluster

I have a post on building a five-node Hadoop cluster using Docker containers:

Notice how 3bd shows up for pretty much all of these services.  This is not what you’d want to do in a real production environment, but because we want to use Docker and easily pass ports through, it’s the simplest way for me to set this up.  If you knew beforehand which node would host which service, you could modify the run.sh batch script that we discussed earlier and open those specific ports.

After assigning masters, we next have to define which nodes are clients in which clusters.

Click through for a screenshot-laden walkthrough.

Comments closed

Learning About HCatalog

Adam Diaz explains the basics of HCatalog:

HCatalog, also called HCat, is an interesting Apache project. It has the unique distinction of being one of the few Apache projects that were once a part of another project, became its own project, and then again returned to the original project Apache Hive.

HCat itself is described in the documentation as “a table and storage management layer” for Hadoop. In short, HCat provides an abstraction layer for accessing data in Hive from a variety of programming languages.  It exposes data stored in the Hive metastore to additional languages other than HQL. Classically, this has included Pig and MapReduce. When Spark burst onto the big data scene, it allowed access to HCat.

Given HDInsight’s predilection toward WebHCat over WebHDFS, this does seem like a good thing to learn.

Comments closed

Understanding Status In DMVs

Ewald Cress looks at a number of DMVs and how they expose query status:

sys.dm_exec_sessions: status

This metric is completely disjunct from the above ones, and mostly reflects attributes of a CSession class instance. The respective values are derived through the following decision tree:

  • If the internal Boolean member m_fIsConnReset is set, return dormant

  • Else if a flag living outside of CSession itself is set, return preconnect (I’ll touch on the source of this mystery flag below)

  • Else if a flag within the CSession itself is set (indicating that it has been provided some work to do) return running

  • Else return sleeping

It’s interesting to see how something so very similar can have so many different understandings.

Comments closed