Earlier today two new command line tools were announced for SQL Server, one an experimental Linux tools DBFS which enables access to live DMVs without using a UI like SSMS and secondly a tool that enables script generation of objects within SQL rather like the Generate SQL Scripts option in SSMS.
In this post I’m going to run through the installation of the script generator tool and provide a very quick demo. The reason I’m going through this is because in order to install the tool we need to use something called PIP. PIP is a package management system that enables us install and use packages written in Python. Yeah, Python again!
I’m pretty interested in DBFS, as it seems well-placed to make crusty Linux sysadmins happier with SQL Server, and that’s a big positive in my book.
I prefer to passively gather performance metrics – even if it’s a little convoluted and more work (for me). I don’t often need these metrics immediately available, so I execute queries similar to the query below and store the performance metrics in a table. I can (and do) build dashboards to track SSIS performance (perhaps I should blog about them…) using this passively-collected performance metrics.
As with Lookup Transformation messages, OLE DB Destinations in Data Flow Tasks record specific messages using the same format every time. Because of this (hard-coded) consistency, you and I can passively collect the number of rows written while executing packages in the SSIS Catalog using the (default) Basic logging level. We can use the following Transact-SQL query to collect this execution metadata post-execution:
Click through for the script.
If you need to add, remove, or replace hints from ad-hoc queries where you can’t change the code, plan guides can help. See a demo of removing a query hint from parameterized TSQL run from an application, and get tips on how to make your plan guides work in SQL Server.
The code from the demo is here. Links for more info are below the video. Have fun!
Click through to watch the video, or you can catch the podcast version.
Somehow I have become the R DBA at my job which I don’t mind, I plan on taking Microsoft’s Professional Program on Data Science to be familiar with it. But recently I’ve had to upload files to our R servers which the first time wasn’t too bad. Copy these files to six different servers but come the second time around it became apparent that the Predictive Analytics Manger was going to be asking me to do this more frequently than I wanted to to it manually. So I wrote a quick PowerShell function to take care of this added to our module we use in house. It unzips the file provided to the correct location. It does assume you have administrative rights to your server i.e. you can use the admin shares (c$) for example on the server. You will need to get the function Get-CMSHost from my Running SQL Scripts Against Multiple Servers Using PowerShell post to run the code below.
Click through for the script. This is particularly useful for deploying in-house packages and you don’t want to set up a miniCRAN.
In this video, I look at how to install and configure the May 2017 Preview of Power BI Report Server. Power BI Report Server has a new standalone install experience and this product allows for Power BI reports to be rendered in the web portal along with paginated reports.
This will get you started with the new version.
I was really excited about this preview until I realized that, for now, it only works for Analysis Services data sources.
I think it is important to highlight a couple of points, more specifically around the requirement of ADALSQL.DLL and proper setup of AD which I will highlight below and reference some links, please do this as it lays the foundation for you.
You need ADALSQL.DLL which is part of the latest SQL Server Management Studio (SSMS) to test access. This stands for Active Directory Authentication Library for SQL Server.
This goes through some of the issues Arun had setting everything up and provides workarounds and explanations.
The other day I had to update some records, in Production. I’m a firm believer of using explicit transactions and double checking things before committing a transaction. This helps ensure things go as expected. This also allows me a way to rollback the changes if they don’t. It happens.
However, this means that I have to COMMIT said explicit transaction. And not go to lunch without doing so.
Can you see my mistake? I bet you can.
Fortunately, it sounds like it wasn’t a critical problem. If you want to check for open transactions, Jack Vamvas has a couple methods.
I want to make my life easier by using temporal tables! Take my money and show me how!
I’m flattered by your offer, but since we are good friends I’ll let you in on these secrets for free.
First let’s create a temporal table. I’m thinking about starting up a car rental business, so let’s model it after that:
There are some places where temporal tables can get better (particularly around feeling more like a type 2 slowly changing dimension), but I’m pretty happy with this feature.
So what does it do? Per BOL
Returns the partition number into which a set of partitioning column values would be mapped for any specified partition function in SQL Server 2016.
So it basically tells us which partition any given row is in. This can be particularly handy at times. For example, if you want to know the min and max values of a column per partition.
Read on for a couple scripts which use $Partition.
A simple way to enhance diversity is to train models by using different machine learning algorithms. For example, adding a factorization model to a set of tree-based models (such as random forest and gradient boosting) provides a nice diversity because a factorization model is trained very differently than decision tree models are trained. For the same machine learning algorithm, you can enhance diversity by using different hyperparameter settings and subsets of variables. If you have many features, one efficient method is to choose subsets of the variables by simple random sampling. Choosing subsets of variables could be done in more principled fashion that is based on some computed measure of importance which introduces the large and difficult problem of feature selection.
In addition to using various machine learning training algorithms and hyperparameter settings, the KDD Cup solution shown above uses seven different feature sets (F1-F7) to further enhance the diversity. Another simple way to create diversity is to generate various versions of the training data. This can be done by bagging and cross validation.
I think there’s a pretty strong contrast between competitions and general practice, where you’re doing everything you can to eek out a higher prediction score in the competition, but in practice, you’re aiming to balance a “good enough” prediction with hardware/time requirements and code complexity, and so the model selection process can be quite different.