Brent Ozar asks (and answers) an interesting question:
Do Disabled Indexes Affect Missing Index Recommendations?
I’m so glad you asked! Let’s take a look.
Read on to learn if Betteridge’s law of headlines holds.
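If you'd like to try the experiment yourself before clicking through, here is a minimal sketch (table and index names are hypothetical):

```sql
-- Hypothetical table and index names; a rough way to run the experiment yourself.
-- Disable an existing nonclustered index:
ALTER INDEX IX_Users_DisplayName ON dbo.Users DISABLE;

-- Run a query the index would have covered, then see whether the missing
-- index DMVs recommend creating what you just disabled:
SELECT mid.statement, mid.equality_columns, mid.inequality_columns, mid.included_columns
FROM sys.dm_db_missing_index_details AS mid;

-- A disabled index is re-enabled by rebuilding it:
ALTER INDEX IX_Users_DisplayName ON dbo.Users REBUILD;
```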
Kenneth Fisher shows how to generate music with PowerShell:
Happy holidays everyone! Ok, this isn’t even remotely related to SQL Server but it sounded fun. I found someone playing music using PowerShell (I lost the link, sorry) and thought Christmas Music!
Not the most efficient way to generate music, but it’ll do.
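The trick rests on [Console]::Beep, which plays a tone at a given frequency (Hz) for a given duration (ms). A toy sketch of the idea (note values are approximate, and this is not Kenneth's exact code):

```powershell
# Not Kenneth's exact code; a toy sketch of the [Console]::Beep approach.
# Frequencies (Hz) for the notes in the opening of "Jingle Bells".
$E5 = 659; $G5 = 784; $C5 = 523; $D5 = 587

# Each entry is a frequency and a duration in milliseconds.
$jingleBells = @(
    @($E5, 300), @($E5, 300), @($E5, 600),
    @($E5, 300), @($E5, 300), @($E5, 600),
    @($E5, 300), @($G5, 300), @($C5, 300), @($D5, 300), @($E5, 900)
)

foreach ($note in $jingleBells) {
    [Console]::Beep($note[0], $note[1])
}
```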
Comments closedWith what I have learned from Pester, it seems to be best to test each command on its own. That got me to thinking, what if I walk into a large environment and I want to see an enterprise summary of best practices? I started working on a few different versions with the ultimate goal of using dbatools function Get-DbaRegisteredServer. I have included a few examples of Test functions that include best practices, as well as some default database best practices all from dbatools.
In my first run, I wanted to make sure that I can pass multiple servers with multiple tests and came up with the below script. You will notice the top section that uses the Get-Dba functions, I had to supply my own values of the properties I am testing. In the bottom section, you will notice that dbatools Test-Dba functions have the IsBestPractice property already and that is all you have to test against.
Read on for a sample of what you can do.
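As a rough sketch of the pattern (instance names are hypothetical; assumes the dbatools and Pester modules are installed), a loop of Pester tests over one of the Test-Dba functions might look like this:

```powershell
# Hypothetical instance names; assumes the dbatools and Pester modules are installed.
$servers = 'SQL01', 'SQL02'

Describe 'Enterprise best practice summary' {
    foreach ($server in $servers) {
        Context "Instance: $server" {
            It 'tempdb configuration follows best practices' {
                # Test-DbaTempDbConfig returns one row per rule with an IsBestPractice flag.
                $failures = @(Test-DbaTempDbConfig -SqlInstance $server |
                    Where-Object { -not $_.IsBestPractice })
                $failures.Count | Should -Be 0
            }
        }
    }
}
```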
Ginger Keys troubleshoots a connectivity failure in a two-node Availability Group:
The AlwaysOn Health Events were consistent with the SQL Server Logs. I discovered that the Windows Server Failover Cluster had been down (but was back online), and that the AlwaysOn Group had failed over and back. But the culprit behind my secondary replica and databases being disconnected was an endpoint issue. I found in my SQL Server logs that the service account running AlwaysOn had, for some reason, lost its permissions to connect to the endpoint:
Click through for the entire troubleshooting process as well as the solution.
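For reference, the usual way to restore that permission looks like the grant below (the login name is hypothetical; Hadr_endpoint is just the default endpoint name the Availability Group wizard creates):

```sql
-- Hypothetical login name; Hadr_endpoint is the default endpoint name
-- created by the Availability Group wizard.
GRANT CONNECT ON ENDPOINT::Hadr_endpoint TO [DOMAIN\SQLServiceAccount];
```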
Ilknur Kaynar Kabul explains why it’s important that your data science models be interpretable:
Some machine learning models are simple and easy to understand. We know how changing the inputs will affect the predicted outcome and can make justification for each prediction. However, with the recent advances in machine learning and artificial intelligence, models have become very complex, including complex deep neural networks and ensembles of different models. We refer to these complex models as black box models.
Unfortunately, the complexity that gives extraordinary predictive abilities to black box models also makes them very difficult to understand and trust. The algorithms inside the black box models do not expose their secrets. They don’t, in general, provide a clear explanation of why they made a certain prediction. They just give us a probability, and they are opaque and hard to interpret. Sometimes there are thousands (even millions) of model parameters, there’s no one-to-one relationship between input features and parameters, and often combinations of multiple models using many parameters affect the prediction. Some of them are also data hungry. They need enormous amounts of data to achieve high accuracy. It’s hard to figure out what they learned from those data sets and which of those data points have more influence on the outcome than the others.
This post reminds me of a story I’d heard about a financial organization using neural networks to build accurate models, but then needing to decompose the models into complex decision trees to explain to auditors that they weren’t violating any laws in the process.
Mostafa Mokhtar, et al., explain a few methods for skipping unneeded data in Impala queries:
Each Apache Parquet file contains a footer where metadata can be stored, including information like the minimum and maximum value for each column. Starting in v2.9, Impala populates the min_value and max_value fields for each column when writing Parquet files for all data types and leverages data skipping when those files are read. This approach significantly speeds up selective queries by further eliminating data beyond what static partitioning alone can do. For files written by Hive / Spark, Impala only reads the deprecated min and max fields.
The effectiveness of the Parquet min_value/max_value column statistics for data skipping can be increased by ordering (or clustering) data when it is written, reducing the range of values that fall between the minimum and maximum value for any given file. It was for this reason that Impala 2.9 added the SORT BY clause to table DDL, which directs Impala to sort data locally during an INSERT before writing the data to files.
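As a rough illustration of the SORT BY clause (table and column names are hypothetical):

```sql
-- Hypothetical schema. SORT BY tells Impala to sort rows locally on the
-- listed columns before writing each Parquet file, which tightens the
-- min_value/max_value range per file and improves data skipping.
CREATE TABLE sales_sorted (
  sale_id   BIGINT,
  sale_date STRING,
  amount    DOUBLE
)
PARTITIONED BY (sale_year INT)
SORT BY (sale_date)
STORED AS PARQUET;

-- Rows written through this table produce files whose sale_date ranges
-- overlap less, so selective queries can skip more files entirely.
INSERT INTO sales_sorted PARTITION (sale_year = 2017)
SELECT sale_id, sale_date, amount
FROM sales_raw
WHERE sale_year = 2017;
```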
Even if your answer is “throw more hardware at it,” there eventually comes a point where you run out of hardware (or budget).
If you want to follow along (go on!), head over to Neil’s site, download the Excel file, and take a look at the “how to” guide on the same page. Existing R users are likely already shuddering at all the manual manipulation required.
For the first attempt, I followed Neil’s approach pretty closely, resulting in a lot of code to sort and group, although ggplot2 made the actual plotting much simpler. I shared my very first attempt (produced with barely any ggplot2 code), which was quite good, but there were a few issues: the ins/outs were coloured blue instead of grey, and several points were overplotted.
Click through for code and explanation. H/T R-Bloggers
Reza Rad covers one method of sharing Power BI content with users:
Dashboard sharing, as the name implies, is based on a dashboard: you can only share a dashboard with this method, not a report. Consider that you have a dashboard like the screenshot below and you want to share it. There is a Share link at the top right corner of the dashboard.
Dashboard sharing has very few options to set and is very simple to configure. You just need to add the email addresses of the people with whom you want to share the dashboard. You can also write a message to let them know that this report has been shared with them.
Click through for more information. Note that this is a paid feature.
Tracy Boggiano has a script which splits out large files in a filegroup into a smaller set of files:
The solution I offer allows you to break your files into any size you want by rebuilding your indexes. You will need some disk space for it to create the new files while the process runs; then it will drop the large file. This will also take up some space in your transaction log, so if you are not running transaction log backups frequently enough, you could have a lot of disk space taken up there as well; watch out for that. All the code can be downloaded from my GitHub repository here.
Read on for an explanation of the entire process.
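Tracy's script does the work via index rebuilds; as a rough sketch of the overall shape of the process (database, file, and path names are hypothetical, and this is not her exact code):

```sql
-- Hypothetical names and sizes; the general shape of the process,
-- not Tracy's exact script.

-- 1. Add new, smaller files to the filegroup:
ALTER DATABASE MyDb
ADD FILE (NAME = N'MyDb_Data2',
          FILENAME = N'D:\Data\MyDb_Data2.ndf',
          SIZE = 10GB)
TO FILEGROUP [PRIMARY];

-- 2. Rebuild indexes so data spreads across the new files (Tracy's approach),
--    and/or push any remaining pages out of the old file:
DBCC SHRINKFILE (N'MyDb_Data1', EMPTYFILE);

-- 3. Once the old file is empty, remove it:
ALTER DATABASE MyDb REMOVE FILE MyDb_Data1;
```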
Michael J. Swart points out a bit of trickery with DATEDIFF:
I assumed that the DATEDIFF function I wrote worked this way: Subtract the two dates to get a timespan value and then return the number of seconds (rounded somehow) in that timespan.
But that’s not how it works. The docs for DATEDIFF say:
“Returns the count (signed integer) of the specified datepart boundaries crossed between the specified startdate and enddate.”
There’s no rounding involved. It just counts the ticks on the clock that are heard during a given timespan.
Read the whole thing.
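A quick illustration of the boundary-counting behavior:

```sql
-- Ten milliseconds elapse, but a second boundary is crossed: returns 1.
SELECT DATEDIFF(SECOND,
    CAST('2017-12-31 23:59:59.99' AS datetime2),
    CAST('2018-01-01 00:00:00.00' AS datetime2));

-- A single day elapses, but a year boundary is crossed: returns 1.
SELECT DATEDIFF(YEAR, '2017-12-31', '2018-01-01');

-- 980 milliseconds elapse, but no second boundary is crossed: returns 0.
SELECT DATEDIFF(SECOND,
    CAST('2018-01-01 00:00:00.01' AS datetime2),
    CAST('2018-01-01 00:00:00.99' AS datetime2));
```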