
Month: January 2017

Reading Extended Event Data From PowerShell

Dave Mason builds a PowerShell script to parse Extended Events information:

PowerShell takes center stage for this post. Previously, I showed how to handle a SQL Server Extended Event in C# by accessing the event_stream target. We can do the same thing in PowerShell. The code translates mostly line-for-line from C#. Check out the last post if you want the full back story. Otherwise, continue on for the script and some Posh-specific notes.

Read on for the code.


Connect Items Galore

Aaron Bertrand has a series of Connect items of interest:

There are so many active Connect items, it’s difficult to choose just one, but after spending some time looking at the generate_series function in PostgreSQL, the one at the top of my list right now is this suggestion from Erland Sommarskog (blog):

Connect #258733 : Add a built-in table of numbers

I think it would be cheap and easy for SQL Server to provide a default table of numbers, always in memory, just like any other catalog view or DMV. This thing can use compression now, regardless of edition, so even a table with 2 billion rows will only take 13 MB, and should be easy to populate either at start-up or on first use. I could list out all of the potential uses for a numbers table, but they’re fairly well documented in the following sources (and most use cases don’t need anywhere near 2 billion values, so maybe the range could be defined using a sys.configurations or database-level SCOPED CONFIGURATION setting)

Click through for additional Connect items, and please vote on any Connect items which catch your eye.
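
Until something like that ships, a numbers table is typically built by hand. The following is a minimal sketch of one common approach, not anything taken from the Connect item: the table name dbo.Numbers and the one-million-row size are assumptions for illustration, and the cross join of sys.all_columns is used only as a convenient row source.

```sql
-- Hypothetical table name; size it to the range you actually need.
CREATE TABLE dbo.Numbers
(
    n int NOT NULL PRIMARY KEY CLUSTERED
);

-- sys.all_columns cross-joined with itself supplies well over a million rows
-- on a typical instance; TOP trims the row source and ROW_NUMBER() numbers it 1..1,000,000.
INSERT INTO dbo.Numbers (n)
SELECT TOP (1000000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;

-- Example use: one row per day of January 2017.
SELECT DATEADD(DAY, n - 1, CAST('20170101' AS date)) AS calendar_date
FROM dbo.Numbers
WHERE n <= 31
ORDER BY n;
```

A built-in table would remove the need for every database to carry its own copy of something like this.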


The Importance Of Auditing

Louis Davidson has a parable about database design and systems auditing:

This brings me to my data question. If an order is processed in a store, but the expected data is not created, did that order ever occur?

Very often, the staff of a business are very focused on pleasing the customer, making sure they get their product, but due to software limitations, may end up not capturing information about every sale in a satisfactory manner. Most of the blame I have seen lies in software that doesn’t meet the requirements of a customer, making capturing desired details tedious to achieve when the process is in the norm. Very often the excuse programmers give is that too much of the work to build a system would need to be done for the atypical cases, but requirements are requirements, and it is generally essential that every action that occurs in a business is captured as data.

Read on for more.  My conjoined twin case is, how much information do we have about why users give up?  For example, if you have a three-part form, how many users get through part one, part two, and part three?  There’s some natural level of attrition, but if you see an abnormally low follow-through rate, that might indicate a bug or major issue.  Auditing is hard work, as you have to hit both sides of the problem at the same time.
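
As a rough illustration of the three-part form case, here is a sketch of a follow-through query. The audit table, its columns, and the sample rows are all hypothetical; the only assumption that matters is that one row is written each time a visitor completes a step.

```sql
-- Hypothetical audit data: one row per form step a visitor completes.
DECLARE @FormAudit TABLE
(
    visitor_id  int     NOT NULL,
    step_number tinyint NOT NULL,
    PRIMARY KEY (visitor_id, step_number)
);

INSERT INTO @FormAudit (visitor_id, step_number)
VALUES (1, 1), (1, 2), (1, 3),
       (2, 1), (2, 2),
       (3, 1),
       (4, 1), (4, 2), (4, 3);

-- Count the visitors who reached each step and express that as a percentage
-- of the busiest step (step one, as long as nobody can skip ahead).
SELECT step_number,
       COUNT(*) AS visitors_reached,
       CAST(100.0 * COUNT(*) / MAX(COUNT(*)) OVER () AS decimal(5, 1)) AS pct_of_busiest_step
FROM @FormAudit
GROUP BY step_number
ORDER BY step_number;
```

With these sample rows, four visitors start, three reach part two, and two finish. A sudden drop between two adjacent steps is the kind of signal worth investigating.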


Finding Clusters Of Queries Using R

Tomaz Kastrun shows how to use R to find clusters of queries which behave similarly:

So the R code said that there are three clusters, and I used medians to generate data around them. In addition, I have also tested the result with partitioning around medoids (which is the opposite of hierarchical clustering), and the results from both techniques yield clean clusters.

Clustering models can be powerful for discovering commonalities, and that might help you find a number of queries which all behave in some sub-optimal way without having to trawl through every procedure’s code.


Submitting A Spark Job On HDInsight

Bharath Venkatesh shows different ways to run a Spark job on HDInsight:

From HDI 3.5 onwards, our clusters come preinstalled with Zeppelin Notebooks. Much like Jupyter notebooks, Zeppelin is a web-based notebook that enables interactive data analytics. It provides built-in Spark integration that allows for:

  • Automatic SparkContext and SQLContext injection
  • Runtime jar dependency loading from local filesystem or maven repository. Learn more about dependency loader.
  • Canceling job and displaying its progress

This MSDN article provides a quick, easy-to-use onboarding guide to help you get acclimatized to Zeppelin. You can also try several applications that come pre-installed on your cluster to get hands-on experience with Zeppelin.

Zeppelin is probably my favorite method, but there are good reasons to use all of these.


Understanding Data Integration Lifecycle Management

Andy Leonard explains DILM:

Data Integration Lifecycle Management (DILM) is not about data integration development.

DILM is about everything else:

  • Configurations Management
  • Version Management
  • Deployment
  • Execution

Although DILM is not about development, implementing DILM will impact the design of SSIS solutions.

This is the first part in a multi-part series, and covers some of the conceptual basics behind DILM.


Reporting Services Mobile Reports

Kathi Kellenberger shows off mobile reports in SQL Server Reporting Services:

Mobile Reports are dashboards that will run on most modern mobile devices as well as within the web portal. They are supported on iOS 9 and later, Android 4.4 or better, and Windows 10. To run Mobile Reports on these devices, the mobile Power BI application must be installed.

At first glance, they are simple to create. There is a new tool to use, the SQL Server Mobile Report Publisher. The tool will look familiar to you if you have worked with Datazen in the past. Microsoft purchased Datazen in 2015.

This is the first major Reporting Services update since 2008 (unless you consider sparkline support in R2 a major update), and could be a good business justification for upgrading to SQL Server 2016.


Columnstore Query Patterns

Niko Neugebauer gives a couple hints on how to make columnstore queries as fast as possible:

On my VM with 4 cores it takes 33 seconds to execute this query on SQL Server 2016 with Service Pack 1, while it burns almost 48 seconds of the CPU Time.
The relevant part of the execution plan can be found below, showing the many performance problems this query suffers from, such as the INNER LOOP JOIN and the INDEX SPOOL. An even worse part is hidden and identifiable only once you open the properties of any iterator in the lower tree (left side of the LOOP JOIN): it all actually runs in Row Execution Mode.

To show you the problem, on the left side you will find the properties of the sort iterator in the lower (left) part of the LOOP JOIN, which was executed around 770,000 times in Row Execution Mode, effectively taking away any chance of this query executing quickly. One might argue that it might be more effective to do the loop part in Row Mode, but given that we are sorting around 3.1 million rows there, for me there is no doubt that it would be faster to do it in Batch Execution Mode. Consulting the last sort iterator in the execution plan (TOP N SORT), you will find that it runs in Batch Execution Mode, even though it is processing around 770,000 rows.

There’s some valuable information here.


Changing TDE Certificates

Tom Norman has a script to change the certificate used for TDE:

When you create a TDE certificate with the default settings, the certificate is valid for only one year. Depending upon your auditor, you may be asked to change this TDE certificate yearly. This blog post shows how to watch for your TDE certificate’s expiration using policy-based management: https://www.mssqltips.com/sqlservertip/2388/internal-sql-server-certificates-have-expired-is-this-an-issue/.

Click through for the script.
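
For orientation, the general shape of a certificate rotation looks something like the sketch below. This is not Tom’s script: the certificate name TDECert_2018, the database name SalesDb, the file paths, the expiry date, and the password placeholder are all hypothetical, and it assumes TDE is already enabled so that a database master key exists in master.

```sql
USE master;
GO

-- List the certificates currently protecting database encryption keys, with expiry dates.
SELECT c.name,
       c.expiry_date,
       DB_NAME(dek.database_id) AS database_name
FROM sys.certificates AS c
JOIN sys.dm_database_encryption_keys AS dek
    ON dek.encryptor_thumbprint = c.thumbprint;
GO

-- Create the replacement certificate (hypothetical name and expiry date).
CREATE CERTIFICATE TDECert_2018
    WITH SUBJECT = 'TDE certificate for 2018',
         EXPIRY_DATE = '20190131';
GO

-- Back up the new certificate and its private key before putting it to use.
BACKUP CERTIFICATE TDECert_2018
    TO FILE = 'C:\Backup\TDECert_2018.cer'
    WITH PRIVATE KEY
    (
        FILE = 'C:\Backup\TDECert_2018.pvk',
        ENCRYPTION BY PASSWORD = '<strong password here>'
    );
GO

-- Re-encrypt the database encryption key with the new certificate.
USE SalesDb;
GO
ALTER DATABASE ENCRYPTION KEY
    ENCRYPTION BY SERVER CERTIFICATE TDECert_2018;
GO
```

Keep the old certificate and its backup around afterward, because restoring a database backup taken under the old certificate still requires it.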


Uses For Binary Data Types

Daniel Hutmacher explains what binary data types are and one use case:

Because binary values are essentially strings, they easily convert to and from character strings, using CAST or CONVERT. To convert the binary value of 0x41 to a plain-text character value, try

SELECT CAST(0x41 AS char(1));    --- 'A'

The binary value 0x41 is equivalent to decimal 65, and CHAR(65) is the letter “A”. Note that I haven’t placed any quotes around 0x41 – that’s because it’s a numeric value (albeit in hex notation) and not a string.

A few use cases I’ve seen are creating hashes (SHA1 or MD5) for change detection, storing password hashes, and storing encrypted columns; Always Encrypted, for example, uses varbinary data types to store encrypted information.
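
Here is a small sketch of the change-detection case. The table variable and its rows are made up for illustration, and it uses SHA2_256 rather than the SHA1 or MD5 mentioned above, since those older algorithms are marked as deprecated in HASHBYTES as of SQL Server 2016.

```sql
-- Hypothetical customer rows held in a table variable for illustration.
DECLARE @Customer TABLE
(
    customer_id int          NOT NULL PRIMARY KEY,
    first_name  nvarchar(50) NOT NULL,
    last_name   nvarchar(50) NOT NULL
);

INSERT INTO @Customer (customer_id, first_name, last_name)
VALUES (1, N'Ada', N'Lovelace'),
       (2, N'Grace', N'Hopper');

-- HASHBYTES returns varbinary; SHA2_256 always yields 32 bytes.
-- The '|' delimiter keeps ('ab', 'c') from hashing the same as ('a', 'bc').
SELECT customer_id,
       HASHBYTES('SHA2_256', CONCAT(first_name, N'|', last_name)) AS row_hash
FROM @Customer;

-- CONVERT with style 2 renders a binary value as hex text without the 0x prefix.
SELECT CONVERT(char(64), HASHBYTES('SHA2_256', N'abc'), 2) AS hash_as_hex;
```

Storing row_hash alongside each row lets a later load compare one 32-byte value instead of every column to decide whether the row changed.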
