
Month: February 2019

How’s My Database?

Daniel Janik has a Windows app for you:

There are actually about 40 things it checks for.

Current limitations are that queries with a cursor or temp table are not analyzed. There’s also a bug where the missing indexes and warnings appear on the wrong node/operator. Since the tool is using estimated plans at the moment, it may not be as accurate.

I’m planning a few new features in the next month: feeding the utility a query plan and displaying the original query. I’m also planning on adding history and the ability to execute a query from the tool. Before we get to those, though, we need to fix some known bugs. I’m hoping that you (yes, you!) can help me identify other bugs to make this a great tool for the SQL community.

The product is in beta, so check it out and send Daniel some feedback.


When Differential Backups Grow Larger Than Fulls

Kenneth Fisher notes that differential backups can end up being larger than full backups of the same database:

The thing about DBA Myths is that they are generally widespread and widely believed. At least I believed this one until I posted What’s a differential Backup? and Randolph West pointed out that my belief that differential backups can’t get larger than full backups was incorrect. In fact, differential backups (like FULL backups) contain enough transaction log information to cover transactions that occur while the backup is taking place. So if the amount of data that needs to be backed up combined with transactions requires more space than just the data ….

Read on for a demonstration.


Native Math Libraries And Spark ML

Zuling Kang shares with us how we can use native math libraries in netlib-java to speed up certain machine learning algorithms in Apache Spark:

Spark’s MLlib uses the Breeze linear algebra package, which depends on netlib-java for optimized numerical processing. netlib-java is a wrapper for low-level BLAS, LAPACK, and ARPACK libraries. However, due to licensing issues with runtime proprietary binaries, neither the Cloudera distribution of Spark nor the community version of Apache Spark includes the netlib-java native proxies by default. So without manual configuration, netlib-java only uses the F2J library, a Java-based math library that is translated from Fortran77 reference source code.

To check whether you are using native math libraries in Spark ML or the Java-based F2J, use the Spark shell to load and print the implementation library of netlib-java. The following commands return information on the BLAS library and include that it is using F2J in the line, “com.github.fommil.netlib.F2jBLAS,” which is highlighted below:
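The excerpt describes the check without showing it here, so the following is a minimal sketch of what such a check might look like. It assumes a Spark shell session in which netlib-java is already on the classpath, which may not hold for every Spark version or distribution; it is not necessarily the exact command sequence from Zuling's post.

```scala
// Run inside spark-shell. Assumes netlib-java is on the classpath.
import com.github.fommil.netlib.BLAS

// Ask netlib-java which BLAS implementation it selected at load time
// and print that implementation's class name.
println(BLAS.getInstance().getClass.getName)
```

If the printed name is com.github.fommil.netlib.F2jBLAS, the session is using the Java-based F2J fallback described above; any other name suggests that a native implementation was picked up.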

In the examples here, you can get about a 2x difference using the native math libraries versus without, so although that’s not an order of magnitude difference, it’s still nothing to sneeze at.


Kafka Cruise Control Frontend

Naresh Kumar Vudutha announces the Kafka Cruise Control Frontend:

For those that may be unfamiliar, Cruise Control features include:

1. Kafka broker resource utilization tracking
2. The ability to query the latest replica state (offline, URP, out of sync) from brokers
3. Goal-based resource distribution
4. Anomaly detection with self-healing
5. Admin operations on Kafka (add/remove/demote brokers, rebalance cluster, run PLE)

In this post, we will take a look at the frontend for Cruise Control, which provides a bird’s-eye view of all the Kafka installations and a single place to manage all of them.

That’s a lot of functionality in one tool.


Case Classes In Scala

Shubham Dangare explains what case classes are in Scala:

Case classes are Scala’s way to allow pattern matching on objects without requiring a large amount of boilerplate. All you need to do is add a single case keyword modifier to each class that you want to be pattern matchable. Using the modifier makes the Scala compiler add some syntactic conveniences to your class, including a companion object (with the apply method).
The compiler adds a factory method with the name of the class. This means that, for instance, you can write StringValue(“X”) to construct a StringValue object instead of using new StringValue(“X”).
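As a rough illustration of the conveniences described in the excerpt, here is a small self-contained sketch. Only StringValue comes from the quoted text; the Value trait, IntValue, and the describe helper are hypothetical names added for the example.

```scala
// Hypothetical hierarchy for illustration; only StringValue appears in the quoted post.
sealed trait Value
case class StringValue(s: String) extends Value
case class IntValue(n: Int) extends Value

object CaseClassDemo {
  // Pattern matching works because the compiler generates an extractor
  // (unapply) for every case class.
  def describe(v: Value): String = v match {
    case StringValue(s) => s"string: $s"
    case IntValue(n)    => s"int: $n"
  }

  def main(args: Array[String]): Unit = {
    // The generated companion apply method lets us skip `new`.
    val a = StringValue("X")
    val b = StringValue("X")

    // Case classes compare by field values, not by reference.
    println(a == b)                 // true
    println(describe(a))            // string: X
    println(describe(IntValue(42))) // int: 42
  }
}
```

Marking the trait as sealed is a design choice rather than a requirement: it lets the compiler warn when a match leaves out one of the case classes.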

Given how useful case classes are in Spark, it’s good to know how they operate. For more background on the topic, Alessandro Lacava has a post from a few years back describing the topic well.


SSIS Error “Deserializing The Package”

Andy Leonard troubleshoots an odd error in SSIS:

Exception deserializing the package “Operation is not valid due to the current state of the object.”. (Microsoft.DataTransformationServices.VsIntegration)

As a professional consultant who has been blogging about SSIS for 12 years and authored and co-authored a dozen books related to Microsoft data technologies, my first response was:
“Whut?!”

That is a reasonable first response. Fortunately, Andy also had a second response which was more helpful in finding the root cause.


Saving An ADF Pipeline As A Template

Rayis Imayev shares with us how you can save an Azure Data Factory pipeline as a template:

Azure Data Factory (ADF) provides you with a framework for creating data transformation solutions in the Microsoft cloud environment. The recently introduced Template Gallery for ADF pipelines can speed up this development process and provide you with helpful information to create additional activity tasks in your pipelines.

We naturally want to see whether something standard can be further adjusted. That custom design is like ordering a regular pizza and then hitting the “customize” button in order to add a few toppings of our choice. It would be very impressive then to save this customized “creation” for future ordering. And Azure Data Factory has a similar option to save your custom data transformation solutions (pipelines) as templates and reuse them later.

Click through to see how you can do just that.


No Type Equivalence In M

Imke Feldmann notes an oddity in types in Power Query:

But this function will not return any matches. I also tried out a (potentially) slower version using Table.SelectColumns(Types, each [Value] = x[Types]) – but still no match. 

What I found particularly frustrating here was that, in some cases, these lookups or filters on type-columns worked.

That behavior seems odd to me. Imke shares a link from Microsoft which explains that the behavior occurs, but the why behind it eludes me.


Saving To Excel From Azure Data Studio

Bob Pusateri shows us how you can export to Excel from Azure Data Studio:

In SQL Server Management Studio, there’s no single-step way to save a result set to Excel. Most commonly I will just copy/paste a result set into a spreadsheet, but depending on the size of the result set and the types of data involved, that doesn’t always play nicely.

But Azure Data Studio does it WAY better, trust me. If you want that result set in a spreadsheet, just save it as one and poof – you have an Excel file!

Considering that Excel is the most popular BI tool, it makes sense to support it.
