Press "Enter" to skip to content

Day: November 19, 2018

Deploying Cloudera Enterprise On Azure

Xavier Morera announces a new Cloudera course:

You will start by learning the Microsoft Azure services required to deploy a secure, elastic, Cloudera Enterprise cluster. These core services include security, networking, virtual machine management, and storage, just to name a few.

Then, you’ll learn best practices and patterns for cloud-based clusters, including tips and caveats for security and workload management.

Next, you’ll learn how to bootstrap a cluster using Cloudera Manager, which allows you to deploy a cluster on premises or in the cloud. The module covers how to deploy both development (Path A) and production-grade (Path B) clusters.

This is a free course, so if you’re looking for a way to fill your Thanksgiving weekend, this is definitely an option.

Comments closed

Python In SQL Server Reporting Services

Tomaz Kastrun shows how we can visualize results from Python models in SQL Server Reporting Services:

As we have created four different models, we would also like to have the accuary of the model visually represented using SSRS.

Showing plots created with Python might not be as straight forward, as with R Language.

Following procedure will extract the data from database and generate plot, that can be used and visualized in SSRS.

Tomaz shows us examples of displaying data as well as visuals generated in Python.

Comments closed

Running Python-Based ML Tasks In Excel

Tony Roberts shows off some of the functionality of PyXLL:

Once we’ve done the hard work of building and testing a model we need to put it to some use! Excel is a great front-end tool for playing with data interactively. It’s used virtually everywhere and so being able to deliver your model in Excel to non-developer users massively opens up opportunities for how it can be used in your business. Even if the model is being used as part of a real-time or batch system, being able to call the model interactively can be really helpful when trying to understand the behaviour of a system.

Fortunately now the model is written in Python getting it into Excel is extremely simple. PyXLL, the Python Excel Add-In has everything we need to write Python for Excel. All we need to do is add a few @xl_func decorators from the pyxll module and configure the PyXLL add-in to load the module containing our model.

If you’re not already familiar with PyXLL, check out the introduction to PyXLL from the user guide.

I mean, if the data’s going to live in Excel spreadsheets anyhow…

Comments closed

More Tabular Best Practices

Ginger Grant continues her series on Analysis Services Tabular best practices:

Optimize your DAX Code

While it is not easy to performance tune DAX you can do it, by evaluating the DAX Query Plan and VeritPaq Queries, and SQLBI’s VertiPaq Analyzer. Also, you can also look to use functions which perform better, for example COUNTROWS instead of DISTINCTCOUNT or ADDCOLUMNS instead of SUMMARIZE. Whenever possible use the CALCULATE function instead of the FILTER function, as CALCULATE filters for context inside the parenthesis and are more efficient. Also all of the iterative functions SUMX, COUNTX etc., should be used sparingly as the row-by-row transactions they create are less efficient and should be used only when SUM or COUNT will not work.  When evaluating if a value missing, if it is possible, use ISEMPTY instead of ISBLANK as ISEMPTY looks only for the presence of a row, which is faster than the evaluation performed by ISBLANK.

Read on for several more items in this vein.

Comments closed

What’s New With In-Memory OLTP In SQL Server 2019

Ned Otter gives us two things to look forward to with SQL Server 2019:

So far, there’s been only one publicly announced enhancement for In-Memory OLTP in SQL 2019: system tables in TempDB will be “Hekatonized”. This will forever solve the issue of system table contention in TempDB, which is a fantastic use of Hekaton. I’m told it will be “opt in”, so you can use this enhancement if you want to, but you can also back out of it, which would require a restart of the SQL Server service.

But there’s at least one other enhancement that’s not been announced, although the details of its implementation are not yet known.

Read on to learn about this not-yet-announced update.

Comments closed

Deep Dive On Index Spools

Hugo Kornelis takes a look at index spools:

The Index Spool operator is one of the four spool operators that SQL Server supports. It retains a copy of all data it reads in an indexed worktable (in tempdb), and can then later return subsets of these rows without having to call its child operators to produce them again.

The Index Spool operator is quite similar to Table Spool, except that Index Spool indexes its data, giving it the option to return a subset, and Index Spool lacks the option to read data from a spool created by another operator. The other two spool operators are quite different: Row Count Spool is optimized for specific cases where the rows to be returned are empty, and Window Spool is used to support the ROWS and RANGE specifications of windowing functions.

Eager and Lazy Spool operators rank high on my list of “troublesome when I see them” operators.  The reason is not so much that eager or lazy spools are inherently bad—they’re not, as they are efficient ways to perform a particular query given the constraints of that query—but if I see one of them in conjunction with a slowly-performing query, it’s a good sign that I want to optimize away the need for spooling.

Comments closed

Optimizing M Function Calls With Function.ScaleVector()

Chris Webb shows us how we can batch calls to M-driven web services:

One of the most common issues faced when calling web services in M is that the the easiest way of doing so – creating a function that calls the web service, then calling the function once per row in a table using the Invoke Custom Function button – is very inefficient. It’s a much better idea to batch up calls to a web service, if the web service supports this, but doing this forces you to write more complex M code. It’s a problem I wrestled with last year in my custom connector for the Cognitive Services API, and in that case I opted to create functions that can be passed lists instead (see here for more information on how functions with parameters of type list work); I’m sure the developers working on the new AI features in dataflows had to deal with the same problem. This problem is not limited to web services either: calculations such as running totals also need to be optimised in the same way if they are to perform well in M. The good news is that there is a new M function Function.ScalarVector() that allows you to create functions that combine the ease-of-use of row-by-row requests with the efficiency of batch requests.

As Chris notes in the post and in the comments, this is mostly useful when you can batch together individual calls to improve performance.  For functions which operate serially (like opening Excel workbooks), you won’t see much (if any) gain.

Comments closed

Monitoring When Databases Go Offline

Jason Brimhall shows how you can create an extended event to track whenever databases go offline:

The other day, I shared an article showing how to audit database offline events via the default trace. Today, I will show an easier method to both audit and monitor for offline events. What is the difference between audit and monitor? It largely depends on your implementation, but I generally consider an audit as something you do after the fact. Monitor is a little more proactive.

Hopefully, a database being taken offline is a known event and not a surprise. Occasionally there are gremlins, in the form of users with too many permissions, that tend to do very strange things to databases and database servers.

Click through for the script.

Comments closed