Press "Enter" to skip to content

Month: November 2017

Finding Queries In Need Of Indexing

Jeff Schwartz continues his series on index tuning:

Table 1 shows examples of queries that potentially need tuning based upon the number of executions, total reads, total duration, total CPU time, and average reads per execution. This kind of report immediately focuses attention on the queries that might benefit the most from either index or query tuning. The five queries highlighted in Table 1 underscore these criteria. The ones highlighted in yellow were the worst offenders because their executions collectively performed the most reads with the worst one totaling 3.5 BILLION reads. The ones highlighted in light green and orange accounted for the most CPU time as well as the longest total duration. The one highlighted in slate ran the most times, and the ones highlighted in gray performed the most reads per execution. This information is vital when determining where query and index tuning should be focused.

Jeff walks through some of his data collection and analysis process in this post, making it worth a read.

Comments closed

Security Issue In Oracle Identity Manager

Oracle has a security advisory with a CVSS base score of 10.0 (which is pretty awful):

This Security Alert addresses CVE-2017-10151, a vulnerability affecting Oracle Identity Manager. This vulnerability has a CVSS v3 base score of 10.0, and can result in complete compromise of Oracle Identity Manager via an unauthenticated network attack. The Patch Availability Document referenced below provides a full workaround for this vulnerability, and will be updated when patches in addition to the workaround are available.

Due to the severity of this vulnerability, Oracle strongly recommends that customers apply the updates provided by this Security Alert without delay.

Catalin Cimpanu explains:

The affected product is Oracle Identity Manager (OIM), a user management solution that allows enterprises to control what parts of their network employees can access. OIM is part of Oracle’s highly popular Fusion Middleware offering and is one of its most used components.

Oracle describes the issue — tracked under the CVE-2017-10151 identifier — as a “default account” vulnerability, an umbrella term that’s usually used to describe accounts with no password or hardcoded credentials (a.k.a. backdoor accounts).

“This vulnerability is remotely exploitable without authentication, i.e., may be exploited over a network without requiring user credentials,” Oracle said in a security alert.

Oracle has patched this.  If you have it installed, please update ASAP.

Comments closed

Azure SQL DB Automatic Tuning FAQ

Arun Sirpal has a self-Q&A session regarding Azure SQL Database’s automatic tuning options:

What are the options?

  1. CREATE INDEX that identifies the indexes that may improve performance of your workload, creates the indexes, and verifies that they improve performance of the queries.

  2. DROP INDEX that identifies redundant and duplicate indexes, and indexes that were not used in the long period of time.

  3. PLAN REGRESSION CORRECTION that identifies SQL queries that are using execution plan that are slower than previous good plan, and uses the last known good plan instead of the regressed plan.

Very useful information.

Comments closed

Hortonworks Data Platform 2.6.3 Released

Roni Fontaine, et al, announce Hortonworks Data Platform 2.6.3:

First, in this release, we made huge strides with regards to Data Science.  This includes:

  • Apache Spark 2.2 is GA. This enhances the stability of Spark for structured streaming.

  • Apache Zeppelin 0.7.3 is GA. This provides support for notebooks stored in HDFS, and improves the scalability and multi-user support.

  • Apache Livy 0.4.0 is GA. This version supports multiple programming languages in the same Spark context.

This is a point release, so most of it is around adding updated versions of different products.

Comments closed

Thinking About Slowly Degrading Page Performance

Ritesh Maheshwari talks about how LinkedIn deals with performance regressions:

Looking at the chart above, where the dotted red line is a reference point to show where we started the year, notice how site speed improvements tend to be significant and noticeable, as they are optimization-driven. Degradations, however, can generally be of any “amount,” as they happen for various reasons. LinkedIn’s page-serving pipeline has many moving parts. We deploy code multiple times per day, operate a micro-service architecture with hundreds of services, and infrastructure upgrades are frequent. A slowdown in any of these components can cause degradations.

While large degradations can be caught using A/B testingcanary analysis, or anomaly detection, small ones tend to leak to production. Thus, performance of a page has a tendency to always degrade over time.

This led to having the centralized Performance Team focus on identifying these leaks, called “site speed regressions,” and to craft tools and processes to fix them.

It’s an interesting principle.  I could see this principle work for tracking database performance degradation as well.

Comments closed

Promises And Closures In R

Damian Rodziewicz looks at the new promises package in R:

Citing Joe Cheng, our aim is to:

  1. Execute long-running code asynchronously on separate thread.
  2. Be able to do something with the result (if success) or error (if failure), when the task completes, back on the main R thread.

A promise object represents the eventual result of an async task. A promise is an R6 object that knows:

  1. Whether the task is running, succeeded, or failed

  2. The result (if succeeded) or error (if failed)

This looks pretty exciting.  H/T R-Bloggers

Also, Sebastian Warnholz has a post on promises and closures in case you’re not familiar with the concepts:

Every argument you pass to a function is a promise until the moment R evaluates it. Consider a function g with arguments x and y. Let’s leave out one argument in the function call:

g <- function(x, y) x
g(1)

## [1] 1

R will be forgiving (lazy) until the argument y is actually needed. Until then y exists in the environment of the function call as a ‘name without a value’. Only when R needs to evaluate y a value is searched for. This means that we can pass some non-existent objects as arguments to the function g and R won’t care until the argument is needed in the functions body.

Read the whole thing.  Once again, H/T R-Bloggers

Comments closed

Team Data Science Process Updates

David Smith announces updates to the Team Data Science Process:

It’s been over a year since we first introduced introduced the Team Data Science Process (TDSP). The data, technology and practices behind Data Science continue to evolve, and the TDSP has evolved in parallel. Over the past year, several new facets have been added, including:

  • The IDEAR (Interactive Data Exploration, Analysis and Reporting) framework, an open source extension to R and Python designed to standardize the process of data exploration and reporting;

  • Guidance for use of Spark 2.0, including an end-to-end Spark v2.0 walkthrough;

  • Guidance for use of in-database Python with SQL Server, including an end-to-end in-database Python tutorial;

Click through for more changes, as well as links to further resources.

Comments closed

Python For The DBA: Copying SQL Logins

David Fowler has an example of how DBAs can use Python to do something interesting:

There are plenty of times when you might want to copy your SQL logins (including the SID) from one server to another.  Perhaps you’re running an AG and need to make sure that all users exist on all your secondaries with the correct SID, perhaps you’re migrating servers and need all the logins on your new server.  Whatever the reason, there are a number of different ways in which you can do this but they usually require scripting out on one server and then running the script into another server, or of course there’s Powershell.

The below script will use Python to copy all or specified logins from one server to another, including the password and SID.

Click through for the script.  The main use case for a SQL Server DBA to learn Python as a DBA scripting language would be if you run SQL on Linux—I don’t think Powershell on Linux is far enough developed to handle the full range of DBA tasks.  Otherwise, I’d use Powershell and dbatools.

Comments closed

Stack Shuffle Custom Visual

Devin Knight continues his Power BI custom visuals series:

In this module you will learn how to use the Enlighten Stack Shuffle Custom Visual.  The Enlighten Stack Shuffle is helpful when you want to display a Top N set of values.  For example if you want to display your top 5 selling employees this visual can make that very easy.

This looks pretty good on a dashboard, especially if you have a top-heavy data set, where the top few items are by far the most important.

Comments closed

Sentiment Analysis With Python In SQL Server

Nellie Gustafsson has a quick example of sentiment analysis using SQL Server Machine Learning Services:

You don’t have to be a data scientist to use machine learning in SQL Server. You can use pre-trained models available for usage out of the box to do your analysis. The following example shows you how you quickly get started and do text sentiment analysis.

Before starting to use this model, you need to install it. The installation is quick and instructions for installing the model can be found here: How to install the models on SQL Server

Once you have SQL Server installed with Machine Learning Services, enabled external script execution, and installed the pre-trained model, you can execute the following  script to create a stored procedure that uses Python and the microsoftml function get_sentiment with the pre-trained model to determine the probability of positive sentiment of a text:

Click through to read the whole thing.

Comments closed