Press "Enter" to skip to content

Day: October 16, 2019

Training, Validation, and Test Data Sets with SAS Viya

Beth Ebersole takes us through creating training, validation, and test data sets using SAS Viya:

Training data are used to fit each model. Training a model involves using an algorithm to determine model parameters (e.g., weights) or other logic to map inputs (independent variables) to a target (dependent variable). Model fitting can also include input variable (feature) selection. Models are trained by minimizing an error function.

For illustration purposes, let’s say we have a very simple ordinary least squares regression model with one input (independent variable, x) and one output (dependent variable, y). Perhaps our input variable is how many hours of training a dog or cat has received, and the output variable is the combined total of how many fingers or limbs we will lose in a single encounter with the animal.

Read on for some good notes, including the difference between mean squared error and average squared error.

Comments closed

Shaded Ranges in Excel

Elizabeth Ricks shows how to create shaded ranges in Excel:

We can see there’s clear seasonality in this business—overall volume is highest in the summer and each outing type generally follows the same monthly pattern. Let’s say you manage the Family rentals and you’d like to compare your monthly volume to what you’re seeing across the entire fleet. 

For the purpose of this tactical illustration, let’s assume the shape of the data—relative peaks and valleys—is more important than the specifics of each category individually. If that’s the case, I can simplify by showing a shaded region to depict the range of absolute passengers each month.

This technique is excellent when you have a large number of lines but only care about one versus the norm, and individual lines would be too distracting.

Comments closed

Running Oracle on Azure

Kellyn Pot’vin-Gorman takes us through various options on running Oracle in Azure:

Running Oracle on Azure VM environments aren’t that different from running Oracle on VMs in your on-premises for a DBA.  The DBAs and developers that I work with still have their jobs, still work with their favorite tools and also get the chance to learn new skills such as cloud administration.

Click through for more, including a setup script.

Comments closed

MSDTC and Availability Groups

Ryan Adams provides guidance on using distributed transactions against Availability Groups:

A paramount concept to understand is how to make the DTC highly available.  We can see from the precedence order that SQL Server will use the local DTC out of the box.  This makes it appear that everything is working, and it is, but it is not exactly highly available.

I see a lot of customers leave it configured this way because they either don’t know the ramifications or do not realize they are using the MSDTC (Linked Servers). Since it simply works out of the box, things get left this way until they end up with a suspect database and error messages that look like this:

“SQL Server detected a DTC/KTM in-doubt transaction with UOW  {598B7EDD-F7A1-9DC1-8D3E-303A4C93AAB4}.Please resolve it following the guideline for Troubleshooting DTC Transactions.”

Read the whole thing. There are a lot of small areas between processes where things can fail, and the combination of DTC + AGs is no different.

Comments closed

Comparing Azure SQL DB and RDS on Price and Performance

Joey D’Antoni summarizes a recent publication:

Yesterday, GigaOm published a benchmark of Azure SQL Database as compared to Amazon’s RDS service. It’s an interesting test case that tries to compare the performance of these platform as a service database offerings. One of the many challenges of this kind of a study is that the product offerings are not exactly analogous. However, GigaOm was able to build a test case where they used similar sized offerings as shown in the image below.

Read on for the summary and Joey’s thoughts.

Comments closed

Query Performance With and Without Stored Procedures

Bert Wagner takes on the idea that stored procedures are faster (or slower) than ad hoc SQL:

A few months ago I was presenting for a user group when someone asked the following question:

Does a query embedded in a stored procedure execute faster than that same query submitted to SQL Server as a stand alone statement?

The room was pretty evenly split on the answer: some thought the stored procedures will always perform faster while others thought it wouldn’t really matter.

Bert explains the answer. For me, performance isn’t even on the radar for how I try to convince people to use procedures. Instead, explain it to developers as an interface: developers program against a contract, where they can send in a specified set of inputs (procedure parameters), get back a specified output (the shape of the result set), and not have to care about the fine details. Programming to interfaces is extremely common in business development, and this is just one more interface.

Comments closed