Day: August 17, 2017

Building An Image Recognizer With R

David Smith has a post showing how to build an image recognizer with R and Microsoft’s Cognitive Services Library:

The process of training an image recognition system requires LOTS of images — millions and millions of them. The process involves feeding those images into a deep neural network, and during that process the network generates “features” from the image. These features might be versions of the image including just the outlines, or maybe the image with only the green parts. You could further boil those features down into a single number, say the length of the outline or the percentage of the image that is green. With enough of these “features”, you could use them in a traditional machine learning model to classify the images, or perform other recognition tasks.

But if you don’t have millions of images, it’s still possible to generate these features from a model that has already been trained on millions of images. ResNet is a very deep neural network model trained for the task of image recognition which has been used to win major computer-vision competitions. With the rxFeaturize function in Microsoft R Client and Microsoft R Server, you can generate 4096 features from this model on any image you provide. The features themselves are meaningful only to a computer, but that vector of 4096 numbers between zero and one is (ideally) a distillation of the unique characteristics of that image as a human would recognize it. You can then use that features vector to create your own image-recognition system without the burden of training your own neural network on a large corpus of images.

Read the whole thing and follow David’s link to the Microsoft Cognitive blog for more details.
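As a rough illustration of the last step in the quoted passage, here is a minimal Python sketch of feeding precomputed feature vectors into a traditional classifier. It is not David's code and it does not call rxFeaturize: the four-element vectors and the labels are made-up stand-ins for the 4096-element vectors described above, and nearest-centroid is just one simple choice of model.

```python
from math import dist
from statistics import fmean


def centroid(vectors):
    """Average a list of equal-length feature vectors element by element."""
    return [fmean(column) for column in zip(*vectors)]


def train(labelled_features):
    """Build one centroid per label from {label: [feature_vector, ...]}."""
    return {label: centroid(vectors) for label, vectors in labelled_features.items()}


def classify(model, features):
    """Return the label whose centroid is closest (Euclidean) to the features."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical feature vectors; a real featurizer would supply these.
training = {
    "cat": [[0.9, 0.1, 0.2, 0.0], [0.8, 0.2, 0.1, 0.1]],
    "dog": [[0.1, 0.9, 0.7, 0.6], [0.2, 0.8, 0.9, 0.5]],
}
model = train(training)
prediction = classify(model, [0.85, 0.15, 0.1, 0.05])
print(prediction)
```

The point is only that the expensive part (the deep network) is reused, while the model you train yourself can be small and cheap.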


MERGE In Hive

Carter Shanklin introduces the MERGE operator in Hive:

USE CASE 2: UPDATE HIVE PARTITIONS.

A common strategy in Hive is to partition data by date. This simplifies data loads and improves performance. Regardless of your partitioning strategy you will occasionally have data in the wrong partition. For example, suppose customer data is supplied by a third party and includes a customer signup date. If the provider had a software bug and needed to change customer signup dates, suddenly records are in the wrong partition and need to be cleaned up.

It has been interesting to see Hive morph over the past few years from a batch warehousing system to something approaching a relational warehouse. This is one additional step in that direction.
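To make the partition clean-up in the quoted use case concrete, here is a small Python model of the logic a merge has to carry out. It is not Hive syntax and not Carter's code: the nested-dictionary "table", the row fields, and the function name are all hypothetical, chosen only to show the three cases (matched in place, matched in the wrong partition, not matched).

```python
def merge_into_partitions(partitions, source_rows):
    """Apply source rows to a date-partitioned table held as nested dicts.

    partitions: {signup_date: {customer_id: row}}
    source_rows: iterable of rows, each with "id" and "signup_date" keys
    """
    # Index each customer's current partition so matches are quick to find.
    current_partition = {
        customer_id: signup_date
        for signup_date, rows in partitions.items()
        for customer_id in rows
    }
    for row in source_rows:
        customer_id, new_date = row["id"], row["signup_date"]
        old_date = current_partition.get(customer_id)
        if old_date is not None and old_date != new_date:
            # Matched, but stored under the wrong date: remove the stale copy.
            del partitions[old_date][customer_id]
        # Matched in place: overwrite. Otherwise: insert into the right partition.
        partitions.setdefault(new_date, {})[customer_id] = row
        current_partition[customer_id] = new_date
    return partitions


table = {
    "2017-08-01": {
        1: {"id": 1, "name": "Ann", "signup_date": "2017-08-01"},
        2: {"id": 2, "name": "Bob", "signup_date": "2017-08-01"},
    },
}
corrections = [
    {"id": 2, "name": "Bob", "signup_date": "2017-08-02"},  # signup date corrected
    {"id": 3, "name": "Cy", "signup_date": "2017-08-01"},   # brand-new customer
]
merge_into_partitions(table, corrections)
```

After the call, customer 2 lives only under 2017-08-02, customer 3 has been added, and customer 1 is untouched.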


Performance Problems Due To Readable Secondaries

Paul Randal describes a performance problem that can arise when you use a readable secondary in an Availability Group:

Yesterday I blogged about log shipping performance issues and mentioned a performance problem that can be caused by using availability group readable secondaries, and then realized I hadn’t blogged about the problem, only described it in our Insider newsletter. So here’s a post about it!

Availability groups (AGs) are pretty cool, and one of the most useful features of them is the ability to read directly from one of the secondary replicas. Before, with database mirroring, the only way to access the mirror database was through the creation of a database snapshot, which only gave a single, static view of the data. Readable secondaries are constantly updated from the primary so are far more versatile as a reporting or non-production querying platform.

But I bet you didn’t know that using this feature can cause performance problems on your primary replica?

Definitely read the whole thing.


Dealing With The Registry From SQL Server

Wayne Sheffield shows how to read and modify registry entries using SQL Server:

xp_instance_regread

In this example, I used xp_regread to read the direct registry path. If you remember from earlier, there are SQL Server instance-aware versions of each registry procedure. A comparable statement using the instance-aware procedure would be:

This statement returns the exact same information. Let’s look at the difference between these – in the first query, the registry path is the exact registry path needed, and it includes “\Microsoft SQL Server\MSSQL12.SQL2014\”. In the latter query, this string is replaced with “\MSSQLSERVER\”. Since the latter function is instance aware, it replaces the “MSSQLSERVER” with the exact registry path necessary for this instance of SQL Server. Pretty neat, isn’t it? This allows you to have a script that will run properly regardless of the instance that it is being run on. The rest of the examples in this post will utilize the instance-aware procedures to make it easier for you to follow along and run these yourself.

Sometimes you just have to change something in the registry from SQL Server. Hopefully that “sometimes” is rare.
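The path substitution Wayne describes can be pictured with a few lines of Python. This is only a model of the behavior as stated in the quote, not how SQL Server implements it; the function name is hypothetical, and the sample key path is an assumption used for illustration.

```python
# Generic segment that the instance-aware procedures accept, per the quote.
GENERIC_SEGMENT = "\\MSSQLSERVER\\"


def resolve_instance_path(generic_path, instance_id):
    """Swap the generic segment for the instance-specific registry segment."""
    specific_segment = "\\Microsoft SQL Server\\" + instance_id + "\\"
    # Replace only the first occurrence, leaving the rest of the path alone.
    return generic_path.replace(GENERIC_SEGMENT, specific_segment, 1)


# Hypothetical example path; the instance ID comes from the quoted text.
generic = "SOFTWARE\\Microsoft\\MSSQLSERVER\\Setup"
resolved = resolve_instance_path(generic, "MSSQL12.SQL2014")
print(resolved)
```

The same generic path resolves differently on each instance, which is why a script written against it is portable.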


Attaching Databases To Docker

Andrew Pruski shows one scenario where Docker on Windows is better than Docker on Linux:

One of the (if not the) main benefits of working with SQL in a container is that you can create a custom image to build containers from that has all of your development databases available as soon as the container comes online.

This is really simple to do with Windows containers. Say I want to attach DatabaseA that has one data file (DatabaseA.mdf) and a log file (DatabaseA_log.ldf):

ENV attach_dbs="[{'dbName':'DatabaseA','dbFiles':['C:\\SQLServer\\DatabaseA.mdf','C:\\SQLServer\\DatabaseA_log.ldf']}]"

Nice and simple! One line of code and any containers spun up from the image this dockerfile creates will have DatabaseA ready to go.

However, this functionality is not available when working with Linux containers. Currently you cannot use an environment variable to attach a database to a SQL instance running in a Linux container.

Read on to see what you can do if you’re using a Linux container.
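If you have more than one database to attach on Windows, writing the attach_dbs value by hand gets fiddly because of the nested quotes and doubled backslashes. Below is a small Python sketch that generates the line in the same format as the quoted example. The helper name is hypothetical and this is not part of Andrew's post; it simply reproduces the quoting shown above.

```python
import json


def attach_dbs_env_line(databases):
    """Build a Dockerfile ENV line in the attach_dbs format quoted above.

    databases: list of (db_name, [file_paths]) tuples.
    """
    payload = [{"dbName": name, "dbFiles": list(files)} for name, files in databases]
    # Compact JSON doubles each backslash; swapping the double quotes for single
    # quotes then matches the quoting style of the quoted example.
    value = json.dumps(payload, separators=(",", ":")).replace('"', "'")
    return f'ENV attach_dbs="{value}"'


line = attach_dbs_env_line([
    ("DatabaseA", ["C:\\SQLServer\\DatabaseA.mdf", "C:\\SQLServer\\DatabaseA_log.ldf"]),
])
print(line)
```

The quote swap is naive: a database name or path that itself contains a quote character would need extra handling.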


NULL Replacement In SQL Server And Oracle

Daniel Janik shows a pair of non-standard functions you can use to replace NULL values:

It’s Wednesday and that means another SQL/Oracle post. Today we’ll be discussing NULL values, which can sometimes be a real pain. Don’t worry, though: there’s a simple solution. Simply replace the NULL value with another.

Comparing a column with NULL and replacing with another value is really simple. There are built-in functions for replacing NULL values. I’m not going to discuss the ANSI standard COALESCE here. If you want to know more about it you can find it on Bing.

I provide no comment on Daniel’s claim regarding being able to find something on Bing… Click through to see the custom NULL replacement functions in SQL Server versus Oracle.
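For readers who think better in code, here is a minimal Python sketch of the two shapes of NULL replacement, with Python's None standing in for SQL NULL. The two-argument helper mirrors the form of vendor functions such as T-SQL's ISNULL and Oracle's NVL (naming those is my assumption; the excerpt does not name the functions), while the variadic one mirrors COALESCE. Data type rules, which differ between the real functions, are not modelled.

```python
def replace_null(value, replacement):
    """Two-argument form: return replacement only when value is None."""
    return replacement if value is None else value


def coalesce(*values):
    """Variadic form: return the first argument that is not None, else None."""
    return next((v for v in values if v is not None), None)


# Hypothetical rows; fall back to the name when no nickname is recorded.
rows = [
    {"name": "Ann", "nickname": None},
    {"name": "Bob", "nickname": "Bobby"},
]
display_names = [replace_null(row["nickname"], row["name"]) for row in rows]
print(display_names)
```

Note that only None is replaced: values such as 0 or an empty string pass through unchanged.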


Biml Enrichment With Annotations

Bill Fellows shows why it’s useful to include annotations in your Biml scripts:

In many of the walkthroughs on creating relational objects via Biml, it seems like people skim over the Databases collection. There’s nothing built into the language to really support the creation of database nodes. The import database operations are focused on tables and schemas and assume the database node(s) have been created. I hate assumptions.

Read on for more about dealing with databases, and not just tables and other database objects, in Biml.


Dealing With Limited Rights In Biml

Shannon Lowder walks through a scenario where he wants limited rights to process metadata changes, separate from any data transfer:

My development environment has a local instance of SQL Server with AdventureWorks2014 on it. I’m going to use that as my source. I also created a database on this instance called BimlExtract to serve as my destination database.

To create a user that can only read the schema on the source system, I created a login and user named ‘Biml’. I granted this user VIEW DEFINITION in AdventureWorks2014. I also added this user to the db_owner role in BimlExtract. Now, this user can read the schema of the source, and create tables in the destination. I’ve included the T-SQL to set the permissions in Database Setup.sql.

Now, we’re ready to walk through the solution.

Click through for the solution and also a GitHub repo with all of Shannon’s code.
