
Curated SQL Posts

Image Recognition Using Viola-Jones

Ellen Talbot lays out some of the basics of image recognition:

Aggregate channel features (ACF) is a variation of channel features, which extracts features directly as pixel values in extended channels without computing rectangular sums at various locations and scales.

Common channels include the colour channels, such as grey-scale and RGB, but many other channels can be encoded, depending on the difficulty of your problem (e.g. gradient magnitude and gradient histograms).

ACF has advantages, such as a richer representation, accelerated detection speed and more accurate localisation of objects in the images when used in conjunction with a boosting method.
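To make the idea of "pixel values in extended channels" a little more concrete, here is a minimal sketch in plain Python. It is not Ellen's code and not a full ACF detector: the two channels (grey-scale and gradient magnitude) come from the excerpt, but the fixed 2x2 block-sum aggregation, the luma weights, and every function name here are assumptions made for illustration.

```python
import math


def to_grey(rgb_image):
    """Convert rows of (r, g, b) tuples into a single grey-scale channel."""
    # Assumption: common luma weights; any grey-scale conversion would do.
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def gradient_magnitude(channel):
    """Gradient magnitude from central differences, clamped at the borders."""
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = channel[y][min(x + 1, w - 1)] - channel[y][max(x - 1, 0)]
            dy = channel[min(y + 1, h - 1)][x] - channel[max(y - 1, 0)][x]
            out[y][x] = math.hypot(dx, dy)
    return out


def aggregate(channel, block=2):
    """Sum each non-overlapping block x block cell into one output pixel."""
    h, w = len(channel), len(channel[0])
    return [[sum(channel[y + j][x + i]
                 for j in range(block) for i in range(block))
             for x in range(0, w - block + 1, block)]
            for y in range(0, h - block + 1, block)]


def acf_features(rgb_image, block=2):
    """Flatten the aggregated channels into one feature vector."""
    grey = to_grey(rgb_image)
    channels = [grey, gradient_magnitude(grey)]
    features = []
    for channel in channels:
        for row in aggregate(channel, block):
            features.extend(row)
    return features
```

Each feature is simply one pixel of an aggregated channel, which is the property the excerpt highlights. A boosting method would then pick useful entries from this vector.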

Click through for more, including a few resources around the Viola-Jones algorithm.

Comments closed

Configuring SQL Operations Studio

Ahmad Yaseen demonstrates how to configure SQL Operations Studio and write queries with it:

To customize your connection, click the Advanced button, which provides a large number of options for tailoring the connection. For example, you can specify the application workload type when connecting to the server by setting the Application Intent option. You can also override the default Connect Timeout setting, the SQL Server Current Language, and the default Column Encryption Setting for all commands on the connection. The Encrypt option uses SSL encryption for all data sent between the client and the server if there is an installed certificate, Persist Security Info prevents returning the password as a part of the connection, and Trust Server Certificate lets you use SSL encryption even though there is no certificate on the server.

You can also use the Advanced options to specify the number of attempts to restore a connection and the delay between attempts using Connect Retry Count and Connect Retry Interval. In addition, you can specify the maximum and minimum number of connections allowed in the pool, force the connection object to be drawn from the appropriate pool, and set the minimum amount of time for that connection to live in the pool using Load Balance Timeout. The Failover Partner option allows you to provide the name of the SQL Server instance that acts as a failover partner. You can control the size of the network packets used to communicate with the SQL Server instance using the Packet Size option.
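The options above read a lot like connection string keywords, so here is a rough sketch of how they might look when assembled by hand. The keyword spellings follow .NET SqlClient conventions, which is an assumption about what the dialog maps to. The server and database names and all of the values are hypothetical.

```python
def build_connection_string(server, database, options):
    """Join key=value pairs into a semicolon-delimited connection string."""
    parts = {"Server": server, "Database": database}
    parts.update(options)
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


# Illustrative values only; keyword names assume SqlClient-style spelling.
advanced_options = {
    "ApplicationIntent": "ReadOnly",
    "Connect Timeout": 30,
    "Encrypt": True,
    "TrustServerCertificate": False,
    "Persist Security Info": False,
    "ConnectRetryCount": 3,
    "ConnectRetryInterval": 10,
    "Min Pool Size": 0,
    "Max Pool Size": 100,
    "Load Balance Timeout": 0,
    "Packet Size": 8000,
}

connection_string = build_connection_string(
    "myserver", "mydb", advanced_options
)
```

Printing `connection_string` gives one long line you could compare against what the tool generates, keeping in mind that the tool itself may name or order things differently.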

It’s interesting to see just how much you can configure in the tool.


Full-Screen SSMS

Wayne Sheffield has another SSMS tip for us:

Do you ever find yourself working on a query and realize that you need just a bit more real estate in the SSMS window? Or perhaps you find that all the toolbars, menus, etc. are cluttering things up? To solve these issues, you can toggle the full screen mode in SSMS on. It will remove all that clutter and maximize the query window. Below, you can see a cluttered SSMS with two rows of buttons, and toolbars on both sides of it.

Click through to see how to enable full-screen mode.


Query Store And Multiple Plans Per Query

Kendra Little follows Betteridge’s Law:

Can I Force Multiple Plans for a Query in Query Store?

Nope.

At least, not right now.

I started thinking about this when I noticed that the sys.sp_query_store_unforce_plan requires you to specify both a @query_id and a @plan_id.

If there’s only ever one plan that can be forced for a query, why would I need to specify the @plan_id?

I’ve got no insider knowledge on this, I just started thinking about it.

Read on for Kendra’s thoughts.  Maybe we will get something like multiple plans for a single query in the future, though figuring out which forced plan would relate to which combination of parameters would get complex pretty fast.


Calculating The End Of The Month

Bob Pusateri gives us a few techniques for calculating the last day of a particular month:

Months are funny. Unlike other parts of a date, they vary in length:

  • The last second of a minute is always 59.
  • The last minute of an hour is always 59.
  • The last hour of a day is always 23.

But the last day of a month? Well that depends on what month it is. And the year matters too because a leap year means February gets an extra day.
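As a quick illustration of the problem (in Python, not the T-SQL techniques Bob covers), here are two ways to find the last day of a month using only the standard library. The function names are made up for this sketch.

```python
import calendar
from datetime import date, timedelta


def last_day_of_month(year, month):
    """Ask the calendar module how many days the month has."""
    _, days_in_month = calendar.monthrange(year, month)
    return date(year, month, days_in_month)


def last_day_via_next_month(d):
    """Jump to the first day of the next month, then step back one day."""
    # Day 28 exists in every month, and 28 + 4 always lands in the next one.
    first_of_next = (d.replace(day=28) + timedelta(days=4)).replace(day=1)
    return first_of_next - timedelta(days=1)
```

Both handle leap years without special-casing February: `last_day_of_month(2016, 2)` lands on the 29th, while the same call for 2018 lands on the 28th.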

Click through for several techniques, including the knuckle technique for advanced practitioners.  But what if I need to calculate the end of a lunar month?


Migrating Database Files

Jeff Mlakar gives us three methods for migrating database files from one location to another:

The database will be unavailable during this operation so we need to notify our end users. Consider the ramifications if an application is using the database – we might want to stop application services or take some other custom action during the move.

Plan ahead before starting the job. Know what you are going to do before doing it. If you can test your method against a lab or development database that will help too.

Sound advice and technique.  Click through to see those three methods.


Switching Partitions And Table Structure

Andrew Pruski demonstrates a gotcha when switching partitions between tables:

When working with partitioning, the SWITCH operation has to be my favourite. The ability to move a large amount of data from one table to another as a META DATA ONLY operation is absolutely fantastic.

What’s also cool is that we can switch data into a non-partitioned table. Makes life a bit easier not having to manage two sets of partitions!

However, there is a bit of a gotcha when doing this. Let’s run through a quick demo.

Read on for more.


Microsoft ML Server 9.3 Released

Nagesh Pabbisetty announces Microsoft Machine Learning Server 9.3:

In ML Server 9.3, we have added support for SQL compute context in ML Server and in R Client running on Linux platforms, so data scientists who work on Linux workstations can directly use in-database analytics with SQL Server compute context. Additionally, the SQLRUtils package can now be used to package R scripts into T-SQL stored procedures and run them from an R environment on Linux clients.

An interesting scenario enabled by the addition of SQL Server Compute context in ML Server running on Linux is that organizations can now provide a browser-based interface for accessing SQL Server compute context with R Studio Server and ML Server running on a Linux machine connecting to SQL Server.

Since introducing the revoscalepy library in the last release of ML Server and SQL Server 2017, we have shipped several additions and improvements in the Python APIs as part of CU releases of SQL Server 2017. We have added APIs like rx_create_col_info, rx_get_var_info etc. that make it easier to get column information, especially with a large number of columns. We added rx_serialize_model for easy model serialization. We have also improved performance when working with string data in different scenarios.

This also gets you up to R 3.4.3. H/T David Smith


Looping In Python And R

Dmitry Kisler has a quick comparison of looping speed in Python and R:

This post is about R versus Python in terms of the time they require to loop and generate pseudo-random numbers. To accomplish the task, the following steps were performed in Python and R:

  • Loop 100k times (ii is the loop index).
  • Generate a random integer number out of the array of integers from 1 to the current loop index ii (ii+1 for Python).
  • Output elapsed time at the probe loop steps: ii (ii+1 for Python) in [10, 100, 1000, 5000, 10000, 25000, 50000, 75000, 100000].
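For a feel of what the Python side of such a test could look like, here is a small sketch of the steps described above. It is not Dmitry's code: it assumes `random.randint` as the generator rather than sampling from a materialized array, and the function and variable names are invented for illustration.

```python
import random
import time

# Probe steps taken from the description above.
PROBES = {10, 100, 1000, 5000, 10000, 25000, 50000, 75000, 100000}


def time_random_loop(n=100_000, probes=PROBES):
    """Loop n times, drawing a random integer in [1, ii + 1] each pass,
    and record the elapsed seconds whenever ii + 1 is a probe step."""
    timings = {}
    start = time.perf_counter()
    for ii in range(n):
        random.randint(1, ii + 1)  # assumption: randint stands in for array sampling
        if ii + 1 in probes:
            timings[ii + 1] = time.perf_counter() - start
    return timings
```

Calling `time_random_loop()` returns a dictionary mapping each probe step to the seconds elapsed so far. The absolute numbers depend entirely on your machine, so treat them as relative measurements only.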

The findings were mostly unsurprising to me, though there was one unexpected twist.


Starting A Data Science Project: Business Understanding

I continue my data science project series:

As you listen to these types of questions, your goal is to nail down a specific problem with a specific answer.  You want to narrow down the scope to something that your team can achieve, ideally something with a built-in measure for success.  For example, here are a few specific problems that we could go solve:

  • Find a model which predicts quarterly sales to within 5% no later than 30 days into the quarter.
  • Given a title and description for a product, tell me a listing category which Amazon will, with at least 90% confidence, consider valid for this product.
  • Determine the top three factors which most affect the number of years the first owner holds onto our mid-range sedan.

With a specific problem in mind, you can look for relevant data.  Of course, you’ll probably need to modify the scope of this problem over time as you gather new information, but this gives you a starting point for success.  Also, don’t expect something as clear-cut as the above early on; instead, people will hem and haw, not quite sure what they really want.  You can take a fuzzy goal into data acquisition, but as you acquire data, you will want to work with the champion to focus down to a targeted and valuable problem.

Read on for several references to big sacks of cash.  After becoming a manager, I’ve become much more attuned to the idea of receiving big sacks of cash.
