
February 2017

Metaphones In SQL

Phil Factor builds a function to generate metaphones in SQL Server:

Metaphone algorithms are designed to produce an approximate phonetic representation, in ASCII, of regular “dictionary” words and names in English and some Latin-based languages. The algorithm is intended for indexing words by their English pronunciation, and it is one of the more popular phonetic algorithms. It was published by Lawrence Philips in 1990. A Metaphone is up to ten characters in length.

It is used for fuzzy searches of records where each string to be searched is indexed by its Metaphone key. You search for all records with the same or a similar Metaphone key and then refine the search with some ranking algorithm such as Damerau–Levenshtein distance. Metaphone searches are particularly popular with ‘ancestry’ sites that search on surnames, where spellings vary considerably for the same surname. The current version, Metaphone 3, is actively maintained by Lawrence Philips; it was developed to account for all spelling variations commonly found in English words, first and last names found in the United States and Europe, and non-English words whose native pronunciations are familiar to English speakers. The source of Metaphone 3 is proprietary, and Lawrence charges a fee to supply it.

Read on for the script.
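
If you want a feel for how the key gets used once you have such a function, here is a minimal sketch; dbo.Metaphone and dbo.DamerauLevenshtein are hypothetical stand-ins for whatever implementations you end up with:

    -- Sketch only: assumes a Person table with a MetaphoneKey column,
    -- pre-computed via your metaphone function and indexed.
    DECLARE @SearchName nvarchar(100) = N'Phillips';
    DECLARE @SearchKey varchar(10) = dbo.Metaphone(@SearchName);

    SELECT p.PersonID, p.Surname
    FROM dbo.Person AS p
    WHERE p.MetaphoneKey = @SearchKey  -- cheap, index-friendly candidate filter
    ORDER BY dbo.DamerauLevenshtein(p.Surname, @SearchName);  -- refine by edit distance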


Using ostress

Erik Darling explains how to stress test using ostress:

You can spend a lot of money on people and complicated software to design, run, and monitor workloads against test environments, or you can throw together tests with some free tools like SQL Query Stress or Microsoft’s RML Utilities.

RML Utilities is pretty cool, and includes ostress.exe along with some other components. What I really like about ostress is that it’s all CLI, so you can call it from Agent Jobs, and feed it all sorts of cool options. When you create the Agent Job, you just use the Operating system (CmdExec) type of step instead of T-SQL, or whatever else.

For simulated load, ostress is powerful.  For “ripped from the headlines” load, the Database Experimentation Assistant might do the trick once it matures.
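
If you haven't used ostress before, a basic invocation looks something like this; the server, database, and procedure names are placeholders, with -n controlling the number of connections and -r the iterations per connection:

    REM 25 concurrent connections, each running the query 100 times
    ostress.exe -S"MyServer" -d"MyDatabase" -E -Q"EXEC dbo.SomeProcedure;" -n25 -r100 -q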


Generating SQL Permutations

Michael J. Swart uses a recursive common table expression and bit shifting to build a full set of permutations:

If you google “generating permutations using SQL”, you get thousands of hits. It’s an interesting problem, even if not a terribly useful one.
I wrote a solution recently and thought I’d share it. If you’re keen, try tackling it yourself before moving on.

Click through for the script.  It’s an interesting approach and might be worth a round of code golf.
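
For a rough idea of the general technique (this is not Michael's script, just a sketch of one way to pair a bitmask with a recursive CTE), track which items a partial permutation has already used and let the recursive part extend only to unused items:

    -- Sketch: all permutations of the items 0, 1, 2.
    -- Extend the VALUES list (and @N) for more items.
    DECLARE @N int = 3;

    WITH Items AS (
        SELECT v.n, POWER(2, v.n) AS flag
        FROM (VALUES (0), (1), (2)) AS v(n)
    ),
    Perms AS (
        SELECT CAST(i.n AS varchar(100)) AS perm,
               i.flag AS used,
               1 AS len
        FROM Items AS i
        UNION ALL
        SELECT CAST(p.perm + ',' + CAST(i.n AS varchar(10)) AS varchar(100)),
               p.used | i.flag,         -- mark this item as used
               p.len + 1
        FROM Perms AS p
        JOIN Items AS i
            ON p.used & i.flag = 0      -- only items not yet in the permutation
    )
    SELECT perm
    FROM Perms
    WHERE len = @N
    ORDER BY perm;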


Managing Azure SQL Database Firewall Rules

Cedric Charlier shows how to manage Azure SQL Database firewall rules from within Management Studio:

When you create a new Azure database, you usually need to open the firewall to remotely administer or query this database with SSMS. One option is to create rules from the Azure Portal. It’s surely a convenient way to do it when you create a database, but I prefer to keep my toolset to a minimum; when the Azure portal is not open, I prefer not to have to open it just to define a few firewall rules.

Opening the firewall with SSMS is a bit of a chicken-and-egg problem: to connect to your database/server, you need to open the firewall. Fortunately, SSMS has a great suite of screens to call the underlying API of the Azure Portal and open the firewall for the computer running SSMS.

Cedric shows off sp_delete_firewall_rule, but there’s also a corresponding sp_set_firewall_rule.
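
Both live in the master database of the logical server and are simple to call; the rule name and IP addresses here are placeholders:

    -- Run these in the master database of your Azure SQL logical server.
    EXEC sp_set_firewall_rule
        @name = N'HomeOffice',
        @start_ip_address = '203.0.113.5',
        @end_ip_address = '203.0.113.5';

    -- And to clean up afterward:
    EXEC sp_delete_firewall_rule @name = N'HomeOffice';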


Maintenance Plan Updates

Kevin Hill looks at maintenance plan updates in SQL Server 2016:

True to my typical post style, which focuses on small shops, accidental DBAs, and junior DBAs, I went looking for something that could very easily benefit people who are using the basic SQL Server features.  In this case, they may not even realize how limited they were.

I chose to write about Index Maintenance in the built-in Maintenance Plan portion of SQL Server.

A brief summary of the built-in Maintenance Plans is that they allow you to drag-and-drop your way to basic SQL Server maintenance items such as Backups, Index maintenance, CheckDB, Statistics updating, etc.  This is a tool that has been around since at least version 7, as far as I know.  It wasn’t always very good, and it gets a bad rap from a lot of DBAs, but it has been dramatically improved over the years in flexibility and reliability.

Read on for the changes.  I’m really not a fan of maintenance plans, but if they’re going to exist, they should at least be as good as possible.
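
Whatever options you end up setting in the plan, the decision it automates is the classic fragmentation check against sys.dm_db_index_physical_stats; a hand-rolled version of that check looks something like this (the thresholds are the usual rules of thumb, not magic numbers):

    -- Indexes fragmented enough, and large enough, to be worth rebuilding.
    SELECT OBJECT_NAME(ips.object_id) AS table_name,
           i.name AS index_name,
           ips.avg_fragmentation_in_percent,
           ips.page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
        ON i.object_id = ips.object_id
        AND i.index_id = ips.index_id
    WHERE ips.avg_fragmentation_in_percent > 30  -- rebuild threshold (rule of thumb)
      AND ips.page_count > 1000;                 -- skip trivially small indexes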


Getting A Grip On Columnstore

Melissa Coates has a nice introduction to columnstore indexes:

Many of the in-memory features in the Microsoft platform rely on the xVelocity (formerly known as VertiPaq) engine. The implementations do differ somewhat between products, such as the requirements for data to be truly memory-resident.

The remainder of this post will focus on the two columnstore technologies in SQL Server: clustered and nonclustered.

If you’re not very familiar with columnstore and aren’t quite ready to tackle Niko Neugebauer’s columnstore series, this is a good way to get started.
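
The syntax gap between the two is small even though the storage story is quite different; the table and column names here are invented:

    -- Clustered columnstore: the columnstore becomes the table's storage.
    CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales
        ON dbo.FactSales;

    -- Nonclustered columnstore: a separate columnar copy of selected columns,
    -- layered on top of an ordinary rowstore table.
    CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_Orders
        ON dbo.Orders (OrderDate, CustomerID, TotalDue);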


Altering Job Steps With PowerShell

Rob Sewell uses PowerShell to modify hundreds of SQL Agent job steps across hundreds of SQL Server instances:

We will use the sqlserver module, so you will need to have installed the latest version of SSMS from https://sqlps.io/dl

This code was run using PowerShell version 5 and will not work on PowerShell version 3 or lower, as it uses the .Where() method.

Let’s grab all of our jobs on the estate. (You will need to fill the $Servers variable with the names of your instances, maybe from a database, a CMS, or a text file.)

One oddity with SQL Agent jobs is that you absolutely need to call the Alter() method at the end or else the changes will not actually take effect.
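
If you only have a step or two to fix on one instance, the pure T-SQL route is msdb's sp_update_jobstep, which is essentially what the SMO objects wrap; the job name and command text below are placeholders:

    -- Update the command text of step 1 in a single job.
    EXEC msdb.dbo.sp_update_jobstep
        @job_name = N'NightlyETL',  -- hypothetical job name
        @step_id = 1,
        @command = N'EXEC dbo.NewProcedure;';

Rob's approach earns its keep when it's hundreds of steps across hundreds of instances.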


Bulk Loading HDInsight Using Phoenix

Anunay Tiwari uses Phoenix to bulk load data into HBase on HDInsight:

Apache HBase is an open-source NoSQL Hadoop database, a distributed, scalable big data store. It provides real-time read/write access to large datasets. HDInsight HBase is offered as a managed cluster that is integrated into the Azure environment. HBase provides many features as a big data store, but in order to use it, customers first have to load their data into HBase.

There are multiple ways to get data into HBase, such as using client APIs, a MapReduce job with TableOutputFormat, or entering the data manually through the HBase shell. Many customers are interested in using Apache Phoenix, a SQL layer over HBase, for its ease of use. This post describes how to use Phoenix bulk load with HDInsight clusters.

Phoenix provides two methods for loading CSV data into Phoenix tables – a single-threaded client loading tool via the psql command, and a MapReduce-based bulk load tool.

Anunay explains both methods, allowing you to choose based on your data needs.
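
For a rough shape of the two commands (the table name, file paths, jar name, and ZooKeeper quorum are all placeholders and vary by version and cluster):

    # Single-threaded client load via psql.py
    psql.py -t MYTABLE zk0.example.com data.csv

    # MapReduce-based bulk load
    hadoop jar phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        --table MYTABLE \
        --input /example/data.csv \
        --zookeeper zk0.example.com:2181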


Performance Testing Hadoop File Formats

Zbigniew Baranowski looks at the performance of several Hadoop file formats for various activities:

The data access and ingestion tests were run on a cluster composed of 14 physical machines, each equipped with:

  • 2 x 8 cores @2.60GHz
  • 64GB of RAM
  • 2 x 24 SAS drives

The Hadoop cluster was installed from the Cloudera Data Hub (CDH) distribution, version 5.7.0, which includes:

  • Hadoop core 2.6.0
  • Impala 2.5.0
  • Hive 1.1.0
  • HBase 1.2.0 (configured JVM heap size for region servers = 30GB)
  • (not from CDH) Kudu 1.0 (configured memory limit = 30GB)

Apache Impala (incubating) was used as a data ingestion and data access framework in all the conducted tests presented later in this report.

I would have liked to see ORC included as a file format for testing.  Regardless, I think this article shows that there are several file formats for a reason, and you should choose your file format based on your most likely use case: for example, Avro or Parquet for “write-only” systems, or Kudu for larger-scale analytics.
