Press "Enter" to skip to content

Curated SQL Posts

Calculating Weekdays Between Dates

Daniel Hutmacher shows how to calculate the number of weekdays between two dates:

It has to be said, if performance is important (and it should be), you’ll know that user-defined scalar T-SQL functions can kill your query. So you might want to implement the above in the form of a CROSS APPLY instead

Here’s where I advocate for two tables on every instance:  a tally table and a date table.  The date table should look like a date dimension, including every potentially-interesting piece of information about a date—including if it’s a weekday.  It might not perform quite as fast as Daniel’s solution (I’d have to test to know for sure), but it’s definitely easier.   If you can’t get a viable date table on your instances for whatever reason, Daniel’s solution does work and does not require any additional objects.

Comments closed

Removing Snippet Line Numbers

Jens Vestergaard grabs some code and is left with line numbers he wants to remove:

Only too often I find myself searching the web for pieces of code, that I need for some odd solution. And almost every time, I come across a web page, where the code sample is displayed with line numbers. It’s always annoying to have to sit and delete those manually. No longer I say! Yes, you heard me right. Actually Visual Studio has a neat trick to help you get this done in a jiff.

This is useful for a lot more than just removing line numbers.

3 Comments

Minimum Memory Grant Hint

James Anderson looks at memory grant query hints:

This is one of a pair of query hints that can manage the memory grants of individual queries. The maximum memory grant hint allows you to limit the amount of memory an individual query can use. If the hint specifies a value lower than the amount of memory required to execute the query, the query will request the required amount. This is like a granular resource governor. For the rest of this post I will concentrate on the minimum memory grant query hint.

When the query optimiser builds an execution plan for a query, it calculates how much memory is required for the query to execute. This calculation is based on the estimated number of rows (used in sort or hash joins) and the average size of these rows. The optimiser calculates the least amount of memory required to perform the operation (required memory) and the amount of memory required to perform the whole operation in memory (desired memory). Each time the query is executed it requests a memory grant from the server. The query will perform much better if it’s granted its desired memory or more. If at run time the query has to perform sorts or hash operations on more than the estimated number of rows, it will spill to tempdb. Spilling to tempdb can be orders of magnitude worse for performance as your data has to be written to disk causing more IO.

This feels like the type of thing you might want to do once for a particularly recalcitrant query, but not something you want to do regularly.

Comments closed

SQL Injection Is Still Around

K Brian Kelley notes that SQL injection is still a viable security risk:

Keep in mind that from an architecture perspective, the primary place to stop SQL injection attacks is by validating the input when it comes in. If the input doesn’t match appropriate patterns, especially in the case of a banking application where the likely patterns for each input should be easily defined, you reject it at that level. It then doesn’t get appended or inserted into a text string which becomes the SQL statement to be executed against a database server.

If you don’t get it at this level, the ability to prevent the SQL injection attack gets much harder. Perhaps IDS/IPS can detect based on some text matches. We might be able to do the same thing within the database, say by using DML triggers. However, if the appended text generates queries that are basically what normally gets sent back, none of the back-end solutions are going to be very effective.

Pick your language and parameterize your queries.

Comments closed

Dealing With DocumentDB

Rolf Tesmer has an example of using DocumentDB:

How do you work out what DocumentDB will cost?  

Well, 1 RU corresponds to a GET of a 1KB document.  You then need to know the specifics of your workload such as number of documents, avg document size, how the documents are accessed (query patterns), and how often documents are updated.

I suggest loading your data and running your standard queries in the Azure Portal (see below) to see what RU’s your workload generates, and use that in your calculations.

Rolf also has some comparisons to SQL Server queries and some hints with integrating SQL Server with DocumentDB.

Comments closed

Spark 2.0 Technical Preview

Reynold Xin gives a preview of Apache Spark 2.0:

One thing we are proud of in Spark is creating APIs that are simple, intuitive, and expressive. Spark 2.0 continues this tradition, with focus on two areas: (1) standard SQL support and (2) unifying DataFrame/Dataset API.

On the SQL side, we have significantly expanded the SQL capabilities of Spark, with the introduction of a new ANSI SQL parser and support for subqueries. Spark 2.0 can run all the 99 TPC-DS queries, which require many of the SQL:2003 features. Because SQL has been one of the primary interfaces Spark applications use, this extended SQL capabilities drastically reduce the porting effort of legacy applications over to Spark.

There’s some great stuff coming out of DataBricks.  Spark 2.0 looks to be an exciting product.

Comments closed

Comparing Table Contents

Kenneth Fisher discusses methods for comparing table contents:

Roll your own
Ie write one T-SQL query to compare the structure of the two tables (this one will always be the same) and another to compare the data (new each time).

  • Benefits

    • Free.
    • You get practice in writing T-SQL (yes, I’m stretching there.)
  • Flaws

    • Can take quite a bit of time to write. (Don’t forget to check for NULLs separately.)

    • Mistakes happen.

    • Performance may not be spectacular.

Good comparison.  Check it out.

Comments closed

Data Types

Grant Fritchey implores you to use the correct data type:

Far too often then, the easy answer, just change the column to varchar. You can trim the time and output in exactly the format needed by the business. Problem solved and it was easy…

Well, until someone inputs “Janry 20, 2009” slightly mangling the spelling and suddenly your report looks all messed up. Or, they ask you to start filtering just the last two weeks, regardless of when the report was run and you now can’t easily dodate math on the column. Even after you get over that problem with a little formatting using CAST (along with ISDATE to try to catch all those other typos that are in the system now)  you notice that the performance is really slow so you go to put an index on the column and now you have an index key that is 50 bytes wide instead of the 3 bytes that the DATE data type would have been, making the index less efficient (not to mention, sorting the data is going to put February ahead of January, more formatting).

Read the whole thing.

Comments closed

New Ambari Version

Paul Codding announces Ambari 2.2.2:

Grafana is deployed, managed and pre-configured to work with the Ambari Metrics service. We are including a curated set dashboards for core HDP components, giving operators at-a-glance views of the same metrics Hortonworks Support & Engineering review when helping customers troubleshoot complex issues.

Metrics displayed on each dashboard can be filtered by time, component, and contextual information (YARN queues for example) to provide greater flexibility, granularity and context.

Ambari is really shaping up to be a nice framework for managing a Hadoop cluster.  I’m excited to see improved monitoring capabilities.

Comments closed