All too often I find myself searching the web for pieces of code that I need for some odd solution. And almost every time, I come across a web page where the code sample is displayed with line numbers. It’s always annoying to have to sit and delete those manually. No longer, I say! Yes, you heard me right. Visual Studio actually has a neat trick to help you get this done in a jiffy.
This is useful for a lot more than just removing line numbers.
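If you are not in Visual Studio, the same cleanup can be approximated with a regular expression. The following is a minimal Python sketch, not the Visual Studio trick from the linked post. The pattern and the `strip_line_numbers` name are assumptions for illustration, and the pattern assumes every pasted line starts with a number optionally followed by `.`, `:` or `)`.

```python
import re

# Assumed layout: optional indentation, a run of digits, an optional
# '.', ':' or ')' separator, then at most one space or tab.
LINE_NUMBER = re.compile(r"^[ \t]*\d+[.:)]?[ \t]?", re.MULTILINE)


def strip_line_numbers(text: str) -> str:
    """Remove a leading line number from every line that has one."""
    return LINE_NUMBER.sub("", text)


sample = "1 SELECT name\n2   FROM sys.tables\n3  WHERE name LIKE 'A%';"
print(strip_line_numbers(sample))
```

Only one space after the number is removed, so the code's own indentation survives. A line of real code that starts with a digit would also be trimmed, so this is only safe on text where every line carries a number.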
This is one of a pair of query hints that can manage the memory grants of individual queries. The maximum memory grant hint allows you to limit the amount of memory an individual query can use. If the hint specifies a value lower than the amount of memory required to execute the query, the query will request the required amount. This is like a granular resource governor. For the rest of this post I will concentrate on the minimum memory grant query hint.
When the query optimiser builds an execution plan for a query, it calculates how much memory is required for the query to execute. This calculation is based on the estimated number of rows (used in sorts or hash joins) and the average size of these rows. The optimiser calculates the least amount of memory required to perform the operation (required memory) and the amount of memory required to perform the whole operation in memory (desired memory). Each time the query is executed it requests a memory grant from the server. The query will perform much better if it’s granted its desired memory or more. If at run time the query has to perform sorts or hash operations on more than the estimated number of rows, it will spill to tempdb. Spilling to tempdb can be orders of magnitude worse for performance, as your data has to be written to disk, causing more IO.
This feels like the type of thing you might want to do once for a particularly recalcitrant query, but not something you want to do regularly.
Keep in mind that from an architecture perspective, the primary place to stop SQL injection attacks is by validating the input when it comes in. If the input doesn’t match appropriate patterns, especially in the case of a banking application where the likely patterns for each input should be easily defined, you reject it at that level. It then doesn’t get appended or inserted into a text string which becomes the SQL statement to be executed against a database server.
If you don’t get it at this level, preventing the SQL injection attack gets much harder. Perhaps IDS/IPS can detect it based on some text matches. We might be able to do the same thing within the database, say by using DML triggers. However, if the appended text generates queries that look basically like what normally gets sent to the back end, none of the back-end solutions are going to be very effective.
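The input validation described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical banking rule that account numbers are exactly 10 ASCII digits. The `accounts` table, the function names and the in-memory SQLite database are all invented for the example. The parameterised query is an extra safeguard that the quoted text does not mention.

```python
import re
import sqlite3

# Hypothetical rule: an account number is exactly 10 ASCII digits.
ACCOUNT_NUMBER = re.compile(r"[0-9]{10}")


def validate_account_number(raw: str) -> str:
    """Return the input unchanged if it matches the pattern, else raise."""
    if not ACCOUNT_NUMBER.fullmatch(raw):
        raise ValueError("account number must be exactly 10 digits")
    return raw


def get_balance(conn: sqlite3.Connection, raw_account_number: str):
    # Reject bad input before any SQL text is built.
    account_number = validate_account_number(raw_account_number)
    # The value is bound as a parameter, never concatenated into the SQL.
    row = conn.execute(
        "SELECT balance FROM accounts WHERE account_number = ?",
        (account_number,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_number TEXT PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO accounts VALUES ('0123456789', 250.0)")

print(get_balance(conn, "0123456789"))
try:
    get_balance(conn, "0 OR 1=1")
except ValueError as exc:
    print("rejected:", exc)
```

The malformed value is rejected before it reaches the database, so it never becomes part of a statement.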
How do you work out what DocumentDB will cost?
Well, 1 RU (request unit) corresponds to a GET of a 1KB document. You then need to know the specifics of your workload, such as the number of documents, average document size, how the documents are accessed (query patterns), and how often documents are updated.
I suggest loading your data and running your standard queries in the Azure Portal to see what RUs your workload generates, and use that in your calculations.
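Once you have measured the RU charge of each operation, the arithmetic is a weighted sum. The sketch below assumes you supply the per-call charges and call rates yourself. Apart from the 1 RU point read quoted above, every number is a made-up placeholder rather than a real measurement, and the `Operation` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    name: str
    ru_per_call: float       # RU charge measured for one call
    calls_per_second: float  # expected peak call rate


def required_ru_per_second(operations) -> float:
    """Sum the RU consumed per second across all operation types."""
    return sum(op.ru_per_call * op.calls_per_second for op in operations)


workload = [
    Operation("GET of a 1KB document", 1.0, 200),  # 1 RU, as quoted above
    Operation("customer lookup query", 4.5, 50),   # placeholder charge
    Operation("document update", 11.0, 10),        # placeholder charge
]

total = required_ru_per_second(workload)
print(f"Estimated throughput needed: {total:.0f} RU/s")
```

The result is the throughput to provision for. Turning it into a cost needs the current price list, which is deliberately left out here.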
Rolf also has some comparisons to SQL Server queries and some hints with integrating SQL Server with DocumentDB.
Recently, I was thinking about nonclustered indexes in SQL Server, and how included columns are stored. Is SQL Server smart enough to optimize the storage for small indexes with includes? Find out in this free seven-minute video.
It’s a short video, well worth your time.
One thing we are proud of in Spark is creating APIs that are simple, intuitive, and expressive. Spark 2.0 continues this tradition, with focus on two areas: (1) standard SQL support and (2) unifying DataFrame/Dataset API.
On the SQL side, we have significantly expanded the SQL capabilities of Spark, with the introduction of a new ANSI SQL parser and support for subqueries. Spark 2.0 can run all 99 TPC-DS queries, which require many of the SQL:2003 features. Because SQL has been one of the primary interfaces Spark applications use, these extended SQL capabilities drastically reduce the effort of porting legacy applications over to Spark.
There’s some great stuff coming out of Databricks. Spark 2.0 looks to be an exciting product.
Roll your own
I.e., write one T-SQL query to compare the structure of the two tables (this one will always be the same) and another to compare the data (new each time).
- You get practice in writing T-SQL (yes, I’m stretching there).
- Can take quite a bit of time to write. (Don’t forget to check for NULLs separately.)
- Performance may not be spectacular.
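To make the roll-your-own idea concrete, here is a minimal sketch that uses Python with an in-memory SQLite database in place of T-SQL. The tables `old_t` and `new_t` and both helper functions are hypothetical. One query compares structure and another compares data with `EXCEPT`. Set operators such as `EXCEPT` treat two NULLs as the same value, which a plain `=` join would not, and that is why a join-based comparison needs the separate NULL check mentioned above.

```python
import sqlite3


def table_structure(conn, table):
    """Return (column name, declared type, not-null flag) for each column."""
    # Table names are trusted, hard-coded identifiers in this sketch.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [(r[1], r[2], r[3]) for r in rows]


def rows_only_in(conn, left, right):
    """Return rows present in `left` but missing from `right`."""
    sql = f"SELECT * FROM {left} EXCEPT SELECT * FROM {right}"
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE old_t (id INTEGER, name TEXT, note TEXT);
CREATE TABLE new_t (id INTEGER, name TEXT, note TEXT);
INSERT INTO old_t VALUES (1, 'a', NULL), (2, 'b', 'x');
INSERT INTO new_t VALUES (1, 'a', NULL), (2, 'b', 'y');
""")

same_structure = table_structure(conn, "old_t") == table_structure(conn, "new_t")
only_old = rows_only_in(conn, "old_t", "new_t")
only_new = rows_only_in(conn, "new_t", "old_t")

print("same structure:", same_structure)
print("only in old_t:", only_old)
print("only in new_t:", only_new)
```

The row with the NULL note appears in both tables and is correctly treated as a match, so only the row with the changed note is reported.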
Good comparison. Check it out.
Far too often, then, the easy answer is to just change the column to varchar. You can trim the time and output in exactly the format needed by the business. Problem solved and it was easy…
Well, until someone inputs “Janry 20, 2009”, slightly mangling the spelling, and suddenly your report looks all messed up. Or they ask you to start filtering to just the last two weeks, regardless of when the report was run, and you now can’t easily do date math on the column. Even after you get over that problem with a little formatting using CAST (along with ISDATE to try to catch all those other typos that are in the system now), you notice that the performance is really slow. So you go to put an index on the column, and now you have an index key that is 50 bytes wide instead of the 3 bytes that the DATE data type would have been, making the index less efficient (not to mention that sorting the data is going to put February ahead of January, which means more formatting).
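The same problems show up outside the database. The Python sketch below is a rough illustration, assuming dates typed as "Month day, year" with English month names (Python's default locale). The helper names are hypothetical. It shows text sorting alphabetically, a misspelled month failing to parse, and date math working only once the value is a real date.

```python
from datetime import date, datetime, timedelta


def parse_report_date(text: str):
    """Return a date, or None when the text is not a valid date."""
    try:
        return datetime.strptime(text, "%B %d, %Y").date()
    except ValueError:
        return None


def in_last_two_weeks(d: date, today: date) -> bool:
    """Date math that is only possible on real date values."""
    return today - timedelta(days=14) <= d <= today


as_text = ["January 20, 2009", "February 3, 2009", "Janry 20, 2009"]

# Strings sort alphabetically, so February comes before January.
print(sorted(as_text))

# The misspelled month cannot be converted to a date.
print([parse_report_date(t) for t in as_text])

# Real dates sort chronologically and support range filters.
as_dates = sorted(d for d in map(parse_report_date, as_text) if d is not None)
print(as_dates)
print(in_last_two_weeks(date(2009, 1, 20), today=date(2009, 1, 25)))
```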
Read the whole thing.