The question is: what is the right time period to use? The answer is that it depends on the size of your partitions. Generally, for managed tables in U-SQL, you want to target about 1 GB per partition. So if you are bringing in, say, 800 MB per day, daily partitions are about right. If instead you are bringing in 20 GB per day, you should look at hourly partitions of the data.
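The sizing rule above can be expressed as a small calculation. This is a minimal Python sketch, not U-SQL and not the author's code: it assumes that "about 1 GB" means choosing the period whose average partition size is closest to 1 GB (measured as a ratio), and it only considers the daily and hourly periods the post mentions. The function and constant names are hypothetical.

```python
import math

# Rule of thumb from the post: target roughly 1 GB per partition of a managed U-SQL table.
TARGET_MB_PER_PARTITION = 1024

# Candidate periods and how many partitions each produces per day.
# Only daily and hourly come from the post; add others (e.g. weekly) as needed.
PARTITIONS_PER_DAY = {"daily": 1, "hourly": 24}


def choose_partition_period(mb_per_day: float) -> str:
    """Return the period whose average partition size is closest to ~1 GB.

    Distance is measured as a ratio (absolute log), so 512 MB and 2 GB are
    treated as equally far from the 1 GB target.
    """
    if mb_per_day <= 0:
        raise ValueError("mb_per_day must be positive")

    def distance(period: str) -> float:
        mb_per_partition = mb_per_day / PARTITIONS_PER_DAY[period]
        return abs(math.log(mb_per_partition / TARGET_MB_PER_PARTITION))

    return min(PARTITIONS_PER_DAY, key=distance)


if __name__ == "__main__":
    for mb in (800, 20 * 1024):
        print(f"{mb} MB/day -> {choose_partition_period(mb)}")
```

With the post's two examples, 800 MB/day maps to daily partitions, while 20 GB/day maps to hourly partitions of roughly 853 MB each.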
In this post, I’d like to take a look at two common scenarios that people run into. The first is a full re-compute of a partition’s data and the second is a partial re-compute of a partition. The examples I will be using are based on the U-SQL Ambulance Demos on GitHub and will be added to that solution for your convenience.
The ability to reprocess data is vital in any ETL or ELT process.
When creating Apache Spark applications, the basic structure is pretty much the same: with sbt you need the same build.sbt and the same imports, and the skeleton application looks the same. All that really changes is the main entry point, that is, the fully qualified class name. Since that’s easy to automate, I present a couple of shell scripts that help you create the basic building blocks to kick-start Spark application development and allow you to easily upgrade versions in the configuration.
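The post's shell scripts are not reproduced here. As a rough illustration of the same idea, the following Python sketch (a hypothetical stand-in, not the author's scripts) renders a minimal build.sbt and a skeleton entry point from a fully qualified class name. The Scala and Spark version strings, the `0.1.0` project version, and the helper names are placeholder assumptions; keeping the versions in one place is what makes upgrades a one-line change.

```python
import tempfile
from pathlib import Path

# Hypothetical defaults; set these to match the Spark and Scala versions on your cluster.
SCALA_VERSION = "2.12.18"
SPARK_VERSION = "3.5.1"


def build_sbt(name: str, scala_version: str = SCALA_VERSION,
              spark_version: str = SPARK_VERSION) -> str:
    """Render a minimal build.sbt. Spark is marked "provided" since the cluster supplies it."""
    return (
        f'name := "{name}"\n'
        'version := "0.1.0"\n'
        f'scalaVersion := "{scala_version}"\n'
        f'val sparkVersion = "{spark_version}"\n'
        "libraryDependencies ++= Seq(\n"
        '  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",\n'
        '  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"\n'
        ")\n"
    )


def main_scala(fqcn: str) -> str:
    """Render a skeleton entry point for a fully qualified class such as com.example.App."""
    package, _, cls = fqcn.rpartition(".")
    header = f"package {package}\n\n" if package else ""
    return header + (
        "import org.apache.spark.sql.SparkSession\n\n"
        f"object {cls} {{\n"
        "  def main(args: Array[String]): Unit = {\n"
        f'    val spark = SparkSession.builder.appName("{cls}").getOrCreate()\n'
        "    // Application logic goes here.\n"
        "    spark.stop()\n"
        "  }\n"
        "}\n"
    )


def scaffold(root: Path, fqcn: str, name: str = "") -> list[Path]:
    """Write build.sbt and the entry-point source file under root; return the written paths."""
    package, _, cls = fqcn.rpartition(".")
    src_dir = root.joinpath("src", "main", "scala", *filter(None, package.split(".")))
    src_dir.mkdir(parents=True, exist_ok=True)
    sbt_path = root / "build.sbt"
    sbt_path.write_text(build_sbt(name or root.name))
    main_path = src_dir / f"{cls}.scala"
    main_path.write_text(main_scala(fqcn))
    return [sbt_path, main_path]


if __name__ == "__main__":
    out = Path(tempfile.mkdtemp()) / "wordcount"
    for path in scaffold(out, "com.example.WordCount"):
        print(path.relative_to(out))
```

Only the fully qualified class name varies between projects, so it is the one required argument; everything else comes from the shared defaults.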
Check these out if you’re interested in Spark.
Good-bye, Business Intelligence Edition
The biggest surprise to me was the removal of the Business Intelligence edition that was initially introduced in SQL Server 2012. Truthfully, it never seemed to fit in the environments where I worked, so I guess it makes sense. Hopefully, fewer licensing options will make it easier for people to understand their licensing and pick the edition that works best for them.
2016 looks to be a great version for BI.
Learn why SQL Server’s table partitioning feature doesn’t make your queries faster, and may even make them slower.
In this 20-minute video, I’ll show you my favorite articles, bugs, and whitepapers online to explain where table partitioning shines and why you might want to implement it, even though it won’t solve your query performance problems.
Check out the video.
I am not taking into account mirroring or AGs. I honestly am not sure how that would affect the process.
As with any time you run DBCC SHRINKFILE, this is going to shred (heavily fragment) your indexes. Take that into account and re-index as needed.
Kenneth shows screenshots, has a step-by-step checklist, and includes common errors. This is a great explanation.
In my circles, there are a number of people complaining about the lack of features in Standard Edition. I do agree that Always Encrypted should be in every edition, as the lack of strong data encryption is a problem that continues to confound IT. Putting Always Encrypted in all editions would be a good start toward wide ISV adoption of the feature.
However, even without Always Encrypted, Microsoft added a LOT of new features to Standard Edition. Let’s list them (no specific order here):
There’s a pretty good amount of value in upgrading, even if you’re living on Standard Edition.
You are expected to have already downloaded a Windows Server installation ISO image.
You can download an evaluation copy of Windows Server here: https://www.microsoft.com/en-us/evalcenter/evaluate-windows-server-technical-preview
For this example I’ve chosen Windows Server 2016 Technical Preview 5.
Note: Do not try to use a 64-bit installation image on a 32-bit workstation. It won’t work.
After you specify the file, click “Next”.
Read the whole thing.
Enable or disable PARAMETER_SNIFFING at the database level. Disable this option to instruct the query optimizer to use statistical data instead of the initial values for all local variables and parameters when the query is compiled and optimized. This is equivalent to Trace Flag 4136 or the OPTIMIZE FOR UNKNOWN query hint.
Enable or disable QUERY_OPTIMIZER_HOTFIXES at the database level to take advantage of the latest query optimizer hotfixes, regardless of the compatibility level of the database. This is equivalent to Trace Flag 4199.
CLEAR PROCEDURE_CACHE, which allows you to clear the procedure cache at the database level without impacting other databases and without requiring sysadmin permission. This command can be executed with the ALTER ANY DATABASE SCOPED CONFIGURATION permission on the database, and the operation can be executed on the primary and/or the secondaries.
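All three options are driven by the ALTER DATABASE SCOPED CONFIGURATION statement, which acts on the current database. To keep new examples in a single language, this is a small hypothetical Python helper that renders those T-SQL statements (for instance, when scripting the same settings across several databases). The helper names and the `SalesDb` database are made up, and the value sets encoded here (ON/OFF, plus PRIMARY for secondaries) reflect SQL Server 2016; check the documentation for your version before relying on them.

```python
# Allowed values per option on the primary; FOR SECONDARY additionally accepts PRIMARY
# (meaning "inherit the primary's setting").
ALLOWED_VALUES = {
    "PARAMETER_SNIFFING": {"ON", "OFF"},
    "QUERY_OPTIMIZER_HOTFIXES": {"ON", "OFF"},
}

PREFIX = "ALTER DATABASE SCOPED CONFIGURATION"


def set_option(option: str, value: str, for_secondary: bool = False) -> str:
    """Render a SET statement for one database scoped option, validating the value."""
    option, value = option.upper(), value.upper()
    if option not in ALLOWED_VALUES:
        raise ValueError(f"unsupported option: {option}")
    allowed = ALLOWED_VALUES[option] | ({"PRIMARY"} if for_secondary else set())
    if value not in allowed:
        raise ValueError(f"{option} does not accept {value}")
    target = " FOR SECONDARY" if for_secondary else ""
    return f"{PREFIX}{target} SET {option} = {value};"


def clear_procedure_cache() -> str:
    """Clear cached plans for the current database only; other databases are untouched."""
    return f"{PREFIX} CLEAR PROCEDURE_CACHE;"


def script_for(database: str, statements: list[str]) -> str:
    """Prefix statements with USE, since scoped configuration applies to the current database."""
    quoted = "[" + database.replace("]", "]]") + "]"  # same escaping as QUOTENAME
    return "\n".join([f"USE {quoted};", *statements])


if __name__ == "__main__":
    print(script_for("SalesDb", [
        set_option("PARAMETER_SNIFFING", "OFF"),
        set_option("QUERY_OPTIMIZER_HOTFIXES", "ON"),
        clear_procedure_cache(),
    ]))
```

Turning PARAMETER_SNIFFING off here corresponds to the Trace Flag 4136 behavior described above, but only for the one database named in the USE statement.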
This is an early implementation of the functionality, but I think it is a step in the right direction. Finer-grained, database-level configuration settings get us one step closer to that 2012 dream of contained databases.
A couple of weeks ago I mentioned that we are using Trello to help the community collaborate on what we want next in SQLPS before we submit Connect items to Microsoft.
That effort is going very well. It’s going so well, in fact, that when the topic of getting some new improvements into SSMS was brought up, the SQL Tools team suggested that a Trello board to collaborate on and prioritize what people want improved in SSMS would be very helpful to them. Ultimately, Microsoft needs Connect items filed, but using Trello helps folks debate and combine ideas.
The cynic in me says “this is what Connect is supposed to do” but Aaron and Chrissy LeMaire had a great deal of success working with the SQLPS team, so here’s hoping they get traction here as well.
DacFx, or to give it its full title, the Data-tier Application Framework, “is a component which provides application lifecycle services for database development and management for Microsoft SQL Server and Microsoft Azure SQL Databases”. Essentially, it is another method we can use to manage our Dacpacs. However, instead of running the external SQLPackage process from the command line, you can use C# or PowerShell to manage Dacpacs. In fact, SQLPackage itself uses “Microsoft.SqlServer.Dac.dll”. You can verify this by deleting the DLL and trying to run SQLPackage from the command line... or you can just take my word for it.
Read on for the PowerShell script Richie uses.