What we need to do is offset the beginning of the data stored on disk to a location more conducive to how the program operates. This offset is known as the “Partition Alignment Offset”. To be in tune with SQL Server, this value should be a multiple of 64KB. However, you also need to consider the entire storage subsystem – the disks, controllers and memory. Starting with Windows Server 2008, the default offset is 1024KB – a nice multiple of 64KB that also works well with most RAID disks/controllers. Prior to Windows Server 2008, partitions were not aligned automatically, so the alignment has to be done manually.
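The "multiple of 64KB" rule is simple arithmetic, so here is a minimal Python sketch of the check. The function name is made up for illustration, and the 63-sector legacy offset is not from the excerpt; it is the commonly cited default on older Windows versions and is included here only as an assumed counterexample.

```python
KB = 1024

def is_aligned(offset_bytes: int, unit_bytes: int = 64 * KB) -> bool:
    """Return True when the partition offset is a whole multiple of the unit."""
    return offset_bytes % unit_bytes == 0

# Default offset from Windows Server 2008 onward: 1024 KB.
print(is_aligned(1024 * KB))  # True

# Assumed legacy offset: 63 sectors of 512 bytes each (31.5 KB).
print(is_aligned(63 * 512))   # False
```

On a real server you would read the starting offset from the operating system's disk tools instead of hard-coding it.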
If you’ve migrated disks from server to server to server over the years, this is worth checking out.
So 23 milliseconds for the batch version and 850 milliseconds for RBAR. What a difference.
Now in this case the code for the RBAR is also a lot more complicated. But that isn’t always the case. It also isn’t always the case that RBAR is slower. But it’s almost always a lot slower than batch.
So, while the code for RBAR is often easier to write, even though it might be physically longer, it’s probably going to be slower too.
Well-written, set-based solutions aren’t always guaranteed to be faster, but that’s one of the safest bets to make with T-SQL.
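To make the RBAR versus set-based distinction concrete, here is a small sketch using Python's built-in sqlite3 module rather than T-SQL. The table, column names and surcharge are hypothetical. It only shows that both approaches reach the same result with very different numbers of statements; it makes no timing claim.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """Build an in-memory table with 1,000 hypothetical orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount_cents INTEGER)")
    conn.executemany(
        "INSERT INTO orders (id, amount_cents) VALUES (?, ?)",
        [(i, i * 100) for i in range(1, 1001)],
    )
    return conn

def update_rbar(conn: sqlite3.Connection) -> None:
    # Row by agonizing row: read every row, then issue one UPDATE per row.
    rows = conn.execute("SELECT id, amount_cents FROM orders").fetchall()
    for row_id, amount in rows:
        conn.execute(
            "UPDATE orders SET amount_cents = ? WHERE id = ?",
            (amount + 100, row_id),
        )

def update_set_based(conn: sqlite3.Connection) -> None:
    # Set-based: a single statement describes the change for all rows.
    conn.execute("UPDATE orders SET amount_cents = amount_cents + 100")

def total(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT SUM(amount_cents) FROM orders").fetchone()[0]

rbar_db, set_db = make_db(), make_db()
update_rbar(rbar_db)
update_set_based(set_db)
print(total(rbar_db) == total(set_db))  # both approaches produce the same data
```

The row-by-row version issues a thousand UPDATE statements where the set-based version issues one, which is where the cost difference usually comes from.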
Over the last year, I have been intentionally seeking feedback from the community at various SQL events, particularly from those who plan to use or are currently using Azure SQL Database. A lot of questions have come up about managing Azure SQL Database better – i.e. being more proactive and more responsive in managing Azure SQL Database. One of the ways to be more proactive about your SQL Database is by setting up alerts. As an example, you can create an alert that fires when DTU goes above 95% – say over the last 5 minutes – so that you can either investigate why this might be or upgrade to a higher SKU.
This article walks through how you can set up an alert on Azure SQL DB.
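The DTU example boils down to a threshold applied over a time window. The sketch below shows that logic in plain Python; it is not the Azure API. The function name, the sample data, and the choice to average the samples in the window are all assumptions for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean

def should_alert(samples, now, threshold=95.0, window=timedelta(minutes=5)):
    """samples: iterable of (timestamp, dtu_percent) pairs.

    Assumes the rule averages the samples inside the window.
    """
    recent = [value for ts, value in samples if now - window <= ts <= now]
    return bool(recent) and mean(recent) > threshold

now = datetime(2017, 1, 1, 12, 0)
# Hypothetical readings taken 6, 4, 2 and 0 minutes ago.
samples = [
    (now - timedelta(minutes=minutes_ago), pct)
    for minutes_ago, pct in [(6, 50.0), (4, 97.0), (2, 99.0), (0, 96.0)]
]
print(should_alert(samples, now))  # True: the last 5 minutes average above 95
```

The reading from six minutes ago falls outside the window, so it does not pull the average down.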
I really like the fact that they offer web hooks; that way, I can integrate these alerts with Slack or other messaging systems.
SSDT is a VS plugin that can script out a database into individual files so that you can use a VCS (I use Git) to version control them. Once those scripts are in my Git repo, I can use it as the single source of truth to generate my releases from. This is the basis of getting our databases into our CI process. ReadyRoll will be used to further improve this process and to add our migration/upgrade scripts to our repo. SSDT is required by ReadyRoll and can be found here.
Before we can start with ReadyRoll, we need to learn some Visual Studio basics.
I’ve used database projects for the better part of a decade. They aren’t perfect but in most environments, they’re quite helpful…if other people use them as well…
Looking at the first 5 records of the RDD
This output is difficult to read. This is because we are asking PySpark to show us data that is in the RDD format. PySpark has DataFrame functionality. If the Python version is 2.7 or higher, you can utilize the pandas package. However, pandas doesn’t work on Python version 2.6, so we use the Spark SQL functionality to create DataFrames for exploration.
The full example is a fairly simple k-means clustering process, which is a great introduction to PySpark.
The main problem with an over-engineered solution is that it takes longer to ship than is necessary. By definition, we are doing more than is necessary, and that will take longer to ship. There’s almost never a reason to prefer longer ship-times over shorter ones, all things being equal.
The more serious problem with over-engineering is the carrying cost.
A carrying cost is a cost the team bears for having to maintain software and infrastructure. Each feature requires tests, monitoring, and maintenance. Each new feature is made in the context of those that came before it. This is why a feature that might’ve taken one week when the project was new requires a month to make in a more mature project.
Read the whole thing and simplify your solutions.
Based on this testing, lock contention, which usually results in a performance bottleneck and underutilized resources, was our first “suspect.” We knew that using a commercial Java profiler, such as YourKit, JProfiler or Java Flight Recorder, would help easily identify locks and determine how much time threads spend waiting on them. Meanwhile, the team had built custom infrastructure that allows one to run experiments with a profiler attached via a single command-line parameter.
In my own testing, the profiler data indeed revealed some contention, particularly related to HdfsUpdateLog locks, leading to long thread wait times. Although, promisingly, this result corresponded somewhat to the description in SOLR-6820, nothing actionable resulted from the experiment.
I like these sorts of case studies because example is the school of mankind. In this particular case, I really like the methodical approach, using available information to search for a root cause. Some of the things Michael calls “false starts” I would consider to be initial steps: checking OS, filesystem, and garbage collection metrics are important even in a case like this in which they did not lead to the culprit, as they help you eliminate suspects.
Generates a call that returns 20 results, rather than the default 10:
Obviously these options make it easier to construct URLs and the code is much clearer, but there are also other benefits to using these options which I’ll cover in another blog post soon.
This makes for a more maintainable, dynamic URL generation. Think about an internal product dashboard, where you might need to make API calls to pull in data by product (or maybe you want to send people to an external link for each product). This can help you parameterize your URLs quite easily.
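As a rough illustration of the per-product idea, here is a sketch using Python's standard urllib.parse module, which is a swap-in and not the tooling from the linked post. The endpoint, the `q` and `limit` parameter names, and the product names are all hypothetical.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

def build_url(base: str, **params) -> str:
    """Attach query parameters to a base URL, escaping values as needed."""
    scheme, netloc, path, _, fragment = urlsplit(base)
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))

# Hypothetical endpoint and products; limit=20 mirrors asking for 20 results.
for product in ["widgets", "gear & gadgets"]:
    print(build_url("https://api.example.com/search", q=product, limit=20))
# https://api.example.com/search?q=widgets&limit=20
# https://api.example.com/search?q=gear+%26+gadgets&limit=20
```

Building the query string from named parameters means special characters such as `&` are escaped for you, which is easy to get wrong when pasting strings together by hand.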
The second issue is that even if you do install SSMS 2016, SQL Agent won’t recognize and give you access to the new module if you use a PowerShell job step. When you create a PowerShell job step, the script in that job step runs within a specific context. It’s hidden from you, but whenever that script runs the first thing that happens is SQL Server launches sqlps.exe.
Check out the links Mike provides to Connect items and the Trello board if you want to see the issues he brought up fixed.
Parameter sniffing fixes are based on your career progression with databases, and they go like this:
1. Reboot the server! – Junior folks panic and freak out, and just restart the server. Sure enough, that erases all cached execution plans. As soon as the box comes back up, they run rpt_Sales for China because that’s the one that was having problems. Because it’s called first, it gets a great plan for big data – and the junior admin believes they’ve fixed the problem.
2. Restart the SQL Server instance – Eventually, as these folks’ careers progress, they realize they can’t go rebooting Windows all the time, so they try this instead. It has the same effect.
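Why a restart appears to work can be shown with a toy model. This is a deliberately simplified Python sketch, not how SQL Server's optimizer actually works: the row counts, the "Iceland" parameter, the threshold and the two plan names are all hypothetical. The only behavior it models is that the plan is built for whichever parameter value arrives first and then reused.

```python
# Hypothetical data sizes per parameter value.
ROW_COUNTS = {"China": 1_000_000, "Iceland": 10}

plan_cache = {}

def choose_plan(row_count: int) -> str:
    # Toy rule: big inputs get a scan, small inputs get a seek.
    return "scan" if row_count > 10_000 else "seek"

def run_report(country: str) -> str:
    # The plan is compiled for the first parameter seen, then reused for all.
    if "rpt_Sales" not in plan_cache:
        plan_cache["rpt_Sales"] = choose_plan(ROW_COUNTS[country])
    return plan_cache["rpt_Sales"]

print(run_report("Iceland"))  # seek: compiled for the small country
print(run_report("China"))    # seek: reuses the small-data plan

plan_cache.clear()            # the effect a reboot or instance restart has
print(run_report("China"))    # scan: China ran first this time
```

The restart did nothing clever; it emptied the cache, and the next caller happened to be the one that needed the big-data plan.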
If a reboot can’t fix the problem, I’m out of ideas…
By the way, I second Brent’s recommendation of Erland’s query plan article. Erland doesn’t publish frequently, but when he does it’s worth the wait.