Of course, PowerShell excels at this. By using the SQL Server module, it’s really easy to:
- Connect to an instance and collect every user database, and
- From each database, collect every table, and
- For each table, collect row counts and space used, and
- If there are any indexes, group them, sum their usage, and report that as well
Here’s the script. Note that I have the server name hard-coded in there as localhost (more on that in a coming paragraph). Go ahead and take a look before we break it down.
Click through for the script, and homework is due next Tuesday on his desk.
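If you want the flavor of it without clicking, here is a rough sketch of the approach using the SqlServer module and SMO. This is my own approximation, not the author's actual script; the filtering and property choices are assumptions:

```powershell
# A rough sketch (not the author's actual script): walk every user database
# and table, pulling row counts and space used via SMO. Sizes are in KB.
Import-Module SqlServer

$server = New-Object Microsoft.SqlServer.Management.Smo.Server 'localhost'

foreach ($db in $server.Databases | Where-Object { -not $_.IsSystemObject }) {
    foreach ($table in $db.Tables | Where-Object { -not $_.IsSystemObject }) {
        [pscustomobject]@{
            Database     = $db.Name
            Table        = "$($table.Schema).$($table.Name)"
            Rows         = $table.RowCount
            DataSpaceKB  = $table.DataSpaceUsed
            IndexSpaceKB = ($table.Indexes | Measure-Object -Property SpaceUsed -Sum).Sum
        }
    }
}
```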
This is great on its own, as it gives me the information shown below. It's basically data like this at the Database/FileGroup/File levels:
- Free of Max size
- Size as a percent at the instance level
- Free size as a percent at the instance level
- Max size as a percent at the instance level
- Free of Max size as a percent at the instance level
Read on to see how Jana makes use of this data, as well as where you can get the code.
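To make the arithmetic concrete, here is a sketch with made-up numbers. Reading "free of max" as max size minus used space is my own interpretation, and the instance-level denominator is likewise an assumption:

```powershell
# Hypothetical file-level values, in GB, purely to illustrate the math.
$sizeGB = 500; $freeGB = 120; $maxGB = 2000
$instanceTotalGB = 8000   # assumed instance-wide total

$usedGB      = $sizeGB - $freeGB    # space actually in use
$freeOfMaxGB = $maxGB - $usedGB     # headroom before hitting the size cap

'{0:P1}' -f ($sizeGB / $instanceTotalGB)        # size as a percent
'{0:P1}' -f ($freeGB / $instanceTotalGB)        # free size as a percent
'{0:P1}' -f ($maxGB / $instanceTotalGB)         # max size as a percent
'{0:P1}' -f ($freeOfMaxGB / $instanceTotalGB)   # free of max as a percent
```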
There are advantages and disadvantages to each approach. The tiered approach gives an operator the most flexibility to tune the performance of the cluster, trading off the size of the hot data zone for better performance or a smaller resource footprint. The downside is that, with data on HDFS, resizing the cluster is a slow and tedious process, because HDFS needs to be rebalanced to meet performance and fault-tolerance expectations. Thus this architecture is generally only used for statically sized clusters with steady, well-known workloads.
The decoupled architecture, on the other hand, enables maximum flexibility for cluster growth and reduction. For example, a cluster could run at 100 nodes during the day to support analytics and reporting and then shrink to 24 nodes overnight to support smaller ETL workloads. Historically, the disadvantage to decoupling is that cloud storage is not local and therefore could drastically affect runtime of the analytical workloads (hence the hybrid approach of tiered storage). However, the advent of LLAP in Hive 2.0 has limited this overhead, making the approach far more attractive. The dynamic cache within LLAP also means that we do not need to statically define which data is hot; it can be inferred at query time (i.e., as users access the data, that data becomes hot). We will look more closely at how LLAP closes the runtime gap in the next section.
Historically, the argument was that you should avoid S3 in part because it’s relatively flaky compared to disks (in terms of performance and in its eventual consistency model). It looks like that’s no longer a pressing concern.
Arguably, the biggest thing about S2D is that solutions currently have to be certified (see this bit of documentation from MS for more detail). This doesn't really affect, say, virtualized versions or ones up in the public cloud such as Azure in a meaningful way, but it's still technically a requirement, much like logoed hardware for Windows Server supportability. Anyone want to point me to the logo stamped on your VMs? Didn't think so. Now, from a pure FCI perspective, none of this is an issue. The way a Windows Server failover cluster (WSFC) is currently designed, it expects that all nodes participating in the WSFC are also using (and need) S2D. Why am I mentioning this? Disaster recovery.
It shouldn't stop you from moving forward with S2D, but it means you'll probably have a bit more research on your hands.
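For reference, standing up S2D on a set of nodes is short work in PowerShell. This is a sketch; the cluster and node names are placeholders:

```powershell
# Assumes the FailoverClusters module and nodes with eligible local disks.
New-Cluster -Name 'S2D-CL01' -Node 'node1','node2','node3','node4' -NoStorage

# Claim the eligible drives on every node and build the storage pool.
Enable-ClusterStorageSpacesDirect -CimSession 'S2D-CL01'
```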
What is Diskspd?
Diskspd is a storage testing tool created by Microsoft's Windows, Windows Server, and Cloud Server Infrastructure engineering teams. It combines robust and granular I/O workload definition with flexible runtime and output options, making it a perfect tool for storage performance testing, validation, and benchmarking.
Where to find Diskspd?
Diskspd is a free and open-source utility. Its source code can be found on GitHub. The repository also hosts other frameworks which use Diskspd; you can find them under the 'Frameworks' directory. A binary release is hosted by Microsoft at the following location: http://aka.ms/diskspd.
Click through for more details, including an example of a poorly-performing I/O solution.
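To give you a taste of the syntax before you click through, here is a hypothetical run; the file path and sizes are placeholders:

```powershell
# 30-second random-read test: 8 KB blocks, caching disabled, 1 GB test file.
diskspd.exe -c1G -b8K -r -d30 -Sh C:\temp\iotest.dat
```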
Next in our SQL Server on Linux series is one important question. On Windows, if you’re about to run out of space, you get your VM admin / storage admin to expand one or more of your drives, and you go to Disk Management and expand the drive with no downtime. How do we accomplish this same task on Linux?
First, SSH into your VM. Get your appropriate system engineer to expand the drive that needs to be expanded. You won't be able to see it at first in Linux because, just like in Windows, the OS needs to rescan the storage to 'see' the extra space. Sometimes Windows does that automatically, and sometimes you have to initiate it manually. Linux only does it on system startup, so you'll need to trigger the rescan yourself.
Let’s grow our data drive from 250GB to 300GB first.
Click through to see how to do that.
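As a preview of the shape of the answer, here is a sketch with placeholder device and volume names, assuming the disk sits under LVM:

```bash
# Ask the kernel to rescan the disk so it sees the new size
# (sdc is a placeholder; check lsblk for the right device).
echo 1 | sudo tee /sys/class/block/sdc/device/rescan

# Grow the LVM physical volume, logical volume, and filesystem in one pass
# (volume group and logical volume names are assumptions).
sudo pvresize /dev/sdc
sudo lvextend -r -l +100%FREE /dev/vg_data/lv_data
```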
In my lab, I'm running Windows Server 2012. Let's use PowerShell to install the File Server Resource Manager, which will allow us to create a quota for the relevant folder:
Add-WindowsFeature -Name FS-Resource-Manager -IncludeManagementTools
After installing the Windows feature, we can set the quota for the folder, but we shouldn't enable it just yet, because first we have to verify the current size of the folder.
On my server, I created a quota of 1.5GB, and then enabled it.
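Here is roughly what those steps look like with the FileServerResourceManager cmdlets; this is a sketch, and the folder path is a placeholder:

```powershell
# Check the folder's current size (in GB) before enforcing anything.
$path = 'D:\SQLData\QuotaDemo'   # hypothetical path
(Get-ChildItem -Path $path -Recurse -File |
    Measure-Object -Property Length -Sum).Sum / 1GB

# Create the quota disabled, then flip it on once the size checks out.
New-FsrmQuota -Path $path -Size 1.5GB -Disabled
Set-FsrmQuota -Path $path -Disabled:$false
```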
Now let's INSERT rows into the table, in batches of 1,000, until we reach the limit (the INSERT script is listed in Part 2; I'm trying to keep this post from getting too long).
Click through to see what happens. It’s not exactly a swath of carnage, but it’s also something you really don’t want to happen.
I was working on a client's site today, setting up database backup routines. Part of that work is to perform a database backup and verify that everything went okay. I had Windows Explorer open to the location the backup was going to. When the backup finished, I navigated over to Windows Explorer… and I had a missing database backup: there wasn't a file in the directory for the backup I had just performed.
After double- and triple-checking that I was looking at the same path I had backed the database up to, I went in search of the network sysadmin to help me figure it out.
Read on for the solution.
There were other potential issues when using Persistent Memory, detailed in this blog post. But what's not covered in that post is the fact that deploying NVDIMM-N reduced memory speed and/or capacity, because NVDIMM-N modules are not compatible with LRDIMMs. That forces you onto RDIMMs, which reduces capacity; and because NVDIMM-N operates at a slower speed than RDIMM, total memory speed suffers as well.
HP has since released Gen10 servers, and they have changed the landscape for those seeking reduced latency by storing larger data sets in memory. For one thing, they raise the bar for what's now referred to as Scalable Persistent Memory, with a total server capacity of 1TB. To be clear, NVDIMM-N is not used in this configuration. Instead, regular DIMMs are used, and they are persisted to flash via a power source (this was also the case for NVDIMM-N, but there the flash, DIMM, and power source were all located on the NVDIMM-N module itself).
Check it out. I’m happy that things are improving, but it sounds like this won’t be a panacea.
It's important to test your storage performance, especially prior to installing or deploying a new SQL Server.
Microsoft has provided us with a great tool called Diskspd, which was meant to replace SQLIO. Diskspd synthetically generates workloads to run against your server. It’s pretty robust and has a lot of parameters so that you can customize your test.
For example: in the command below, I specified -b8k, which means the test will run with an 8K block size, the size SQL Server uses for pages.
Click through for a sample run and explanation of each parameter.
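A representative invocation might look like this. It's a sketch, not necessarily the exact command from the post, and the file path and sizes are placeholders:

```powershell
# 60-second test: 8 KB blocks, 4 threads, 8 outstanding IOs per thread,
# random IO, 25% writes, latency stats, caching disabled, 10 GB test file.
diskspd.exe -b8K -d60 -t4 -o8 -r -w25 -L -Sh -c10G D:\temp\iotest.dat
```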