
Author: Kevin Feasel

HBase Compaction

Jitendra Bafna explains how HBase compaction works:

Compaction is a process by which HBase cleans itself. It comes in two flavors: minor compaction and major compaction.

Minor compaction is the process of combining a configurable number of smaller HFiles into one large HFile. Minor compaction is very important because, without it, reading particular rows requires many disk reads and overall performance can suffer.

Major compaction is the process of combining the StoreFiles of a region into a single StoreFile. It also deletes removed and expired versions. By default, major compaction runs every 24 hours and merges all StoreFiles into a single StoreFile. After compaction, if the new, larger StoreFile is greater than a certain size (defined by a property), the region will split into new regions.

Read on for more information about compaction and data locality, which is a totally different topic.


The Value Of SQL Browser

Chris Sommer points out that the SQL Browser does something useful:

Bingo. The application server could not connect to the SQL Browser service on UDP 1434. So maybe now you’re asking why, and that’s kinda the gist of this post. The SQL Browser provides a valuable service when an application tries to connect to a SQL Server named instance. The SQL Browser listens on UDP 1434 and provides information about all SQL Server instances that are installed on the server. One of those pieces of information is the TCP port number that SQL is listening on. Without that info, the application has no idea how to reach your SQL Server, and will fail to connect. This was our exact issue.

Do read this, though my preference is to shut off the SQL Browser because it’s a mechanism attackers can use for gathering intel on where SQL Server instances live.
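If you do disable it, clients have to supply the port themselves. As a quick sketch, run on the instance itself, this shows the TCP port the instance is listening on; with that baked into connection strings as ServerName\InstanceName,port, the Browser can stay off:

    -- Find the TCP port this instance is listening on.
    SELECT DISTINCT dec.local_tcp_port
    FROM sys.dm_exec_connections AS dec
    WHERE dec.net_transport = N'TCP'
      AND dec.local_tcp_port IS NOT NULL;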


Understanding Database Role Permissions

Jason Brimhall shows what happens when you make a user a member of every database role at the same time:

A fundamental component of SQL Server is the security layer. A principal player in security in SQL Server comes via principals. In a previous article, I outlined the different flavors of principals while focusing primarily on the users and logins. You can brush up on that article here. While I touched lightly, in that article, on the concept of roles, I will expound on the roles a bit more here – but primarily in the scope of the effects on user permissions due to membership in various default roles.

Let’s reset back to the driving issue in the introduction. Frequently, I see what I would call a gross misunderstanding of permissions by way of how people assign permissions and role membership within SQL Server. The assignment of role membership does not stop with database roles. Rather, it is usually combined with a misconfiguration of server role memberships as well. This misunderstanding can really be broken down into one of the following errors:

  • The belief that a login cannot access a database unless added specifically to the database.

  • The belief that a login must be added to every database role.

  • The belief that a login must be added to the sysadmin role to access resources in a database.

Worth reading.  Spoilers:  database roles are not like Voltron; they don’t get stronger when you put them all together.
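As a minimal sketch of that point (dbo.SomeTable is a hypothetical table), a user in both db_datareader and db_denydatareader loses read access, because the DENY wins:

    -- Illustration only: combining "positive" and "deny" fixed database roles.
    CREATE USER RoleTester WITHOUT LOGIN;
    ALTER ROLE db_datareader ADD MEMBER RoleTester;
    ALTER ROLE db_denydatareader ADD MEMBER RoleTester;

    EXECUTE AS USER = N'RoleTester';
    SELECT TOP (1) * FROM dbo.SomeTable;  -- expect "SELECT permission denied": the DENY wins
    REVERT;

    DROP USER RoleTester;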


SQL Authentication Accounts Without Password Policy

Chris Bell shows how to find accounts using SQL authentication and which do not have the “enforce password policy” flag set:

Recently I was performing a security audit for a client. One of the many things I had to check was the enforcement of password policies for any SQL Server created accounts.

You know, that policy that says you must have some combination of 6 or more characters, upper and lower case, a number, and special characters, etc.

These policies are controlled by the server policy settings and were easy to check. Verifying that the actual passwords themselves were safe? Not so much.

Click through for the script.
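Chris’s script is the main attraction, but as a minimal starting point, sys.sql_logins exposes the relevant flags:

    -- SQL-authenticated logins which do not enforce the Windows password policy.
    SELECT sl.name,
           sl.is_policy_checked,
           sl.is_expiration_checked
    FROM sys.sql_logins AS sl
    WHERE sl.is_policy_checked = 0;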


AG Secondary Stats Overwritten With Sample

Taiob Ali has run into an interesting issue:

Once I update my statistics with fullscan, within 10-20 seconds some of the statistics on the same table are getting updated on the secondary with a sampled percentage of rows, meaning my best (fullscan) statistics are being overwritten with merely good (sampled) statistics. On the primary node, once I run “Update statistics Tablename with fullscan”, I see the following statistics status.

Within 10-20 seconds of updating statistics on the primary node, if I check the status of the same statistics on my secondary nodes, I see the fullscan statistics replaced by sampled statistics. Look at the rows_sampled and last_updated columns: you will see the sampled row count, and the last_updated time is within a few seconds of the update on the primary. The RowsModified column still shows zero records.

It’s happening on an Availability Group secondary.  Taiob has a workaround, so read on for that.
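If you want to reproduce the check yourself, something along these lines (the table name is a placeholder) surfaces rows, rows_sampled, and last_updated on each replica:

    -- Compare statistics metadata on the primary and a readable secondary.
    SELECT s.name AS stats_name,
           sp.last_updated,
           sp.rows,
           sp.rows_sampled,
           sp.modification_counter
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE s.object_id = OBJECT_ID(N'dbo.Tablename');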


Powershell Errors

Shane O’Neill explains what the $error variable does:

Now ignoring the fact that you already know what is wrong, this tells me that there is either something wrong with the $Database variable, the $sql variable or the syntax statement. Maybe even something else though!
This is not helpful and I’m going to have a bad time.

I encountered this lately and thanks to Chrissy LeMaire ( b | t ), I was introduced to the $error variable.

Shane also has an interesting side note around error colors.


Understanding The Stack

Ewald Cress digs into stack frames, using a few SQL Server functions as examples:

The rdi register is used as a scratchpad to save the incoming rcx value for later – first it is used to restore the possibly clobbered rcx value for the second child function call, and then it becomes the return value going into rax. However, since we don’t want to clobber its value for our caller, we first save its old value and then, at the very end, restore it. This is done through a traditional push/pop pair, which in x64 is only allowed in the function prologue and epilogue.

Similarly, ebx (the bottom half of rbx) is used to save the incoming value of the second parameter in edx. However, here we don’t use a push/pop pair, but instead save it in a slot in the shadow space. Why two different mechanisms for two different registers? Sorry, that’s one for the compiler writers to answer.

There’s a lot of detail involved here, but it’s well worth the read.


Figuring Out Cost Threshold For Parallelism

Grant Fritchey uses R to help him decide on a good cost threshold for parallelism value:

With the Standard Deviation in hand, and a quick rule of thumb that says 68% of all values are going to be within two standard deviations of the data set, I can determine that a value of 16 on my Cost Threshold for Parallelism is going to cover most cases, and will ensure that only a small percentage of queries go parallel on my system, but that those which do go parallel are actually costly queries, not some that just fall outside the default value of 5.

I’ve made a couple of assumptions that are not completely held up by the data. Using the two, or even three, standard deviations to cover just enough of the data isn’t actually supported in this case because I don’t have a normal distribution of data. In fact, the distribution here is quite heavily skewed to one end of the chart. There’s also no data on the frequency of these calls. You may want to add that into your plans for setting your Cost Threshold.

This is a nice start.  If you’re looking for a more experimental analysis, you could try A/B testing (particularly if you have a good sample workload), where you track whatever pertinent counters you need (e.g., query runtime, whether it went parallel, CPU and disk usage) under different cost threshold regimes and do a comparative analysis.
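As raw material for either approach, a sketch along these lines pulls estimated subtree costs out of the plan cache. Keep in mind these are optimizer estimates, and plans age out of cache, so treat it as a sample rather than the full workload:

    -- Estimated subtree costs from cached plans, for sizing Cost Threshold for Parallelism.
    WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
    SELECT TOP (1000)
           qp.query_plan.value('(//StmtSimple/@StatementSubTreeCost)[1]', 'float') AS estimated_cost,
           qs.execution_count
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
    ORDER BY estimated_cost DESC;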


Standard Edition Max Server Memory Changes In 2016 SP1

Randolph West details the changes in max server memory for Standard Edition in SQL Server 2016 SP1:

The memory limit of 128GB RAM applies only to the buffer pool (the 8KB data pages that are read from disk into memory — in other words, the database itself).

For servers containing more than 128GB of physical RAM, and running SQL Server 2016 with Service Pack 1 or higher, we now have options.

Randolph has a couple good clarifications on memory limits outside the buffer pool, making this worth the read.
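As a hedged example, on a Standard Edition instance running 2016 SP1 on a server with (say) 192GB of RAM, you could raise max server memory above 128GB and let the non-buffer-pool consumers use the difference:

    -- Illustration only: 180224 MB = 176 GB, a made-up figure for a 192GB server.
    -- The buffer pool itself remains capped at 128GB on Standard Edition.
    EXEC sys.sp_configure N'show advanced options', 1;
    RECONFIGURE;
    EXEC sys.sp_configure N'max server memory (MB)', 180224;
    RECONFIGURE;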
