Month: September 2017

You’ll need to plan out your domain before you begin. In my case, I already had my network configured to use 192.168.254.x. My Fiber router serves as my default gateway as well as my DHCP server and primary DNS server for my local network. My wireless access points, primary workstation, and printer are already set up for static IP addresses. I have already set aside a subnet of addresses for static servers. I also already own a domain name (toyboxcreations.net). Having all this set up before trying to install my domain controller helped by saving time.

Shannon glosses over the local AD part, but once that’s set up, shows how to tie it in with Azure Active Directory.

Will SQL Server On Linux Take Off?

Published 2017-09-07 by Kevin Feasel

Brent Ozar has his doubts about how popular SQL Server on Linux will be:

SQL Server 2017 runs on Linux, and the similarities between that and Windows Core are eerie:

Both present big stumbling blocks for traditional Windows DBAs

Both work mostly the same, but not exactly, as you can see in the SQL Server on Linux release notes

Both solved perceived problems for sysadmins

Neither solved a problem for database administrators

So why will you hear so much more about Linux support? Because this time around, it also solves a sales problem for Microsoft. Somebody, somewhere, has a spreadsheet showing that there will be a return on investment if they spend the development, marketing, and support resources necessary. (And I bet they’re right – if you compare this feature’s ROI against, say Hekaton or Polybase, surely Linux is going to produce a lot more new licenses sold.)

He does make some good points (though seriously, Polybase is awesome), but I think SQL Server on Linux is going to be quite a bit more popular for a couple of reasons. First is core-based licensing in Windows Server: that’s another big price increase that you get when upgrading to Server 2016, and at the margin, companies with a mixed OS setup will be more likely to move to Linux. Second, Brent’s focus in the post is on current installations—that is, taking your Windows SQL Server instance and moving it to Linux. As Koen Verbeeck mentions in the first comment, there’s a whole different market: companies whose infrastructure is entirely Linux and are currently using MySQL, Oracle, or Postgres for their relational databases. It’ll probably take a couple of years to get market penetration—especially because of the old guard Linux admin types who remember the Ballmer years with appropriate disdain—but this is a new market for Microsoft and they’ve already got a product which meets (or exceeds, depending upon your biases) the top competition.

How Functions Affect Data Retrieval Performance

Published 2017-09-07 by Kevin Feasel

Daniel Janik shows one of the many pain points around user-defined functions in SQL Server:

Note that Query 1 uses a function around the column and this causes a scan which increases IO and CPU utilization. This is because every value in the column for the whole table must have the hyphen removed to see if it is a match.

In Query 2, we see a seek. This is because the value is modified instead of the column.

Click through for a few examples.
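
The scan-versus-seek difference in the excerpt can be reproduced in miniature. The sketch below is an illustration under stated assumptions, not Daniel's code: it uses Python's built-in sqlite3 module rather than SQL Server, and the `customers` table, the index name, and the `add_hyphen` helper are all hypothetical. SQLite reports a full pass as SCAN and an index lookup as SEARCH, which roughly corresponds to scan and seek in a SQL Server plan.

```python
import sqlite3


def plan_for(conn, sql, params=()):
    """Return the human-readable detail of each step in SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the detail text


def add_hyphen(digits):
    """Hypothetical helper: reshape the search value to match the stored format."""
    return digits[:3] + "-" + digits[3:]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.execute("CREATE INDEX ix_customers_phone ON customers (phone)")
conn.executemany(
    "INSERT INTO customers (name, phone) VALUES (?, ?)",
    [("Customer %d" % i, "555-%04d" % i) for i in range(1000)],
)

# Query 1 style: the function wraps the column, so every row must be
# transformed before it can be compared and the index cannot be used.
sql_function_on_column = "SELECT name FROM customers WHERE replace(phone, '-', '') = ?"
plan_function_on_column = plan_for(conn, sql_function_on_column, ("5550042",))

# Query 2 style: the value is modified instead, so the column is compared
# as stored and the index on phone can be used.
sql_value_modified = "SELECT name FROM customers WHERE phone = ?"
plan_value_modified = plan_for(conn, sql_value_modified, (add_hyphen("5550042"),))

rows_function_on_column = conn.execute(sql_function_on_column, ("5550042",)).fetchall()
rows_value_modified = conn.execute(sql_value_modified, (add_hyphen("5550042"),)).fetchall()

print(plan_function_on_column)
print(plan_value_modified)
```

Both queries return the same row; only the access path differs. The exact plan wording varies between SQLite versions, so treat the printed text as indicative.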

Building An API To Read An API

Published 2017-09-07 by Kevin Feasel

Jesse Seymour shows how to build a WebAPI project to retrieve JSON data from another API:

In this file, our goal is to create a class library that connects to an API, authenticates, retrieves JSON formatted data, and deserializes to output for use in an SSIS package. In this particular solution, I created a separate DLL for the class library which will require me to register it in the global assembly cache on the ETL server. If your environment doesn’t allow for this, you can still use some of the code snippets here to work with JSON data.

Our order of operations will be to do the following tasks: Create a web request, attach authentication headers to it, retrieve the serialized JSON data, and deserialize it into an object. I use model-view-controller (MVC) architecture to organize my code, minus the views because I am not presenting the data to a user interface.

Read on for a depiction and all of the project code. Building a separate WebAPI project to retrieve this data is usually a good move, as you gain a lot of flexibility: you can run it on cheaper hardware, schedule data refreshes, send the data out to different locations, and so on.
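
For a feel of the four steps Jesse lists (create a web request, attach authentication headers, retrieve the serialized JSON, deserialize it into an object), here is a minimal sketch. It is written in Python with only the standard library, whereas Jesse's project is a .NET class library, so none of this is his code. The `Order` model, its fields, the bearer-token scheme, and the URL in the usage comment are all assumptions made up for illustration.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical model class; the fields are invented for this sketch."""
    id: int
    customer: str
    total: float


def build_request(url, token):
    """Steps 1 and 2: create the web request and attach authentication headers."""
    request = urllib.request.Request(url)
    # A bearer token is an assumption; the real API may authenticate differently.
    request.add_header("Authorization", "Bearer " + token)
    request.add_header("Accept", "application/json")
    return request


def parse_orders(payload):
    """Step 4: deserialize a JSON array into a list of Order objects."""
    return [
        Order(id=item["id"], customer=item["customer"], total=float(item["total"]))
        for item in json.loads(payload)
    ]


def fetch_orders(url, token):
    """Step 3: retrieve the serialized JSON, then hand it to the parser."""
    with urllib.request.urlopen(build_request(url, token), timeout=30) as response:
        return parse_orders(response.read().decode("utf-8"))


# Usage (needs a reachable endpoint, so it is left as a comment):
# orders = fetch_orders("https://api.example.com/orders", "my-token")
```

Keeping request building and parsing in separate functions means each piece can be exercised without touching the network, which suits a job that runs unattended on an ETL server.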

Set A Fill Factor

Published 2017-09-07 by Kevin Feasel

Monica Rathbun wants you to set a better fill factor than the default:

Please, please, please, admins, do not leave your default index fill factor at 0. This means you are telling SQL Server to fill the page 100% full when creating indexes. This also means you are forcing it to a new page when additional inserts are done. These are called PAGE SPLITS, which take time to perform and are resource-intensive operations. Having a high fill factor will cause more index fragmentation, decrease performance and increase IO.

If you find that this is how your system is configured, all is not lost. You can correct this by changing the default value so that new indexes will be created with the proper fill factor and by rebuilding your existing indexes with another fill factor value. I like to use 80 across the board for most; of course, there is always the “it depends” scenario that arises, but 80 is a pretty safe bet. One of those “it depends” would be a logging table that has the correct clustering key and never gets updates in between values (make sense?). There, I don’t want a fill factor of 80. I’d want 0/100 to maximize page density, as page splits wouldn’t occur if the clustered key is monotonically increasing.

Monica also has a couple scripts, one for changing the across-the-board default and one for changing a particular index.
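
To put rough numbers on the page-density trade-off Monica describes, here is a back-of-the-envelope sketch in Python. It is a simplification, not her script: it assumes 8,096 usable bytes per 8 KB page, ignores the row offset array and other per-row overhead, and uses a made-up 100-byte row and a made-up row count.

```python
import math

# Assumption: an 8 KB page minus its 96-byte header leaves 8,096 bytes for rows.
PAGE_DATA_BYTES = 8096


def rows_per_page(row_bytes, fill_factor):
    """Rows that fit on a freshly built leaf page at the given fill factor."""
    # Fill factors 0 and 100 both mean "fill the page completely".
    effective = 100 if fill_factor == 0 else fill_factor
    usable_bytes = PAGE_DATA_BYTES * effective // 100
    return max(1, usable_bytes // row_bytes)


def pages_needed(row_count, row_bytes, fill_factor):
    """Leaf pages required to hold row_count rows right after a rebuild."""
    return math.ceil(row_count / rows_per_page(row_bytes, fill_factor))


full_pages = pages_needed(1_000_000, 100, 0)     # hypothetical table, pages packed full
padded_pages = pages_needed(1_000_000, 100, 80)  # same table, 20% free space per page

print(full_pages, padded_pages)  # prints: 12500 15625
```

Under these assumptions, a fill factor of 80 costs 25% more pages in exchange for room to absorb inserts and updates without splitting. That is also why a table with an ever-increasing clustered key, which only ever appends, gains nothing from the extra space.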

Retrieving Disk Block Size With PowerShell

Published 2017-09-07 by Kevin Feasel

Naveen Kumar shows how to use PowerShell to find out the block size of a particular disk in Windows:

Do you need to worry about disk block size?
I suggest reading the articles below to get a better understanding of this topic:

Disk Partition Alignment Best Practices for SQL Server
WHAT IS SQL SERVER’S IO BLOCK SIZE?

Post discussion, the next question was how do we check the disk block size for a given server?
You can do it from the command line using the fsutil utility, but let’s do it with PowerShell.

Read on for the code.

Handling Missing Data In Spark

Published 2017-09-06 by Kevin Feasel

Igor Sorokin explains how to implement DataFrameNaFunctions:

Unfortunately, copy and paste (C&P) comes into play; therefore, if at some point in time a default value for ‘trackLength’ is also required, you may end up changing both of these methods. Another disadvantage is that if another similar method, which requires the same default values, is added, code duplication is unavoidable.

A possible solution, which helps to reduce boilerplate, is DataFrameNaFunctions, which is intended to be used for handling missing data: replacing specific values, dropping ‘null’ and ‘NaN’, and setting default values.

Read on for an example.

Troubleshooting Ambari Server

Published 2017-09-06 by Kevin Feasel

Jay SenSharma has an interesting article on troubleshooting Ambari Server:

When we notice that the ambari server is responding slowly, we should first look at the following details:

1). The number of hosts added to the ambari cluster. So that accordingly we can tune the ambari agent thread pools.

2). The number of concurrent users (or the view users) who access the ambari server at a time. So that accordingly we can tune the ambari thread pools.

3). The age of the ambari cluster. If the ambari server is too old, it is possible that some of the operational logs and the alert histories are consuming a large amount of the Database, which might be causing ambari DB queries to respond slowly.

4). The Ambari Database health and its geographic location from the ambari server, to isolate if there are any network delays.

5). Ambari server memory related tuning parameters to see if the ambari heap is set correctly.

6). For ambari UI slowness, we should check for network proxy issues, such as proxies added between the client and the ambari server machine, or for general network slowness.

7). If the ambari users are synced with the AD or external LDAP and if the communication between server and the AD/LDAP is good.

8). Also the resource availability on the ambari host like the available free memory and if any other service/component running on ambari server is consuming more Memory/CPU/IO.

There is a lot of detail here, including quite a few checks to run.

Parallelism In R

Published 2017-09-06 by Kevin Feasel

Florian Prive shows off a few methods for parallelizing code in R:

Parallelize with foreach

You need to do at least two things:

replace %do% by %dopar%. Basically, always use %dopar% because you can use registerDoSEQ() if you really want to run the foreach sequentially.
register a parallel backend using one of the packages that begin with do (such as doParallel, doMC, doMPI and more). I will list only the two main parallel backends because there are too many of them.

Check it out. Florian spends a lot of time with foreach and doParallel, a little bit of time with flock, and mentions Microsoft R Open. H/T R-Bloggers

Dealing With Trace Flags In The Registry

Published 2017-09-06 by Kevin Feasel

Wayne Sheffield shows us how to configure SQL Server trace flags within T-SQL using registry access commands:

In a recent post, I introduced you to how to work with the registry directly from within SQL Server. Continuing this theme, this post provides an example situation where you would do so.

In this example, we will want to configure SQL Server to enable a few trace flags (TF) when SQL Server starts. Specifically, let’s set trace flags 1117 and 1118 so that they are enabled when SQL Server starts up, and enable them now. Additionally, we have trace flags 1204 and 1222 now enabled on some servers, and we want to disable those (since we have the deadlocks being captured in the system health XE, we don’t need them in the error log also). We also don’t want to force a restart of the SQL Server services.

I’ve always felt a little icky about writing to the registry from SQL Server, but Wayne shows how to do it right.
