Kevin Feasel – Page 1490

This is a quick post to share how we can use the coalesce operator in Azure DocumentDB (which is a schema-free, NoSQL database) to handle situations when the data structure varies from file to file. Varying data structure is a common issue in big data and analytics projects. A schema-free database like DocumentDB allows us to ingest and store the data with varying structures without a lot of upfront effort. However, accommodating these varying data structures is challenging later when we want to analyze the data. When querying the data (think Schema on Read here), I do need to impose a consistent structure on the data to perform analytics.

Read the whole thing.

Comments closed

Disk Space Shenanigans

Published 2017-03-16 by Kevin Feasel

Meagan Longoria writes about an outage due to improper file layout:

One day, a manager asked me if I could help on an urgent matter: the application suddenly could no longer execute transactions on the production database and the database connection was intermittently failing. The system admin was busy with other duties, so I was the closest thing they had to a DBA. All they could tell me was the production database had crashed and they got an error message about insufficient disk space.

Click through for the rest of the story.

Comments closed

Replication And TDE

Published 2017-03-16 by Kevin Feasel

Drew Furgiuele looks at how replication interacts with Transparent Data Encryption:

But what happens if we set up a transactional replication publication on this database and do a snapshot? Remember that when you create a publication, your distributor and subscriber(s) need to know which network share (or FTP server) to drop all the data and schema definitions to so they can be read in by the distribution agent and recreated. In my example, I’m dropping them to a network share. Once the snapshot completes, let’s go check out our subscriber database…

Uh oh. The same query returned zero results at the subscriber. Which means no encryption! Replication won’t replicate encryption, at all. So if you have a requirement to encrypt your data at the source, you’ll need to do it on your subscribers too.

Drew points out a couple important gotchas which might lead to you exposing information you didn’t intend to make available.

Comments closed

Stars Visual

Published 2017-03-16 by Kevin Feasel

Devin Knight shows off a Power BI custom visual which displays ratings with stars:

The Stars visual has the ability to use symbols instead of the star.
If you have multiple rows in your dataset then you may need to use a Slicer to toggle back and forth between each record.

I haven’t used the stars visual, but it seems that it’d make intuitive sense, given how many major sites use stars for ratings.

Comments closed

NULL Parameters

Published 2017-03-16 by Kevin Feasel

Riley Major turns this T-SQL Tuesday into thoughts on procedure parameterization:

But what if the caller wanted the date to be “empty” (i.e. 1900-01-01)? And what if a NULL is passed?

In our environment, we’ve disallowed NULLs from our table fields. We understand that NULL is actually information– it says that the data is unknown– but we believe that for most data fields, there are non-NULL values which just as effectively represent unknown. Typically, 0’s and empty strings (and the “blank” date 1900-01-01) serve that purpose. And those values are more forgiving during coding (they equal things; they don’t make everything else “unknown”), and we accept the risk of paying little attention to which parts of our logic touched “unknown” values.

It’s an interesting look at dealing with optional and default parameters within procedures.

Comments closed

Supersized Tables

Published 2017-03-16 by Kevin Feasel

Deborah Melkin tells a story of a design battle she lost:

The programmers came to me and said we need to add a large number of columns to this table for one piece of functionality. It would more than double the total number of columns on the table. Oh, and all of the new columns would be NULL since we would only need to populate them if they were using that functionality and even then, not all of them would require data. The final result would be that 65-75% of the table would end up having nullable fields with the majority of those having NULL for the value.

I said what I think any sane DBA would say to this request: No.

Click through for the rest of the tale.

Comments closed

The Guru

Published 2017-03-16 by Kevin Feasel

Adam Machanic wins this T-SQL Tuesday:

Back in my basement hideout, I spent the next couple of hours exploring the network and figuring out which server to connect to. The CTO was right; I did have enough access. I was sysadmin on the production SQL Server and had full admin access to the app server. I logged in to the app and with the help of a Profiler trace managed to figure out one of the main slow stored procedure calls that occurred any time someone saved a change via the user interface.

Pasting the procedure call into SSMS, I turned on Actual Execution Plan, hit F5, and got ready to see indications of a few missing indexes. I was ready to walk back upstairs, gloat to the CTO, and ask for a better workspace so I could continue to help. What I didn’t expect was what actually came back: Not one execution plan, or two, or three, but hundreds and hundreds. The scroll bar become progressively smaller as time clicked by and the elapsed counter did not stop running. All I’d done in the application was change the name of a single field. What was going on?

This was an amazing story full of cringe-worthy moments.

Comments closed

Creating A Spinlock Convoy

Published 2017-03-15 by Kevin Feasel

Lonny Niederstadt explains how two performance issues can intertwine in an interesting manner:

I’d seen systems that implemented both trace flags as startup parameters simultaneously. I’d helped organizations implement first T8048, then T4199 (based on the timing of my research and testing of the trace flags). This was the first time that there was a desire to implement the trace flags one-at-a-time and we had the choice of which to implement first.

I hadn’t chosen to put T8048 in first previously – that was just the way everything worked out. If I chose to follow that order – I’d be doing what I’d seen and done before. But… there was also a reason to choose the reverse order, with T4199 first. Spinlock issues – especially at that time – were considered more exotic performance issues than many of the “plan-shaping” issues that trace flag 4199 addressed. Many administrators were much more familiar with that type of performance issue – eliminating significant waits, altering plan shapes, making the logical work of queries more efficient – than with the busy wait/management overhead of spinlocks. Sometimes demonstrating an improvement that someone is already familiar with evaluating is a plus, helping to gain trust. I didn’t know of a specific reason NOT to put trace flag T4199 in place, followed by T8048 later. And in this case it seemed like building up some interpersonal capital might be a good idea.

Thinking through the full ramifications of trace flag changes is hard, even for sharp people like Lonny. Read on for the details of what happened next.

Comments closed

Always sa

Published 2017-03-15 by Kevin Feasel

James Anderson rails on inappropriate sa account usage:

I can already hear managers saying:

If you don’t trust your employees, why employ them in the first place?

Well there is the whole accidental damage thing. I guess you could cover that by having a good backup\restore process (if your RTO and RPO permitted the downtime) but don’t expect to pass any security audits coming your way. Hint: your clients wont like this.

Plus, supposing everybody knows the sa account, there’s no way to know who accidentally(?) dropped the customer database.

Comments closed

Understanding Neural Nets

Published 2017-03-15 by Kevin Feasel

David Smith links to a video which explains how neural networks do their thing:

In R, you can train a simple neural network with just a single hidden layer with the nnet package, which comes pre-installed with every R distribution. It’s a great place to start if you’re new to neural networks, but the deep learning applications call for more complex neural networks. R has several packages to check out here, including MXNet, darch, deepnet, and h2o: see this post for a comparison. The tensorflow package can also be used to implement various kinds of neural networks.

R makes it pretty easy to run one, though it then becomes important to understand regularization as a part of model tuning.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Author: Kevin Feasel

Coalescing In DocumentDB

Disk Space Shenanigans

Replication And TDE

Stars Visual

NULL Parameters

Supersized Tables

The Guru

Creating A Spinlock Convoy

Always sa

Understanding Neural Nets