Press "Enter" to skip to content

July 19, 2018

The Decorator Pattern

Nancy Jain explains the Decorator pattern:

The Decorator design pattern is a structural design pattern.

Structural design patterns focus on class and object composition, and the Decorator pattern is about adding responsibilities to objects dynamically.

The Decorator design pattern gives some additional responsibility to our base class.

This pattern is about creating a decorator class that can wrap the original class and provide additional functionality while keeping the class's method signatures intact.

I don’t use the Decorator pattern as often as I probably should, but it can be quite useful.
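To make that concrete, here is a minimal sketch in PowerShell (the class and method names are my own invention, not Nancy Jain's): a LoggingNotifier wraps a plain Notifier, adds logging, and keeps the Send() signature intact.

```powershell
# Minimal decorator sketch. Notifier is the base component; LoggingNotifier
# wraps an existing Notifier, adds behavior, and keeps Send()'s signature
# intact so callers cannot tell the difference.
class Notifier {
    [string] Send([string] $message) {
        return "Sent: $message"
    }
}

class LoggingNotifier : Notifier {
    hidden [Notifier] $inner

    LoggingNotifier([Notifier] $wrapped) {
        $this.inner = $wrapped
    }

    [string] Send([string] $message) {
        Write-Host "LOG: about to send '$message'"   # the added responsibility
        return $this.inner.Send($message)            # delegate to the wrapped object
    }
}

$plain     = [Notifier]::new()
$decorated = [LoggingNotifier]::new($plain)
$decorated.Send('Hello')   # logs first, then returns "Sent: Hello"
```

Because the decorator is itself a Notifier, decorators can stack: [LoggingNotifier]::new([LoggingNotifier]::new($plain)) works just as well.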


When Paging To Disk Became Cool Again

The Netflix Technology Blog walks us through how they do caching on SSDs:

Storing large amounts of data in volatile memory (RAM) is expensive. Modern disk technologies based on SSD are providing fast access to data but at a much lower cost when compared to RAM. Hence, we wanted to move part of the data out of memory without sacrificing availability or performance. The cost to store 1 TB of data on SSD is much lower than storing the same amount in RAM.

We observed during experimentation that RAM random read latencies were rarely higher than 1 microsecond whereas typical SSD random read speeds are between 100–500 microseconds. For EVCache our typical SLA (Service Level Agreement) is around 1 millisecond with a default timeout of 20 milliseconds while serving around 100K RPS. During our testing using the storage optimized EC2 instances (I3.2xlarge) we noticed that we were able to perform over 200K IOPS of 1K byte items thus meeting our throughput goals with latency rarely exceeding 1 millisecond. This meant that by using SSD (NVMe) we were able to meet our SLA and throughput requirements at a significantly lower cost.

NVMe isn’t as fast as RAM, but we are well beyond the days of spinning disk hard drives.
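As a toy illustration of the tiering idea (not Netflix's EVCache code; the function names and file-per-key layout are mine): check a small in-memory table first, fall back to a slower on-disk store on a miss, and promote hits back into RAM.

```powershell
# Toy two-tier lookup: a hashtable as the RAM tier, backed by files on
# disk standing in for the SSD tier. Layout and names are invented.
$ramTier = @{}
$ssdDir  = Join-Path $env:TEMP 'ssd-tier'
New-Item -ItemType Directory -Path $ssdDir -Force | Out-Null

function Set-CacheItem([string] $key, [string] $value) {
    $ramTier[$key] = $value                                   # hot copy in RAM
    Set-Content -Path (Join-Path $ssdDir $key) -Value $value  # copy on disk
}

function Get-CacheItem([string] $key) {
    if ($ramTier.ContainsKey($key)) { return $ramTier[$key] }  # sub-microsecond class
    $path = Join-Path $ssdDir $key
    if (Test-Path $path) {                                     # 100-500 microsecond class
        $value = Get-Content -Path $path -Raw
        $ramTier[$key] = $value                                # promote back to RAM
        return $value
    }
    return $null                                               # full miss
}

Set-CacheItem 'member-42' 'recommendations-payload'
$ramTier.Remove('member-42')    # simulate eviction from the RAM tier
Get-CacheItem 'member-42'       # served from the disk tier, re-promoted
```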


Data Lakes And Data Swamps

Randolph West talks about data lakes:

Internet companies including search engines (Google, Bing), social media companies (Facebook, Twitter), and email providers (Yahoo!, Outlook.com) are managing data stores measured in petabytes. On a daily basis these organizations handle all sorts of structured and unstructured data.

Assuming they put all their data in one repository, that could technically be thought of as a data lake. These organizations have adapted existing tools, and even created new technologies, to manage data of this magnitude in a field called big data.

The short version: big data is not a 100 GB SQL Server database or data warehouse. Big data is a relatively new field that came about because traditional data management tools are simply unable to deal with such large volumes of data. Even so, a single SQL Server database can allegedly be more than 500 petabytes in size, but Michael J. Swart warns us: if you’re using over 10% of what SQL Server restricts you to, you’re doing it wrong.

Incidentally, I’ll note that the term data swamp has a storied history here at Curated SQL.


CLR_MANUAL_EVENT Waits

Jonathan Kehayias traces out the cause of CLR_MANUAL_EVENT waits on SQL Server:

The fact that no data has been collected for this type throughout a good cross-section of their customers really confirmed for me that this isn’t something that is commonly a problem, so I was intrigued by the fact that this specific workload was now exhibiting problems with this wait. I wasn’t sure where to go to further investigate the issue, so I replied to the email saying I was sorry that I couldn’t help further because I didn’t have any idea what would be causing literally dozens of threads performing spatial queries to all of a sudden start having to wait for 2-4 seconds at a time on this wait type.

A day later, I received a kind follow-up email from the person that asked the question that informed me that they had resolved the problem. Indeed, nothing in the actual application workload had changed, but there was a change to the environment that occurred. A third-party software package was installed on all of the servers in their infrastructure by their security team, and this software was collecting data at five-minute intervals and causing the .NET garbage collection processing to run incredibly aggressively and “go nuts” as they said. Armed with this information and some of my past knowledge of .NET development I decided I wanted to play around with this some and see if I could reproduce the behavior and how we could go about troubleshooting the causes further.

Read the whole thing if you use CLR.
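For context, CLR_MANUAL_EVENT accumulates while a SQLCLR task waits on a .NET System.Threading.ManualResetEvent. This standalone sketch (nothing to do with SQL Server itself) shows the primitive in action: one runspace blocks on WaitOne() until another thread calls Set().

```powershell
# One worker blocks on a ManualResetEvent until the main thread signals it.
$gate = [System.Threading.ManualResetEvent]::new($false)

$worker = [powershell]::Create().AddScript({
    param($gate)
    $gate.WaitOne()        # blocks here -- the CLR_MANUAL_EVENT analogue
    'released'
}).AddArgument($gate)

$async = $worker.BeginInvoke()
Start-Sleep -Seconds 2     # the worker spends this entire time waiting
$gate.Set()                # signal: every thread waiting on the event is released at once
$worker.EndInvoke($async)  # -> 'released'
$worker.Dispose()
```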


Generating Dynamic PowerShell With Script Blocks

Shane O’Neill walks us through the concept of script blocks in PowerShell:

…recently, I ran into an issue in PowerShell that, if it had been in SQL, I would have solved it quite handily with some Dynamic SQL.

“Alas, this is PowerShell,” I thought to myself. “And there is no way that one knows of that one can create dynamic commands that can be built up itself!”

Now, there are two things that you have to realise for when I’m thinking to myself:

  1. I think more fancy than I am in real life, and
  2. I’m nearly always wrong!

So please see below for my example problem and the “dynamic PowerShell” created to overcome the issue!

Check it out, and then imagine how to perform PowerShell injection.
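Along those lines, here's a minimal sketch of the idea (my own toy example, not Shane's): build the command text at runtime, compile it into a script block, and invoke it, just like dynamic SQL.

```powershell
# Build a pipeline as a string, turn it into a script block, invoke it.
# $property and $filter stand in for values that only arrive at runtime.
$property = 'Name'
$filter   = '*.ps1'

$commandText = "Get-ChildItem -Filter '$filter' | Select-Object -ExpandProperty $property"
$scriptBlock = [scriptblock]::Create($commandText)

& $scriptBlock   # runs the dynamically assembled pipeline
```

And exactly as with dynamic SQL, anything interpolated into $commandText becomes executable code, which is where the injection risk mentioned above comes from.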


T-SQL Tuesday 104 Roundup

Bert Wagner reviews the entries for T-SQL Tuesday 104:

This month’s T-SQL Tuesday topic asked “What code would you hate to live without?” Turns out you like using script and code to automate boring, repetitive, and error-prone tasks.

Thank you to everyone who participated; I was nervous that July holidays and summer vacations would stunt turnout; however, we wound up with 42 posts!

Watch tsqltuesday.com for next month’s topic and consider signing up to host.

Read on for the 42 submissions.


Obfuscating Continuous Variables

Phil Factor continues his series on data obfuscation:

Imagine that you have a table giving invoice values. You will want your spoof data to conform with the same ups and downs of the real data over time. You may be able to get the overall distribution the same as the real data, but the resulting data would be useless for seeing the effect of last year’s sales promotion. The invoice values will depend on your sales promotions if your marketing people have done their job properly.

By making your data the same distribution as your production data, you don’t necessarily get the same strategy chosen by the query analyser, but you dramatically increase the chances of getting it. SQL Server uses a complex paradigm to select amongst its alternative plans for a query. It maintains distribution statistics for every column and index that is used for selecting rows. These aren’t actually histograms in the classic sense, but they perform a similar function and are used by the SQL Server engine to predict the number of rows that will be returned.

The focus is on independent variables, though there is a little bit at the end about working with dependencies.
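One simple distribution-preserving technique (a sketch of the general idea only, not Phil Factor's specific method): replace each real value with a random draw from the real column plus a little noise, so the overall shape, and therefore the statistics built on it, stays close to production.

```powershell
# Replace each real invoice value with a random draw from the real column
# plus a small jitter. The spoof column keeps roughly the same distribution
# while values no longer line up row-for-row with production.
$realValues = 120.50, 99.99, 340.00, 87.25, 512.10, 99.99, 150.75  # stand-in data

$rng = [System.Random]::new()
$spoofValues = foreach ($ignored in $realValues) {
    $sample = $realValues[$rng.Next(0, $realValues.Count)]  # draw from the empirical distribution
    $jitter = ($rng.NextDouble() - 0.5) * 0.1 * $sample     # +/- 5% noise
    [math]::Round($sample + $jitter, 2)
}

$spoofValues   # same rough shape as $realValues
```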


Relationships In Power BI

Teo Lachev shows us the importance of defining relationships in Power BI:

However, if there isn’t a direct relationship between ResellerSales and Employee, the moment you add an unsummarized field from the second table on the many side, such as Employee[FullName] after adding SalesTerritoryCountry and ResellerSales[SalesOrderNumber], you’ll get the error “Error: Can’t determine relationships between the fields”.

Solution: Interestingly, the report works fine if a summarized field, such as COUNT(Employee[EmployeeKey]) is used. In this case, the SalesTerritory dimension acts as a conformed dimension joined to two fact tables. The reason why it doesn’t work when Employee[FullName] is added is because there is no aggregation on the Employee table and the relationship between ResellerSales[SalesOrderNumber] and Employee[FullName] becomes Many:Many over SalesTerritory which is now a bridge table. One employee may be associated with multiple sales and a sale can be associated with multiple employees. How do we solve this horrible problem?

Good data modeling is important, and Power BI dashboards are no exception to the rule.
