2017-03-07 – Curated SQL

The first and simpler GetData() overload doesn’t show up in a vftable, but the second does. Oddly, the vftable for the second one lives at an offset of +0x1448 into the class instance – you’re going to have to trust me on this one. So the rcx passed into either variation will actually be the same one! But if the virtual version is called, it needs to find its position relative to +0x1448 dynamically, by doing a data lookup. We can confirm that by peeking at what is saved four bytes earlier at +0x1444, and that is indeed the value zero.

Ewald explains how this is vital to multiple inheritance and this post is only guaranteed to make your brain hurt a little bit.

Comments closed

Garbage Collection In Hadoop

Published 2017-03-07 by Kevin Feasel

Ranjan Banerjee explains how the Java garbage collector works, using Hadoop as an example:

The reason why we all love Java is due to the fact that we can be careless with memory creations and the work of cleaning the mess is performed by the JVM. On a high level, Java heap memory is classified into two phases:

1) Young (eden) space

2)Old space

The eden space is where newly created objects goto. There are various algorithms for garbage collection, but all of them try to first free memory from the young space and for those long lasting memory objects, they are transferred to the old space.

One common issue that can be noticed in running Map Reduce Applications are GC overhead limit exceeded.

Read on for more, including where you can find GC logs.

Comments closed

Time Series Errors

Published 2017-03-07 by Kevin Feasel

Alex Smolyanskaya explains some common errors when doing time series analysis:

Non-zero model error indicates that our model is missing explanatory features. In practice, we don’t expect to get rid of all model error—there will be some error in the forecast from unavoidable natural variation. Natural variation should reflect all the stuff we will probably never capture with our model, like measurement error, unpredictable external market forces, and so on. The distribution of error should be close to normal and, ideally, have a small mean. We get evidence that an important explanatory variable is missing from the model when we find that the model error doesn’t look like simple natural variation—if the distribution of errors skews one way or another, there are more outliers than expected, or if the mean is unpleasantly large. When this happens we should try to identify and correct any missing or incorrect model features.

It’s an interesting article, especially the bit about cross-validation, which is a perfectly acceptable technique in non-time series models.

Comments closed

Data Lake 3.0

Published 2017-03-07 by Kevin Feasel

Vinod Kumar Vavilapalli describes the modern data lake:

During the past few years though, end-to-end business use-cases have evolved to another level.

The end-to-end business problems are now mostly solved by multiple applications working together.

As the platform matured, users have increasingly started wanting to solely focus on the business application layers, and getting impatient to get on with developing their main business-logic.

However, YARN, and for that matter any other related platform, hasn’t catered to this evolving need, leaving the users to unwillingly get involved in the painstaking details of wiring applications together, keeping them up, manually scaling them as need arises etc.

Manual plumbing of all these different colored services in tiresome! Further, there is a clear need for seamless aggregate deployment, lifecycle management and application wireup. This is the gap that needs to be bridged between what these end-to-end business use-cases need from the platform and what the platform offers today. If these features are provided, then the business use cases authors can singularly focus on the business logic.

This is a higher-level “where are we at?” kind of post which could be helpful if you’re new to the data lake concept.

Comments closed

Disabling VMware In-Guest Clock Updates

Published 2017-03-07 by Kevin Feasel

David Klee explains when VMware will update the internal clocks for VMs and shows how to disable that:

These time sync actions can move a guest’s time backwards as well as forwards. More details about this conflict of settings are found in VMware KB1189. If the host time is out of sync, such as when a BIOS battery fails, bad things can happen. This action is extremely detrimental to the state of SQL Server high availability features, such as Availability Groups and Failover Cluster Instances, which depend on the in-guest time closely aligning with the Active Directory synchronized time. This action must be explicitly disabled to ensure that these maintenance items do not trigger an unexpected failover of the SQL Server HA solution. To disable this action, perform the following tasks.

I’d imagine that the ideal would be everything being synched to a single NTP source.

Comments closed

Retaining system_health Metrics

Published 2017-03-07 by Kevin Feasel

Taiob Ali looks at the system_health Extended Events session:

What this article does not tell you is your individual file size is 5 MB and number of maximum rollover file is 4. Meaning you will only get 20 MB of data. Once that limit is reached your older data is lost and there is no way to recover.

Often time you have an issue with the application pointing to SQL Server and you want to diagnose the problem with system_health files. You find that files already rolled over and information you are looking for is not available. I will explain how you can solve this problem and retain system_health session files for longer period.

Read on for the solution.

Comments closed

Using Prophet For Stock Price Predictions

Published 2017-03-07 by Kevin Feasel

Marcelo Perlin looks at Facebook’s Prophet to see if it works well for predicting stock price movements:

The previous histogram shows the total return from randomly generated signals in 10^{4} simulations. The vertical line is the result from using prophet. As you can see, it is a bit higher than the average of the distribution. The total return from prophet is lower than the return of the naive strategy in 27.5 percent of the simulations. This is not a bad result. But, notice that we didn’t add trading or liquidity costs to the analysis, which will make the total returns worse.

The main results of this simple study are clear: prophet is bad at point forecasts for returns but does quite better in directional predictions. It might be interesting to test it further, with more data, adding trading costs, other forecasting setups, and see if the results hold.

This is a very interesting article, worth reading. H/T R Bloggers

Comments closed

Copying Azure SQL Databases Between Subscriptions

Published 2017-03-07 by Kevin Feasel

Arun Sirpal shows that it’s pretty easy to copy an Azure SQL Database from one subscription to another:

If you ever need to move a copy of a SQL database in Azure across servers then here is a quick easy way.

So let’s say you need to take a copy of database called [Rack] within Subscription A that is on server ABCSQL1 and name it database [NewRack] within subscription B on server called RBARSQL1 (The SQL Servers are in totally different data centers too).

Read on for the answer.

Comments closed

Securing A Data Driven Application

Published 2017-03-07 by Kevin Feasel

K Brian Kelley has started a series on securing applications which connect to SQL Server:

In conjunction with the webinar I gave last month for MSSQLTips, I’ve started an article series on application database security design.

Read Part 1 – Authentication for SQL Server

The issue with a one hour webcast is one can’t cover a broad topic like application database security design in any depth. However, these webcasts are useful because they point out, at a high level, what to look for. It was my intent all along to do the webinar and follow up with a series of articles that cover each topic in detail. I’m not sure how many articles I’ll end up writing, as I want to make sure I cover each topic in the depth it needs while still keeping the article length manageable.

This first post is all about comparing and contrasting credentials options and authentication methods.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Day: March 7, 2017

Virtual Function Tables

Garbage Collection In Hadoop

Time Series Errors

Data Lake 3.0

Disabling VMware In-Guest Clock Updates

Retaining system_health Metrics

Using Prophet For Stock Price Predictions

Copying Azure SQL Databases Between Subscriptions

Securing A Data Driven Application