# Regularization Prevents Overfitting

2017-07-07

Assume that the red line is the regression model we learn from the training data set. It can be seen that the learned model fits the training data set perfectly, while it cannot generalize well to the data not included in the training set. There are several ways to avoid the problem of overfitting.

To remedy this problem, we could:

• Get more training examples.
• Use a simple predictor.
• Select a subsample of features.

In this blog post, we focus on the second and third ways to avoid overfitting by introducing regularization on the parameters βi of the model.

# Tidygraph

2017-07-07

One of the simplest concepts when computing graph based values is that of
centrality, i.e. how central is a node or edge in the graph. As this
definition is inherently vague, a lot of different centrality scores exists that
all treat the concept of central a bit different. One of the famous ones is
the pagerank algorithm that was powering Google Search in the beginning.
tidygraph currently has 11 different centrality measures and all of these are
prefixed with centrality_* for easy discoverability. All of them returns a
numeric vector matching the nodes (or edges in the case of
centrality_edge_betweenness()).

This is a big project and is definitely interesting if you’re looking at analyzing graph data.

# R’s iGraph + SQL Server Graphs

2017-07-07

The possibility to use both technologies together is very interesting. Using graph objects we can store relationships between elements, for example, relationships between forum members. Using R scripts we can build a cluster graph from the stored graph information, illustrating the relationships in the graph.

The script below creates a database for our example with a subset of the objects used in my article and a few more relationship records between the forum members.

Click through for the script.

# Drilldown Choropleth Power BI Custom Visual

2017-07-07

In this module you will learn how to use the Drilldown Choropleth Custom Visual.  The Drilldown Choropleth is a map visual that displays divided regions that are highlighted indicating the relative value in each location.

Click through for Devin’s video and example.

# Dynamic Unpivoting For Change Detection

2017-07-07

Overall, the script is longer at nearly double the lines but where it shines is when adding new columns.
To include new columns, just add them to the table; to exclude them, just add in a filter clause.

So, potentially, if every column in this table is to be tracked and we add columns all the way up to 1,024 columns, this code will not increase.
Old way: at least 6,144.
New way: at least 2,048.
Dynamic: no change

Read on for that script.  Even though his developer ended up not using his solution, Shane has made it available for the rest of the world so that some day, someone else can have the maintenance nightmare of trying to root out a bug in the process.

# System Health XE

2017-07-07

Per BOL you get the following information:

• Errors with a severity of >= 20.

• Memory related errors (Errors 17803, 701, 802, 8645, 8651, 8657 and 8902).

• Non-yielding scheduler problems (Error 17883).

• Sessions that have waited on locks for > 30 seconds.

• Sessions waiting for a long time on preemptive waits (waits on external API calls).

Read on to learn more of the things this session contains as well as a couple ways you can access the data.

# When You Need To Read Memory-Optimized Data From Disk

2017-07-07

Those who have studied In-Memory OLTP are aware that in the event of “database restart”, durable memory-optimized data must be streamed from disk to memory. But that’s not the only time data must be streamed, and the complete set of events that cause this is not intuitive. To be clear, if your database had to stream databack to memory, that means all your memory-optimized data was cleared from memory. The amount of time it takes to do this depends on:

• the amount of data that must be streamed

• the number of indexes that must be rebuilt

• the number of containers in the memory-optimized database, and how many volumes they’re spread across

• how many indexes must be recreated (SQL 2017 has a much faster index rebuild process, see below)

• the number of LOB columns

• BUCKET count being properly configured for HASH indexes

Read on for the list of scenarios that might cause a standalone SQL Server instance to need to stream data from disk into memory to re-hydrate memory-optimized tables and indexes.

# TempDB System Table Contention

2017-07-07

I ran this several times to see if there was a pattern to the madness, and it turned out it was. All waits were concentrated in database ID 2 – TEMPDB. Many people perk up by now and jump to the conclusion that this is your garden variety SGAM/PFS contention – easily remedied with more TEMPDB files and a trace flag. But, alas- this was further inside the TEMPDB. The output from the query above gave me the exact page number, and plugging that into DBCC PAGE gives the metadata object ID.

His conclusion is to reduce temp table usage and/or use memory-optimized tables.  We solved this problem with replacing temp tables with memory-optimized TVPs in our most frequently-used procedures.

# Avoid Ticks

2017-07-07

A .Net tick is a duration of time lasting 0.1 microseconds. When you look at the Tick property of DateTime, you’ll see that it represents the number of ticks since January 1st 0001.
But why 0.1 microseconds? According to stackoverflow user CodesInChaos “ticks are simply the smallest power-of-ten that doesn’t cause an Int64 to overflow when representing the year 9999”.

Even though it’s an interesting idea, just use one of the datetime data types, that’s what they’re there for. I avoid ticks whenever I can.

I agree with Michael:  avoid using Ticks if you can.

July 2017
MTWTFSS
« Jun Aug »
12
3456789
10111213141516
17181920212223
24252627282930
31