Press "Enter" to skip to content

Month: July 2017

Neural Networks From Scratch

Ilia Karmanov explains neural nets and shows how to build one in R:

Hence, my motivation for this post is two-fold:

  1. Understanding (by writing from scratch) the leaky abstractions behind neural-networks dramatically shifted my focus to elements whose importance I initially overlooked. If my model is not learning I have a better idea of what to address rather than blindly wasting time switching optimisers (or even frameworks).
  2. A deep-neural-network (DNN), once taken apart into lego blocks, is no longer a black-box that is inaccessible to other disciplines outside of AI. It’s a combination of many topics that are very familiar to most people with a basic knowledge of statistics. I believe they need to cover very little (just the glue that holds the blocks together) to get an insight into a whole new realm.

Starting from a linear regression we will work through the maths and the code all the way to a deep-neural-network (DNN) in the accompanying R-notebooks. Hopefully to show that very little is actually new information.

This is pretty detailed.  Karmanov mentions Andrej Karpathy, whose Hacker’s guide to Neural Networks is also a must-read on the topic.

Comments closed

Schema-Only Optimized Tables Can Still Roll Back

Chris Adkin investigates whether schema-only memory-optimized tables are logged and whether they support transactions the way other tables do:

The statement “There is zero logging when DURABILITY=SCHEMA_ONLY” is not factually correct, its more like a minimally logged operation. What is surprising is the fact that logged as advertised for the in-memory engine should result in far fewer log records than the equivalent workload for the legacy engine, clearly this is not the case in this particular example and something I need to dig into somewhat deeper. Also note that the version of SQL Server being used is SQL Server 2016 SP1 CU3, which should be stable. One final point, in order to make sure that fn_dblog and fn_dblog_xtp produced clean results for me each time, I took the quick and dirty option of re-creating my test database each time.

This post definitely ranks in the “Microsoft did this right” category.

Comments closed

Running Totals With Window Functions

Bert Wagner shows the best method to calculate a running total in SQL Server 2012 or later:

Before SQL Server 2012, the solution to generating a running total involved cursors, CTEs, nested subqueries, or cross applies. This StackOverflow thread has a variety of solutions if you need to solve this problem in an older version of SQL Server.

However, SQL Server 2012’s introduction of window functions makes creating a running total incredibly easy.

Enhanced window functions was one of 2012’s killer features on the T-SQL developer side.  Bert’s post doesn’t cover window ranges and sizes, as the defaults work for him, but Steve Stedman has a good post on the topic if you want more details.

Comments closed

SUM() Or SUMX()

Matt Allington explains when to use SUM() or SUMX() in DAX:

Example: Total Sales SUMX = SUMX(Sales,Sales[Qty] * Sales[Price Per Unit])

SUMX() will iterate through a table specified in the first parameter, one row at a time, and complete a calculation specified in the second parameter, eg Quantity x Price Per Unit as shown in the example above for the current filter context.  Once it has done this for every row in the specified table in current filter context, it then adds up the total of all of the row by row calculations to get the total.   It is this total that is returned as the result.

This is a really good explanation of the topic.

Comments closed

Compressing Files In SSIS

Mark Broadbent shows how to compress files within Integration Services using a script task:

Ok, so before I get started I will caveat this quick post by saying that there is an easier (or preferred) way to perform compression on files in SSIS using the ZipFile Class in the recent version of the .NET framework, this is sadly not available if you are using any legacy deployments.

While I have not spent ages adapting the following code to perform what should be a simple thing to do, I have spent enough time to justify sharing what I have done so hopefully you will also find some benefit from the code below too.

Click through to see Mark’s code and explanation.

Comments closed

Linux Containers On Windows

Andrew Pruski shows how to run Linux-based Docker containers on Windows:

This post is a step-by-step guide to getting Linux containers running on your Windows 10 machine. The first thing to do is install the Docker Engine.

Installing Docker on Windows 10 is different than installing on Windows Server 2016, you’ll need to grab the Community Edition installer from the Docker Store.

Once installed, you’ll then need to switch the engine from Windows Container to Linux Containers by right-clicking the Docker icon in the taskbar and selecting “Switch to Linux Containers…”

Andrew walks us through step by step, so check it out.

Comments closed

Collecting Processes During High-CPU Events

Laerte Junior has a job which captures processes when server CPU is high:

Whatever the query you prefer to use, the big question will be how to do it in real time when the problem is actually happening, and log whatever information you need, even on the unattended server. There are plenty of times you need to do this, especially if you don’t have a full-time DBA or if you are running in the cloud and needs some support from the cloud provider. You can help the support Engineer by sending him the queries that are breaking your system. In AWS, this kind of service is out of scope of support, but if you have luck to find an Engineer that knows SQL Server and is willing to help you, as I was, it will, help him or her to help you to tune the queries. You just need to leave the solution and then get the CSV log with the queries.

If you can’t get a monitoring solution in place and have to roll your own, this is a very good piece of the puzzle.

Comments closed

Apache Drill Interface For R

Bob Rudis announces a new package on CRAN:

I’m extremely pleased to announce that the sergeant package is now on CRAN or will be hitting your local CRAN mirror soon.

sergeant provides JDBC, DBI and dplyr/dbplyr interfaces to Apache Drill. I’ve also wrapped a few goodies into the dplyr custom functions that work with Drill and if you have Drill UDFs that don’t work “out of the box” with sergeant‘s dplyr interface, file an issue and I’ll make a special one for it in the package.

Seems quite useful if you’re working with MapR.  H/T R-bloggers

Comments closed