Explaining Neural Networks With H2O

Shirin Glander explains some of the concepts behind neural networks using H2O as a guide:

Before, when describing the simple perceptron, I said that a result is calculated in a neuron, e.g. by summing up all the incoming data multiplied by weights. However, this has one big disadvantage: such an approach would only enable our neural net to learn linear relationships between data. In order to be able to learn (you can also say approximate) any mathematical problem, no matter how complex, we use activation functions. Activation functions normalize the output of a neuron, e.g. to values between -1 and 1 (Tanh), 0 and 1 (Sigmoid) or by setting negative values to 0 (Rectified Linear Units, ReLU).

In H2O we can choose between Tanh, Tanh with Dropout, Rectifier (default), Rectifier with Dropout, Maxout and Maxout with Dropout. Let’s choose Rectifier with Dropout. Dropout is used to improve the generalizability of neural nets by randomly setting a given proportion of nodes to 0.

The dropout rate in H2O is specified with two arguments. The first is hidden_dropout_ratios, which by default sets 50% of hidden (more on that in a minute) nodes to 0. Here, I want to reduce that proportion to 20% but let’s talk about hidden layers and hidden nodes first. In addition to hidden dropout, H2O lets us specify a dropout for the input layer with input_dropout_ratio. This argument is deactivated by default and this is how we will leave it.

Read the whole thing and, if you understand German, check out the video as well.
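
To make the activation and dropout ideas in the excerpt concrete, here is a minimal sketch in plain Python. It is not H2O code and not taken from Shirin's post; the function names (neuron, dropout) are my own, and the dropout shown simply zeroes values, leaving out the rescaling step that many real implementations apply.

```python
import math
import random


def tanh(x):
    """Squash x into the range (-1, 1)."""
    return math.tanh(x)


def sigmoid(x):
    """Squash x into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Rectifier: negative values become 0, positive values pass through."""
    return max(0.0, x)


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    return activation(z)


def dropout(values, ratio, rng):
    """Set each value to 0 with probability `ratio` (simplified, no rescaling)."""
    return [0.0 if rng.random() < ratio else v for v in values]


if __name__ == "__main__":
    rng = random.Random(42)
    # Four hypothetical hidden-node outputs, then 20% dropout as in the excerpt.
    hidden = [neuron([1.0, 2.0], [w, -w], 0.1, relu) for w in (0.5, -0.5, 1.0, -1.0)]
    print(hidden)
    print(dropout(hidden, 0.2, rng))
```

Without the activation step, stacking such neurons would still produce only a linear function of the inputs, which is the limitation the excerpt describes.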

Detecting Redirects With httr

Peter Meissner shows us how we can find redirects when using the httr package:

I am the creator and maintainer of the robotstxt package, an R package that enables users to retrieve and parse robots.txt files and is ultimately designed to do access permission checking for web resources.

Recently a discussion came up about how to interpret permissions in case of sub-domains and HTTP redirects. Long story short: in the case of robots.txt files, redirects are suspicious, and users should at least be informed when they happen so they can take appropriate action.

So, I set out to find a way to check whether or not a robots.txt file requested via the httr package has gone through one or more redirects prior to its retrieval.

Click through for the solution.
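
For readers outside of R, the same idea can be sketched with Python's standard library. This is not Peter's httr solution, only an analogous approach under my own assumptions: the names RedirectRecorder and fetch_with_redirects are hypothetical, and the sketch records each redirect hop that urllib follows so the caller can decide whether to warn the user.

```python
import urllib.request


class RedirectRecorder(urllib.request.HTTPRedirectHandler):
    """Redirect handler that remembers every hop it follows."""

    def __init__(self):
        # Each entry is (status_code, url_requested, url_redirected_to).
        self.hops = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.hops.append((code, req.full_url, newurl))
        # Defer to the default behaviour so the redirect is still followed.
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def fetch_with_redirects(url, timeout=10):
    """Fetch url and return (body, final_url, hops)."""
    recorder = RedirectRecorder()
    opener = urllib.request.build_opener(recorder)
    with opener.open(url, timeout=timeout) as response:
        body = response.read()
        final_url = response.geturl()
    return body, final_url, recorder.hops
```

A caller could then treat a non-empty hops list as the signal to inform the user, for example by checking whether the final host differs from the one originally requested for robots.txt.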

Premium Blob Storage In Azure

James Serra describes a new tier of Azure Blob Storage:

As a follow-up to my blog Azure Archive Blob Storage, Microsoft has released another storage tier called Azure Premium Blob Storage (announcement).  It is in private preview in US East 2, US Central and US West regions.

This is a performance tier in Azure Blob Storage, complementing the existing Hot, Cool, and Archive tiers.  Data in Premium Blob Storage is stored on solid-state drives, which are known for lower latency and higher transactional rates compared to traditional hard drives.

It is ideal for workloads that require very fast access time such as interactive video editing, static web content, and online transactions.  It also works well for workloads that perform many relatively small transactions, such as capturing telemetry data, message passing, and data transformation.

It’s in private preview for now, but my guess is that it’ll be available to the general public soon enough.

PolyBase And Azure Data Studio

Rajendra Gupta continues his series on PolyBase in SQL Server 2019 with a look at PolyBase integration in Azure Data Studio:

We have learned earlier that PolyBase in SQL Server 2019 Preview allows access to various data sources such as SQL Server, Oracle, MongoDB, Teradata, and ODBC-based sources. The Azure Data Studio SQL Server 2019 preview extension currently supports only SQL Server and Oracle data sources from the External Table wizard.

In this series, we will create an external table for SQL Server and explore some more features around it.

Launch Azure Data Studio and connect to the SQL Server 2019 preview instance. Right-click on the database and launch ‘Create External Table’.

Rajendra also looks at some of the PolyBase DMVs and the notion of predicate pushdown, which is critical to understand if you want to write PolyBase queries that perform well.

The Performance Impacts Of Query Store

Erin Stellato explains the performance impacts of enabling Query Store in various types of environments:

The short answer:

  • The majority of workloads won’t see an impact on system performance

    • Will there be an increase in resource use (CPU, memory)?  Yes.
    • Is there a “magic number” to use to figure out Query Store performance and the increase in resource use?  No, it will depend on the type of workload.  Keep reading.
  • An impact on system performance can be seen with ad-hoc workloads (think Entity Framework, NHibernate), but I still think it’s worth enabling. With an ad-hoc workload there are additional factors to consider when using Query Store.

  • You should be running the latest CU for SQL Server 2017 and the latest CU for SQL Server 2016 SP2 to get all performance-related improvements Microsoft has implemented specific to Query Store

Definitely read the long answer.  There are also settings to reduce the load that Query Store puts on a system, and being up to date is critical.

SQL Server IaaS Versus PaaS On AWS

John McCormack identifies some differences between running SQL Server in EC2 versus RDS on Amazon Web Services:

How do I run SQL Server on AWS?

Running SQL Server on AWS can be done in 2 ways.

  • Relational Database Service (RDS): AWS’s managed solution where some of the administration (maintenance, backups and patching) is handled for you.

  • EC2: Your very own virtual machine in the cloud. With EC2, you manage SQL Server, just like you would do on-premises. This gives you full control over your SQL instance.

Click through for the comparison.


November 2018