Distributed training with Keras 2 and MXNet
This article shows how to install Keras-MXNet and demonstrates how to train a CNN and an RNN. If you have tried distributed training with other deep learning engines before, you know that it can be tedious and difficult. Let us show you what it’s like now with Keras-MXNet.
Installation is only a few steps
Deploy an AWS Deep Learning AMI
The Deep Learning AMI comes preconfigured, so it should be easy to follow along.
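As a rough sketch, once you are on the AMI the install itself comes down to a couple of pip commands (package names here are the ones published on PyPI; the GPU MXNet build name depends on your CUDA version):

```shell
# Install the MXNet backend, then the Keras 2 fork that targets it.
pip install mxnet          # CPU build; GPU builds are versioned per CUDA release, e.g. mxnet-cu90
pip install keras-mxnet    # Keras 2 with MXNet as the backend engine
```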
There are a couple of myths that I see more and more these days. Like many myths, they seem plausible on the surface, but experienced data scientists know that the reality is more nuanced (and sadly requires more work).
- Deep learning (or Cognitive Analytics) is an easy button. You can throw massive amounts of data at it and the algorithm will deliver a near-optimal model.
- Big data is always better than small data. More rows of data always result in a significantly better model than fewer rows of data.
Both of these myths lead some (lately, it seems, many) people to conclude that data scientists will eventually become superfluous. With enough data and advanced algorithms, maybe we don’t need these expensive data scientists…
Read on for a dismantling of these myths. There’s a lot more to it than “collect all of the data and throw it at an algorithm” (and even then, “all” the data rarely really means all, which I think deserves to be a third myth). H/T R-bloggers
Container runtimes have security layers defined by Seccomp, AppArmor, kernel namespaces, cgroups, capabilities, and an unprivileged Linux user. Not all of the layers overlap perfectly, but a few do.
Let’s go over some of the ones that do overlap. I could cover them all, but I would be here all day. The mount syscall is prevented by the default AppArmor profile, by the default Seccomp profile, and by dropping CAP_SYS_ADMIN. This is a neat example, as it is literally three layers. Wow.
Everyone’s favorite thing to complain about in containers, or way to prove that they know something, is creating a fork bomb. Well, this is actually easily preventable: with the PID cgroup, you can set a maximum number of processes per container.
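Docker, for instance, surfaces the PID cgroup as a flag on docker run (a sketch; it needs a running Docker daemon, and the image choice is just an illustration):

```shell
# Cap the container at 64 processes. A fork bomb inside it then fails
# with "Resource temporarily unavailable" once the pids cgroup limit
# is reached, and the host's process table is untouched.
docker run --rm --pids-limit 64 alpine sh
```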
Interesting reading from an insider.
The authors study the computational complexity of GraphQL, looking at three central questions:
- The evaluation problem: what can we say about the complexity of GraphQL query evaluation?
- The enumeration problem: how efficiently can we enumerate the results of a query in practice?
- The response size problem: how large can responses get, and what can we do to avoid obtaining overly large response objects?
In order to do this, they need to find some solid ground to use for reasoning. So the starting point is a formalisation of the semantics of GraphQL.
This is a review of a published academic paper rather than a how-to guide, so it’s math-heavy. I am enjoying seeing the development of normal forms for graph processing languages—it’s the beginning of a new generation of normalization purists.
I was recently working on a .NET 4.6 based project that was using EF 6 and nUnit for unit testing. While setting up some integration tests against a local SQL database I was receiving this error:
Spatial types and functions are not available for this provider because the assembly ‘Microsoft.SqlServer.Types’ version 10 or higher could not be found.
We had recently been using SQL Server spatial types for tracking geographic locations, and the tests which performed updates and inserts against these fields were failing.
Read on for the setup instructions.
I occasionally come across the need to deploy a set of SQL files that are all checked into source control, with a file hierarchy like this:
- Database Name
- Type of object (proc, table, view, etc)
- Name of object
When I go to deploy the scripts, I need to manually combine all the SQL files into one to move to production, QA, or test for deployment. After getting annoyed at lots of copying and pasting, I finally discovered an easy PowerShell script to combine all the files into one.
Steve points out at the end that if the file does not end with “GO” then combining multiple things, like stored procedures, together might result in unexpected behavior. I’ve done something similar to Steve’s script, except as you stream the content, append a newline, “GO,” and another newline.
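That combine step can be sketched in bash as well (file and object names below are invented for the demo), appending a newline, GO, and another newline after each file exactly as described:

```shell
# Demo setup: two fake per-object scripts in the hierarchy described above.
mkdir -p demo/StoreDB/proc demo/StoreDB/table
printf 'CREATE PROCEDURE dbo.GetOrders AS SELECT 1;' > demo/StoreDB/proc/GetOrders.sql
printf 'CREATE TABLE dbo.Orders (Id INT);'           > demo/StoreDB/table/Orders.sql

# Combine: concatenate every .sql file into one deploy script,
# inserting a GO batch separator after each file.
out=deploy.sql
: > "$out"
find demo -name '*.sql' -type f | sort | while read -r f; do
  cat "$f" >> "$out"
  printf '\nGO\n' >> "$out"
done
```

With the separator appended per file, each object lands in its own batch even when the source files lack a trailing GO.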
In this case, I’m creating a temporary stored procedure (out of laziness; it means I don’t have to clean up a quick demo):

```sql
CREATE OR ALTER PROCEDURE #test
AS
IF 1 = 0
    EXECUTE dbdoesnotexist.dbo.someproc;
GO
```
The database dbdoesnotexist does NOT exist, but I’m still allowed to create the procedure.
When I do so, I get an informational message:
The module ‘#test’ depends on the missing object ‘dbdoesnotexist.dbo.someproc’. The module will still be created; however, it cannot run successfully until the object exists.
This can be useful in some cases where you’ll be querying a table or procedure that may not exist all the time, but which will exist when a certain code block is run.
But, as Kendra points out, deferred name resolution doesn’t work everywhere, so it’s important to know the rules around when it will or will not work.
I was creating some demo non-clustered indexes in one of my Azure SQL Databases and received the following warning when I executed this code:

```sql
CREATE NONCLUSTERED INDEX [dbo.NCI_Time]
ON [dbo].[Audit] ([UserId])
INCLUDE ([DefID], [ShopID]);
```
Msg 10637, Level 16, State 3, Line 7
Cannot perform this operation on ‘object’ with ID 1093578934 as one or more indexes are currently in resumable index rebuild state. Please refer to sys.index_resumable_operations for more details.
Fortunately, the error message is clear and helpful, two terms which rarely go in conjunction with “error message.”
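When you hit this, the DMV named in the message shows what is pending, and ALTER INDEX can resume or abort it. A sketch (the paused index name is a placeholder you would look up first):

```sql
-- See which resumable index operations exist and how far along they are
SELECT name, percent_complete, state_desc
FROM sys.index_resumable_operations;

-- Then finish or clear the paused rebuild on the table
ALTER INDEX [PausedIndexName] ON [dbo].[Audit] RESUME;
-- or: ALTER INDEX [PausedIndexName] ON [dbo].[Audit] ABORT;
```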
For beginners, the default configurations of the Kafka broker are good enough, but for a production-level setup, one must understand each configuration. I am going to explain some of these configurations.
broker.id: The unique ID of the broker instance in a cluster; each broker must have a different value.
zookeeper.connect: The ZooKeeper address (you can list multiple addresses, comma-separated, to point at a ZooKeeper cluster).
zookeeper.connection.timeout.ms: How long the broker waits before giving up if, for some reason, it cannot connect to ZooKeeper.
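Put together, a minimal server.properties fragment might look like this (host names and values are illustrative, not recommendations):

```properties
broker.id=1
zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
zookeeper.connection.timeout.ms=6000
```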
Read the whole thing.
Akhil Vijayan has a two-parter on serializing data in Scala. In the first post, he looks at uPickle:
uPickle is a lightweight JSON serialization library for Scala. It is built on top of uJson, which lets you manipulate JSON easily without converting it to a Scala case class. We can even use uJson standalone. In this blog, I will focus only on the uPickle library.
Note: uPickle does not support Scala 2.10; only 2.11 and 2.12 are supported
uPickle (pronounced micro-pickle) is a lightweight JSON serialization library that is faster than many other JSON serializers. I will talk more about the comparison of different serializers in my next blog. This blog will cover all the basic stuff about uPickle.
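For a flavor of the basics the first post covers, a minimal uPickle round trip looks roughly like this (a sketch using the library’s documented macroRW derivation; the case class is made up):

```scala
import upickle.default._

case class Person(name: String, age: Int)
object Person {
  // Derive a JSON ReadWriter for the case class at compile time
  implicit val rw: ReadWriter[Person] = macroRW
}

object Demo extends App {
  val json = write(Person("Ada", 36))
  println(json)                                   // {"name":"Ada","age":36}
  assert(read[Person](json) == Person("Ada", 36)) // round trip back to the case class
}
```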
In my previous blog, I talked about how uPickle works. Now I will compare it with many other JSON serializers, serializing and deserializing a Scala case class.
Before that, let me discuss the JSON serializers I have used in my comparison: I will compare uPickle with PlayJson, Circe, and Argonaut.
Check it out.