In this blog post, I will discuss the use of deep learning methods to classify time-series data, without the need to manually engineer features. The example I will consider is the classic Human Activity Recognition (HAR) dataset from the UCI repository. The dataset contains the raw time-series data, as well as a pre-processed version with 561 engineered features. I will compare the performance of typical machine learning algorithms that use engineered features with two deep learning methods (convolutional and recurrent neural networks) and show that deep learning can surpass the performance of the former.
I have used TensorFlow for the implementation and training of the models discussed in this post. In the discussion below, code snippets are provided to explain the implementation. For the complete code, please see my GitHub repository.
Click through for the samples, or check out the repo, linked above.
The model is a fully connected neural network with three hidden layers, with a ReLU as the activation function. They state that data from Google Compute Engine was used to train the model (implemented in TensorFlow), and Cloud Machine Learning Engine’s HyperTune feature was used to tune hyperparameters.
I have no reason to doubt their representation choices or network design, but one thing looks odd. Their output is two ReLU (rectifier) units, each emitting the network’s accuracy (technically: recall) on that class. I would’ve chosen a single Softmax unit representing the probability of Large Loss driver, from which I could get a ROC or Precision-Recall curve. I could then threshold the output to get any achievable performance on the curve. (I explain the advantages of scoring over hard classification in this post.)
But I’m not a neural network expert, and the purpose here isn’t to critique their network design, just their general approach. I assume they experimented and are reporting the best performance they found.
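To make the point about scoring versus hard classification concrete, here is a minimal Python sketch. It assumes the model emits one score per driver between 0 and 1, where higher means more likely to be a Large Loss driver; the function names and the sample data are hypothetical and do not come from the original post. Sweeping the threshold over the scores traces out the points of a precision-recall curve.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when every score >= threshold is called positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    # Conventions for empty denominators: no positive calls -> precision 1.0,
    # no actual positives -> recall 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pr_curve(scores, labels):
    """One (threshold, precision, recall) point per distinct score, highest first."""
    thresholds = sorted(set(scores), reverse=True)
    return [(t, *precision_recall_at(scores, labels, t)) for t in thresholds]


if __name__ == "__main__":
    # Hypothetical scores and true labels (1 = Large Loss driver).
    scores = [0.9, 0.8, 0.4, 0.2]
    labels = [1, 0, 1, 0]
    for point in pr_curve(scores, labels):
        print(point)
```

With a score in hand, the operating point becomes a choice made after training: lowering the threshold trades precision for recall without retraining the network.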
Read the whole thing.
Armed with our new knowledge, we can create a single SQL query that decodes all of the SSNs. The strategy is to define a single CTE with all ten digits and to use one CROSS APPLY for each digit in the SSN. Each CROSS APPLY only references the SSN column in the WHERE clause and returns the matching prefix of the SSN that we’ve found so far. Here’s a snippet of the code:
Click through for progressively faster solutions. This is the main reason I do not care for DDM as a feature. Its main benefit seems to be preventing shoulder-surfing on reports; any concerted attacker with a little bit of access to writing queries can subvert it.
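The idea behind the attack can be simulated outside the database. The Python sketch below is not the SQL from the linked post: it assumes only that a filter predicate is evaluated against the real value even when the displayed value is masked, so each query leaks a yes/no answer about a guessed prefix. The names matches_prefix and recover_value and the sample value are hypothetical.

```python
DIGITS = "0123456789"


def matches_prefix(secret, prefix):
    """Stand-in for a predicate like: WHERE ssn LIKE prefix + '%'.

    The caller never sees `secret`, only whether the row matched.
    """
    return secret.startswith(prefix)


def recover_value(oracle, length):
    """Rebuild a digit string one position at a time using only yes/no answers."""
    prefix = ""
    for _ in range(length):
        for digit in DIGITS:
            if oracle(prefix + digit):
                prefix += digit  # this digit extends the known prefix
                break
        else:
            raise ValueError("no digit matched at position %d" % len(prefix))
    return prefix


if __name__ == "__main__":
    hidden = "123456789"  # made-up value, not a real SSN
    print(recover_value(lambda p: matches_prefix(hidden, p), len(hidden)))
```

Each position needs at most ten guesses, so a nine-digit value falls in at most 90 yes/no questions, which is why masking the output alone does little against someone who can write their own WHERE clauses.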
Parsing strings is a feature that is often needed in the database world and SUBSTRING/SUBSTR are designed to do just that. I find it interesting how these two platforms approached the functions differently, and that definitely shows how you can do many things to get to the same answer.
It’s a short post, but Daniel does show one big difference between the Oracle and SQL Server substring functions.
But that comes with a few big drawbacks. They’re really well-documented, but here are the highlights:
Do you need to query that data from other apps? Do you have a data warehouse, reporting tools, PowerBI, Analysis Services cubes, etc? If so, those apps will also need to be equipped with the latest database drivers and your decryption certificates. For example, here’s how you access Always Encrypted data with PowerBI. Any app that expects to read the encrypted data is going to need work, and that’s especially problematic if you’re replicating the data to other SQL Servers.
Click through to read the rest. Always Encrypted was designed to encrypt a few columns, not everything in a database.
wait_started_ms_ticks is set in SOS_Task::PreWait(), i.e. just before actually suspending, and again cleared in SOS_Task::PostWait(). For more about the choreography of suspending, see here.
wait_resumed_ms_ticks is set in SOS_Scheduler::PrepareWorkerForResume(), itself called by the mysteriously named but highly popular SOS_Scheduler::ResumeNoCuzz().
start_quantum is set for the Resuming and InstantResuming case within SOS_Scheduler::TaskTransition(), called by SOS_Scheduler::Switch() as the worker is woken up after a wait.
Ewald intends this post as an extension of the official documentation, so it’s best to read that documentation in conjunction with this post.
One of the questions I get when teaching others how to use Biml is how to deal with sensitive information like usernames and passwords in your Biml solution. No one wants to leave this information in plain text in a solution. You need access to it while interrogating your source and destination connections for metadata. You also need it while Biml creates your SSIS packages, since SSIS uses SELECT statements at design time to gather its metadata. If you lock away that sensitive information too tightly, you won’t be effective while building your solutions.
In the end, you’ll have to compromise between security and efficacy.
Read on for more.
One great way to introduce default values in Biml would be variables in include files or code files, for example. But depending on what you’re trying to achieve, or at what point you realize it, it may already be causing some extra work.
For example: You have a couple of different ways to create a dataflow task but in the end, they should all share a property like DefaultBufferMaxRows.
In BimlStudio, you could make use of a transformer, but these are not available in BimlExpress.
As a bonus, this is a bilingual post on two fronts, so you can pick up a little English-German translation as well as a little VB.Net-C# translation.