VGG16 is a built-in neural network in Keras that is pre-trained for image recognition.
Technically, it is possible to gather training and test data independently to build the classifier. However, this would necessitate at least 1,000 images, with 10,000 or greater being preferable.
In this regard, it is much easier to use a pre-trained neural network that has already been designed for image classification purposes.
This is probably the best generally available technique for image classification.
Today, we’re going to talk about combining Stream Analytics with Azure Machine Learning Studio within Power BI. If you haven’t read the earlier posts in this series, Introduction, Getting Started with R Scripts, Clustering, Time Series Decomposition, Forecasting, Correlations, Custom R Visuals, R Scripts in Query Editor, Python, Azure Machine Learning Studio and Stream Analytics, they may provide some useful context. You can find the files from this post in our GitHub Repository. Let’s move on to the core of this post, Stream Analytics.
This post is going to build directly on what we created in the two previous posts, Azure Machine Learning Studio and Stream Analytics. As such, we recommend that you read them before proceeding.
Read on for the demo.
There is an impedance mismatch between model development using Python, its tool stack and a scalable, reliable data platform with low latency, high throughput, zero data loss and 24/7 availability requirements needed for data ingestion, preprocessing, model deployment and monitoring at scale. Python in practice is not the most well-known technology for these requirements. However, it is a great client for a data platform like Apache Kafka.
Click through for the full argument as well as where Kafka can help mitigate some of the issues.
The original and most simple scenario of the Monty Hall problem is this: You are in a prize contest and in front of you there are three doors (A, B and C). Behind one of the doors is a prize (Car), while behind others is a loss (Goat). You first choose a door (let’s say door A). The contest host then opens another door behind which is a goat (let’s say door B), and then he ask you will you stay behind your original choice or will you switch the door. The question behind this is what is the better strategy?
This is something that puzzled me for a very long time. This is fundamentally a Bayesian problem built around processing new information, and once I understood that, the answer was a lot clearer. H/T R-Bloggers.
This bootcamp is a free online course for everyone who wants to learn hands-on machine learning and AI techniques, from basic algorithms to deep learning, computer vision and NLP. However, the course language is German only, but for every chapter I did, you will find an English R-version here on my blog (see below for links).
Right now, the course is in beta phase, so we are happy about everyone who tests our content and leaves feedback. Also, not the entire curriculum is finished yet, we will update and extend the course during the next months. If there are specific topics you’d like to have us cover, just let us know!
If you understand German and want to learn about data science, check this out and leave feedback.
There are so many things in statistics (and hence in econometrics) that are easily, and frequently, misinterpreted. Two really obvious examples are p-values and confidence intervals.
I’ve devoted some space in earlier posts to each of these concepts, and their mis-use. For instance, in the case of p-values, see the posts here and here; and for confidence intervals, see here and here.
Click through for more in this vein, including a reference to an interesting-looking paper.
The Gartner Magic Quadrant for Data Science and Machine Learning Platforms is just out and once again there are big changes in the leaderboard. Say what you will about our profession but as a platform developer you certainly can’t rest on your laurels. Some traditional leaders have fallen (SAS, KNIME, H2Oai, IBM) and some challengers have risen (Alteryx, TIBCO, RapidMiner).
Databricks is making a big push and there’s more movement than usual in this year’s chart. Check it out.
In the following tutorial, we answer both questions using the R package arulesSequences , which implements the SPADE algorithm . Concretely, given data in an Excel spreadsheet containing historical customer service purchase data, we produce two separate Excel sheet deliverables: a list of service bundles, and a set of temporal rules showing how service bundles evolve over time. We will focus on interpreting the latter result by showing how to use temporal rules in making predictive sales recommendations.
Our running example below is inspired by the need for Microsoft’s Azure Services salespeople to suggest which additional products to recommend to customers, given the customers’ current cloud product consumption services mix. We’d like to know, for instance, if customers who have implemented web services also purchase web analytics within the next month. Actual Azure Service names have been removed for confidentiality reasons.
Market basket analysis is an interesting topic, though in my limited experience, it really falls apart when you have a large number of products to compare, so it tends to work better with toy examples or limited product selections because when you have a 50,000+ SKU inventory, the lift of any individual combination of products rarely gets above the level of noise.
For this analysis I’m using the SAS Open Source library called SWAT (Scripting Wrapper for Analytics Transfer) to code in Python and execute SAS CAS Action Sets. SWAT acts as a bridge between the python language to CAS Action Sets. CAS Action Sets are synonymous to libraries in Python or packages in R. The one main difference and benefit is that the algorithms within these action sets have been highly parallelized to run on a CAS (Cloud Analytic Services) server. The CAS server is a distributed in-memory engine where I can do all my heavy lifting or computations. The code and Jupyter Notebook are available on GitHub.
Click through for the analysis.
The most important part of hypothesis testing is being clear what question we are trying to answer. In our case we are asking:
“Could the most extreme value happen by chance?”
The most extreme value we define as the greatest absolute AMVR deviation from the mean. This question forms our null hypothesis.
Give this one a careful read and try out the code. This is an important topic for anyone who analyzes data to understand.