4. Spark-dotnet Java driver listens on tcp port
The spark-dotnet Java driver listens on a TCP socket. This socket is used to communicate between the Java VM and the dotnet code, the dotnet code doesn’t run in the Java VM but is in a separate process communitcating with the Java VM via that TCP postrt. The year is 2019, we serialize and deserialize data all the time and don’t even know it, hell notepad probably even does it.
It’s serialization & deserialization as well as TCP sockets all the way down.
Keras is a high-level neural networks API, written in Python and capable of running on top of Tensorflow, CNTK or Theano. It was developed with a focus on enabling fast experimentation. In this blog, we are going to cover one small case study for fashion mnist.
Fashion-MNIST is a dataset of Zalando’s article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28×28 grayscale image, associated with a label from 10 classes. Zalando intends Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits.
The end result wasn’t that great, but Shubham was using a sequential model rather than a convolutional neural network, so you can probably take this as a starting point and improve upon it.
The Clustered Index Seek operator uses the structure of a clustered index to efficiently find either single rows (singleton seek) or specific subsets of rows (range seek). Because a clustered index always contains all columns in a table, a Clustered Index Seek is one of the most efficient ways for SQL Server to find single rows or small ranges, provided there is a filter that can be used efficiently.
The behavior of the Clustered Index Seek operator is in fact exactly the same as the behavior of the Index Seek operator, with only a very few differences as noted below. Though these two operators do have different names, not only in the graphical execution plan but also in the underlying XML, I suspect that in reality they are both using the same internal logic and not a copy of it.
Read the whole thing.
Like the desktop application, the Plan Explorer extension is designed to provide you with richer graphical execution plans for your real-time queries against SQL Server. It is based on a modest subset of functionality; we’ve started with just the plan diagram, a basic statements grid, tooltips, and access to the XML (so you can open the plan in other tools). We will add more features to the extension over time to try to get you as close to full parity with the desktop client as possible.
I gave it a quick try this weekend and had to pop XML results into the desktop client to get what I really wanted to see, but I’m excited over what this looks like medium-term.
Pareto Analysis is a statistical technique that applies the Pareto Principle to data. This is more commonly known as the 80:20 Rule. The Pareto Principle is based on the presumption that a relatively small number of inputs (20%) have most impact on the results/output (80%). The 80:20 rule can be applied to a wide variety of data in most businesses.
– Which 20% of products make up 80% of sales
– Which 20% of customers make up 80% of profit.
Pareto analysis is a rule-of-thumb technique but it does provide reasonably useful results much of the time.
This method is the much maligned recursive CTE method. In my testing it runs consistently faster with a lower memory grant but does cause a bit more IO to be performed. Some trade-off to be considered there. Both queries are returning the desired data-set which happens to be my missing question days. Only, I have added an extra output in the second query to let me know the day of the week that the missing question occurred on. Maybe I forgot to enter it because it was a weekend day or maybe I opted to not create one at all because the day lands on a Holiday. Let’s take a small peek at the results.
This is a good use for tally tables (or for a calendar table, which is basically a date dimension called something else so you can feel comfortable dropping in a non-warehouse system).
Connecting to your Hyperscale database is exactly the same as any other Azure SQL or SQL Server database – for example, you can use SQL Server Management Studio or Azure Data Studio. That is the exactly point. Hyperscale provides capabilities not found in other cloud databases such as scale and query performance, but it has no impact on your applications. If you do an in-place migration of a SQL database that is already on another service tier to Hyperscale, the connection string would not even change. Your application does not have to be changed and it will benefit from bigger scale and improved query performance. And who does not like the extra scale and query performance?
This post covers the basics of database creation but does not go much further into it than that. Jeroen does promise a multi-part series, however.
There is a slicer on the left with five items in it, a table showing the actual contents of the table (I’ve disabled visual interactions so the slicer doesn’t slice the table) with five rows and a card showing the output of the following measure:
Selected Number = SELECTEDVALUE(MyNumbers[Column1], "Nothing Selected")
In the screenshot above you can see I have selected the value 78 in the slicer and the measure – as you would expect – displays that value.
Now what happens when you refresh the dataset and the table contains a different set of numbers?
Read on for the full explanation.
In our previous article , we showed that generalized linear models are unbiased, or calibrated: they preserve the conditional expectations and rollups of the training data. A calibrated model is important in many applications, particularly when financial data is involved.
However, when making predictions on individuals, a biased model may be preferable; biased models may be more accurate, or make predictions with lower relative error than an unbiased model. For example, tree-based ensemble models tend to be highly accurate, and are often the modeling approach of choice for many machine learning applications. In this note, we will show that tree-based models are biased, or uncalibrated. This means they may not always represent the best bias/variance trade-off.
Read on for an example.
The old Cloudera developed and distributed its Hadoop stack using a mix of open source and proprietary methods and licenses. But the new Cloudera will be 100% open source, just like Hortonworks, its one-time Hadoop rival that it acquired in January. But will developing its data platform completely in the open differentiate it from cloud competitors?
In a blog post published yesterday under the title “Our Commitment to Open Source Software,” Cloudera executives Charles Zedlewski and Arun Murthy laid out the company’s new plan to develop and distribute everything in the open.
This was one of the big reasons I preferred Hortonworks over Cloudera when they were separate companies: Hortonworks had this model. Hopefully it leads Cloudera to success.