Author: Kevin Feasel

PowerShell Remoting To Linux

Max Trinidad shows that PowerShell remoting works for connecting Linux to Windows and vice versa:

Connecting Linux to Windows

This is strange but, most likely, you will get an error the first time you try to connect. Just try a second time and it will work.

I need to see what PowerShell objects for Linux currently exist; my guess is “not many, if any,” but as those start getting fleshed out, I think even the most adamant of grep-sed-awk users will want to pick up at least a little bit of PowerShell.

Elastic Database Pools

Arun Sirpal describes Azure elastic database pools:

The key to using elastic database pools is that you must understand the characteristics of the databases involved and their utilisation patterns; if you do not understand this, then the idea of using an elastic database pool may cause problems.

My pool has a maximum of 100 eDTUs. I know for a fact that the S2 databases will not be used at the same time, and that at most 3 of the S0 databases might be used at the same time. Basically, what I am saying here is that I know that when the databases peak concurrently, they will not go beyond the 100 eDTU limit.
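To put rough numbers on that reasoning: assuming the standard standalone ratings (an S2 is rated at 50 DTUs and an S0 at 10 DTUs), a worst-case concurrent peak of one S2 plus three S0s needs about 50 + 3 × 10 = 80 eDTUs, comfortably under the 100 eDTU cap.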

One thing that Arun does not mention is the relative ease of interconnecting databases within a pool, so even if it doesn’t end up being cheaper on net, that might be a benefit worth having.

Custom R Visuals In Power BI

Ginger Grant notes that there are R-powered custom visuals for Power BI:

Interacting with R visuals works differently than with other report visualizations, as you cannot click on elements within the visualization and filter other items on the page. Other visuals on the page will filter the data contained within the R visual. For example, let’s say my report contains a total field, a slicer which contains years, and a correlation plot which contains products. If the slicer is changed to select a year, the total field and the data within the R visual will change to reflect that. If, on the other hand, I choose to click on the R visual to select one of the product categories, the total field will not change, and the R visual’s appearance will not change in any way.

Read on for more.

Data Wrangling: R Versus M

Ryan Wade argues that R is a better language choice than M for working with data in Power BI:

Now let’s do something that I think is pretty slick. Let’s create a data set that combines the home games of the Pacers (IND) and the home games of the Hawks (ATL). Given the naming convention used by the files, we will have to identify the files in our working directory that start with eight numeric digits, then a period, then a 3-character team abbreviation for the away team, then either “ATL” or “IND”, then finally “.csv”. We can create a regular expression to find the files that match that pattern. I did so in the code below:
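Ryan’s code isn’t reproduced in the excerpt, but a minimal sketch of that file-matching step in R might look like the following (the file names and working directory are hypothetical; the regular expression simply encodes the naming convention described above):

# Hypothetical sketch: find files where the home team is ATL or IND,
# e.g. "20161125.CHIIND.csv" (date, away team, home team)
home_games <- list.files(
  path = ".",                                       # the working directory
  pattern = "^[0-9]{8}\\.[A-Z]{3}(ATL|IND)\\.csv$"  # 8 digits, period, away, home
)

# Stack the matching files into a single data set
combined <- do.call(rbind, lapply(home_games, read.csv))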

I’m interested in catching the rest of the series.  This is a controversial statement that I’m not entirely sold on yet, but Ryan does set the stage for his full argument.

Tabulizer

Troy Walters uses the tabulizer package to extract tables from a PDF and turn them into R matrices or data frames:

Next we will use the extract_tables() function from tabulizer. First, I specify the url of the pdf file from which I want to extract a table. This pdf link includes the most recent data, covering the period from July 1, 2016 to November 25, 2016. I am using the default parameters for extract_tables. These are guess and method. I’ll leave guess set to TRUE, which tells tabulizer that we want it to figure out the locations of the tables on its own. We could set this to FALSE if we want to have more granular control, but for this application we don’t need to. We leave the method argument set to “matrix”, which will return a list of matrices (one for each pdf page). This could also be set to return data frames instead.
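As a rough sketch of that call (the URL is a placeholder, and the guess and method arguments are spelled out even though they are the defaults described above, as of the package version discussed in the post):

library(tabulizer)

# Placeholder URL pointing at a text-based PDF that contains tables
pdf_url <- "https://example.com/report.pdf"

# guess = TRUE lets tabulizer locate the tables on its own;
# method = "matrix" returns a list of matrices
tables <- extract_tables(pdf_url, guess = TRUE, method = "matrix")

# Coerce the first extracted table to a data frame if that is more convenient
df <- as.data.frame(tables[[1]], stringsAsFactors = FALSE)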

This is nice.  I have to imagine it only works for text-based PDFs and not ones which are generated from a series of images.

Solving The German Tank Problem

Frank Portman shows how to figure out how many taxicabs—or tanks—there are:

For the uninitiated, the Taxicab / German Tank problem is as follows:

Viewing a city from the train, you see a taxi numbered x. Assuming taxicabs are consecutively numbered, how many taxicabs are in the city?

This was also applied to counting German tanks in World War II to know when/if to attack. Statistical methods ended up being accurate within a few tanks (on a scale of 200-300) while “intelligence” (unintelligence) operations overestimated numbers about 6-7x. Read the full details on Wikipedia here (and donate while you’re over there).
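For reference, the textbook frequentist answer is the minimum-variance unbiased estimator N = m(1 + 1/k) - 1, where m is the largest serial number observed and k is the sample size. A quick sketch in R with made-up serial numbers (Frank’s post walks through the full derivation and may take a different route):

serials <- c(47, 112, 189, 204)  # hypothetical observed serial numbers

k <- length(serials)  # number of sightings
m <- max(serials)     # largest serial number seen

# Minimum-variance unbiased estimate of the total count
n_hat <- m * (1 + 1 / k) - 1
n_hat  # 254 for this sample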

Click through for the solution and how to implement it in R.

Looking For Wait Types

Ewald Cress uses the debugger to search for particular waits:

In this case I was looking for PREEMPTIVE_COM_RELEASE, and sys.dm_xe_map_values tells me that on my 2014 RTM instance it has an index of 01d4 hexadecimal. Crazy as it sounds, I’m going to do a simple search through the code to look for places that magic number is used. As a two-byte (word) pattern we’ll get lots of false positives, but fortunately wait types are internally doublewords, with only one bit set in the high-order word. In other words, we’re going to look for the patterns 000101d4, 000201d4, 000401d4 and so forth, up to 800001d4. Ignore the question of which bit is going to be set when; with only sixteen permutations, it’s quick enough to try them all.

Let’s focus on sqllang as the likely source – the below would apply to any other module too.
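If you would rather generate the sixteen candidate patterns than type them out, here is a quick sketch in R (purely illustrative; the 01d4 index is specific to Ewald’s 2014 RTM instance):

# High-order word has exactly one bit set (0x0001 through 0x8000);
# the low-order word is the wait-type index itself.
high_words <- bitwShiftL(1L, 0:15)
patterns <- sprintf("%04x01d4", high_words)
patterns
# "000101d4" "000201d4" "000401d4" ... "800001d4"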

This post reminds me that my debugger skills aren’t very good.

Polybase MapReduce Containers

I have a post looking at how Polybase generates MapReduce containers:

Once we did that and I restarted all of the services, I ended up getting an interesting error message from SQL Server:

Msg 7320, Level 16, State 110, Line 2
Cannot execute the query “Remote Query” against OLE DB provider “SQLNCLI11” for linked server “(null)”. EXTERNAL TABLE access failed due to internal error: ‘Java exception raised on call to JobSubmitter_SubmitJob: Error [org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=1536, maxMemory=512

The error message is pretty clear:  the Polybase service wants to create containers that are 1536 MB in size, but the maximum size I’m allowing is 512 MB.  Therefore, the Polybase MapReduce operation fails.

Long story short, I needed enough RAM to be able to give 4 1/2 GB to YARN for creating MapReduce containers in order to run my query.
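If you hit the same wall, the relevant knob is the YARN scheduler’s maximum container allocation. A sketch of the kind of change involved, assuming a standard yarn-site.xml (the 4608 MB value reflects the roughly 4 1/2 GB figure above; tune it to your hardware):

<!-- yarn-site.xml: raise the per-container ceiling so YARN can grant
     the 1536 MB containers the Polybase MapReduce job requests -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4608</value>
</property>
<!-- total memory the node manager may hand out to containers -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4608</value>
</property>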

AWS Data Lake

Nick Corbett announces that Amazon is rolling out their own data lake solution:

Separating storage from processing can also help to reduce the cost of your data lake. Until you choose to analyze your data, you need to pay only for S3 storage. This model also makes it easier to attribute costs to individual projects. With the correct tagging policy in place, you can allocate the costs to each of your analytical projects based on the infrastructure that they consume. In turn, this makes it easy to work out which projects provide most value to your organization.

The data lake stores metadata in both DynamoDB and Amazon ES. DynamoDB is used as the system of record. Each change of metadata that you make is saved, so you have a complete audit trail of how your package has changed over time. You can see this on the data lake console by choosing History in the package view:

Having a competitor in the data lake space is a good thing for us, though based on this intro post, it seems that Amazon and Microsoft are taking different approaches: Microsoft wants you to stay in the data lake (e.g., writing U-SQL or Python statements to query it), whereas Amazon wants you to shop the data lake and check out the specific S3 buckets and files for your own processing.

Python Support In Azure Data Lake

Saveen Reddy announces that Python is now a first-tier language in the Azure Data Lake:

This week, we are announcing even more support for Python. As of today, Python is a first-class language supported by our management SDKs. This enables you to develop applications or automate the Data Lake services. Check out our Getting Started articles, which now include many Python samples.

Saveen has a Jupyter notebook which demonstrates Python in Azure Data Lake Store.
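For a flavor of what the Python story looks like, here is a minimal sketch using the azure-datalake-store filesystem package (every identifier below is a placeholder, and this is the filesystem SDK rather than the management SDKs the announcement focuses on):

from azure.datalake.store import core, lib

# Service principal credentials -- all placeholders
token = lib.auth(tenant_id='<tenant-guid>',
                 client_id='<application-id>',
                 client_secret='<application-secret>')

# Connect to a Data Lake Store account and list its root directory
adls = core.AzureDLFileSystem(token, store_name='<your-adls-account>')
print(adls.ls('/'))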
