The functional idea behind a bandit algorithm is that you make an informed decision every time you assign a visitor to a test arm. Several bandit-type algorithms have been proved to be mathematically optimal; that is, they obtain the maximum future revenue given the data they have at any given point. Gittins indexing is perhaps the foremost of these algorithms. However, the trade-off of these methods is that they tend to be very computationally intensive.
This article doesn’t show any code, but it is useful for thinking about the problem.
Programming is one of the five main competence areas at the base of the skill set for a Data Scientist, even if is not the most relevant in term of expertise (see What is the right mix of competences for Data Scientists?). Considering the results of the survey, that involved more than 200 Data Scientist worldwide until today, there isn’t a prevailing choice among the programming languages used during the data science’s activities. However, the choice appears to be addressed mainly to a limited set of alternatives: almost 96% of respondents affirm to use at least one of R, SQL or Python.
These results don’t surprise me much. R has slightly more traction than Python, but the percentage of people using both is likely to increase. SQL, meanwhile, is vital for getting data, and as we’re seeing in the Hadoop space, as data platform products get more mature, they tend to gravitate toward a SQL or SQL-like language. Cf. Hive, Spark SQL, Phoenix, etc.
Despite all the more modern machine learning algorithms, a good old single decision tree can still be useful. Moreover, in a business analytics context they can still keep up in predictive power. In the last few months I have created different predictive response and churn models. I usually just try different learners, logistic regression models, single trees, boosted trees, several neural nets, random forests. In my experience a single decision tree is usually ‘not bad’, often only slightly less predictive power than the more fancy algorithms.
An important thing in analytics is that you can ‘sell‘ your predictive model to the business. A single decision tree is a good way to to do just that, and with an interactive decision tree (created by Microsoft R) this becomes even more easy.
I’d like the labels in Longhow’s tree to be a little clearer, but I do like this from the perspective of giving end users something to experience.
Finding where to create hierarchies is the hardest part of creating them in Power BI, especially if one has ever created hierarchies in Excel Power Pivot as they are not it the same place. Hierarchies are not in the Relationships data view, instead they are found in the Report view. Right clicking on the ellipse next to any field in a table displays a menu, and the second item on the menu is New hierarchy. Hierarchies can also be created by clicking and dragging a field on top of another field, which also will create a hierarchy. Once the hierarchy has been created, to add another field to the hierarchy, drag a new value on top of the value with the hierarchy icon. If the value added is not added to the location you want it, click on the ellipse next to the field named and move the field up or down as you wish.
Ginger also shows how to create drillthrough reports once you have hierarchies in place.
If you’re not a frequent SQL writer, the syntax can indeed be confusing. Especially
GROUP BYand aggregations “infect” the rest of the entire
SELECTclause, and things get really weird. When confronted with this weirdness, we have two options:
- Get mad and scream at the SQL language designers
- Accept our fate, close our eyes, forget about the snytax and remember the logicaloperations order
I generally recommend the latter, because then things start making a lot more sense, including the beautiful cumulative daily revenue calculation below, which nests the daily revenue (
SUM(amount)aggregate function) inside of the cumulative revenue (
SUM(...) OVER (...)window function):
Lukas explains things from an Oracle perspective, so not all of this matches T-SQL, but it’s close enough for comparison.
A Sch-S (schema stability) lock is taken. This is a lightweight lock; the only lock that can conflict with this is a Sch-m (schema modification) lock. (C = Conflict). This means that a NOLOCK can actually block for example against an ALTER TABLE command.
I would lean heavily toward turning on Read Committed Snapshot Isolation instead of using NOLOCK in most environments. It’s something you’d need to test, but it does come with fewer bad ramifications.
Ever had users come to you and request another version of a report just to add another field and group data differently? Today, was such the day for me. I really don’t like have multiple versions of the same report out there. So, I got a little fancy with the current version of the report and added a parameter then used expressions to group the data differently and hide columns. For those new to SSRS I’ve embedded some links to MSDN to help you along the way.
This is an easy-to-follow, step-by-step guide.
With that said, it would seem that asynchronous statistics are better suited to OLTP environments and synchronous statistics are better suited to OLAP environments. As synchronous statistics are the default though, people are reluctant to change this setting without good reason to. I’ve worked exclusively in OLTP environments over the last few years and have never seen asynchronous statistics rolled out as the default. I have personally been bitten by synchronous statistic updates on large tables causing query timeouts which I resolved by switching to asynchronous statistic updates.
This is an interesting, nuanced take on the issue. My bias is toward asynchronous stats updates because I have been burned, but it’s interesting to read someone thinking through the implications of this seemingly simple choice.
It looks like, eventually, this will be the way that any type of ‘native’ query (ie a query that you write and give to Power Query, rather than a query that is generated for you) is run against any kind of data source – instead of the situation we have today where different M functions are needed to run queries against different types of data source. I guess at some point the UI will be updated to use this function. I don’t think it’s ‘finished’ yet either, because it doesn’t work on Analysis Services data sources, although it may work with other relational data sources – I haven’t tested it on anything other than SQL Server and SSAS. There’s also a fourth parameter for Value.NativeQuery() that can be used to pass data source specific options, but I have no idea what these could be and I don’t think there are any supported for SQL Server. It will be interesting to see how it develops over the next few releases.
It’s good to know that you can parameterize queries now.
With the recent announcement of SQL Server 2016 SP1, we announced the consistent programmability experience for developers and ISVs, who can now maintain a single code base and build intelligent database applications which scale across all the editions of SQL Server. The processor, memory and database size limits does not change and remain as–in all editions as documented in the SQL Server editions page. We have made the following changes in our documentation to accurately reflect the memory limits on lower editions of SQL Server. This blog post is intended to clarify and provide more information on the memory limits starting with SQL Server 2016 SP1 on Standard, Web and Express Editions of SQL Server.
The development space has been expanded, but there’s still good reason for enterprises to use Enterprise Edition.