Press "Enter" to skip to content

Curated SQL Posts

Specifying A Database For Connection In SSMS

Denis Gobo shows how to specify a database when connecting to an instance using Management Studio:

One of our databases on the development server went into suspect mode today. This database was the default for a bunch of logins, so those people could not log in. Someone needed to use a different database, but he couldn’t log in because the database that was in suspect mode was the default database for the login he was using.

I told this person to click on the Options button in the connection dialog and specify another database. I guess there was a misunderstanding, because this person couldn’t get it to work. This means it is time for a blog post.

Connecting to the default database is usually fine, but sometimes you need to specify a different one.  Fortunately, Management Studio makes it pretty easy.
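
If the connection dialog still gives you trouble, the same idea carries over to the command line: name the database up front so the broken default never comes into play.  Here is a minimal sketch using Invoke-Sqlcmd from the SqlServer module; the server and database names are placeholders, not anything from Denis’s post.

    # -Database overrides the login's default, much like the Options tab in SSMS
    Invoke-Sqlcmd -ServerInstance 'DEVSERVER01' -Database 'tempdb' -Query 'SELECT DB_NAME() AS CurrentDatabase;'

The classic sqlcmd utility does the same thing with its -d switch.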

Comments closed

Analysis Services PowerShell

Aaron Nelson is advocating improvements to PowerShell cmdlets around Analysis Services:

Frequently when developing updates to an SSAS cube I want to deploy my schema and process the dimensions. Sometimes several of the dimensions process successfully and then processing fails on one. At this point I go and correct the error, deploy the new schema, and then I only want to process the dimensions which did not process successfully the first time. Sometimes this is really easy, but if you have a large number of dimensions this can become cumbersome, since the only way to know which dimensions had been processed successfully is to right-click each dimension one at a time and find out, or to have memorized which dimensions had processed successfully on the earlier attempt. There has to be a better way, and of course, PowerShell is one of those options. The only problem is that as things currently stand, PowerShell is not as easy as it could be; the Invoke-ProcessDimension cmdlet doesn’t accept [direct] pipeline input. What is one to do when PowerShell isn’t as easy as it could be? File a Connect item, of course!

Check out the Trello board.  It’s been instrumental in helping Microsoft developers get the leverage they need to dedicate time to improving particular aspects of the product.
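
In the meantime, the workaround is an explicit loop rather than a pipeline.  Here is a rough sketch (not Aaron’s code): the dimension, database, and server names are hypothetical, and the parameter names are as documented for the SQLASCMDLETS module, so verify them with Get-Help before leaning on them.

    # Process only the dimensions that failed last time, one call per dimension,
    # since Invoke-ProcessDimension does not accept pipeline input.
    Import-Module SqlServer   # or SQLASCMDLETS on older installations

    $remaining = 'Dim Customer', 'Dim Product', 'Dim Date'   # hypothetical dimension names

    foreach ($dim in $remaining) {
        Invoke-ProcessDimension -Name $dim -Database 'SalesCube' -Server 'localhost' -ProcessType ProcessFull
    }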

Comments closed

Hortonworks Data Flow 2.1

Wei Wang and Haimo Liu announce Hortonworks Data Flow version 2.1:

With the release of HDF 2.1, data flow administrators within the enterprise can identify which processors would require additional authorization before being added to a working data flow system.

In addition, HDF 2.1 supports over 180 processors including newly introduced Connect/Listen/PutWebSocket, Put/FetchElasticsearch5, ValidateCsv, etc.

HDF is Hortonworks’s big play on simplifying streaming operations in Hadoop.

Comments closed

SQL Server On Linux Service Commands

Andrew Peterson shows how to start, stop, and restart the SQL Server service on Linux:

Start Service

                 sudo systemctl start mssql-server

He also shows how to do a status check.  This applies to distributions that use systemd, including the Red Hat family (Fedora, CentOS, Red Hat Enterprise Linux).  If you’re on Ubuntu, there’s no support quite yet, but you can use start and stop.
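
For reference, the stop, restart, and status commands on a systemd-based distribution follow the same pattern:

                 sudo systemctl stop mssql-server
                 sudo systemctl restart mssql-server
                 sudo systemctl status mssql-server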

Comments closed

Thinking About Backups

Rob Farley has a set of questions you should ask yourself regarding your backups:

Does your disaster testing include a situation where a well-meaning person has taken an extra backup, potentially spoiling differential or log backups?

Does your disaster testing include random scenarios where your team needs to figure out what’s going on and what needs to happen to get everything back?

Something which might be helpful would be to catalog the reason why you restored a particular backup (or the times when somebody asks you for a backup and you can’t provide it), and then have a plan to handle that scenario in the future.

Comments closed

Bandit Algorithms

Tanner Thompson describes usage of a multi-armed bandit algorithm to drive conversions:

The functional idea behind a bandit algorithm is that you make an informed decision every time you assign a visitor to a test arm. Several bandit-type algorithms have been proved to be mathematically optimal; that is, they obtain the maximum future revenue given the data they have at any given point. Gittins indexing is perhaps the foremost of these algorithms. However, the trade-off of these methods is that they tend to be very computationally intensive.

This article doesn’t show any code, but it is useful for thinking about the problem.
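
If you want to play with the core idea (make an informed assignment for every visitor, then update your estimates with the outcome), a toy epsilon-greedy selector is enough to get a feel for it.  This is a deliberately simple heuristic, nothing like the Gittins index discussed in the article, and the arm names and numbers below are made up.

    # Toy epsilon-greedy bandit: explore 10% of the time, otherwise pick the arm
    # with the best observed conversion rate so far.
    $arms = @{
        'A' = @{ Conversions = 0; Visitors = 0 }
        'B' = @{ Conversions = 0; Visitors = 0 }
    }
    $epsilon = 0.1

    function Select-Arm {
        if ((Get-Random -Minimum 0.0 -Maximum 1.0) -lt $epsilon) {
            return (@($arms.Keys) | Get-Random)          # explore
        }
        @($arms.Keys) | Sort-Object {
            # untried arms get top priority; otherwise rank by observed conversion rate
            if ($arms[$_].Visitors -eq 0) { [double]::MaxValue } else { $arms[$_].Conversions / $arms[$_].Visitors }
        } -Descending | Select-Object -First 1           # exploit
    }

    function Record-Outcome([string] $Arm, [bool] $Converted) {
        $arms[$Arm].Visitors++
        if ($Converted) { $arms[$Arm].Conversions++ }
    }

    # Assign a visitor to an arm, then record whether that visitor converted
    $arm = Select-Arm
    Record-Outcome -Arm $arm -Converted $true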

Comments closed

Data Science Languages

Alessandro Piva provides preliminary metrics on language usage among self-described data scientists:

Programming is one of the five main competence areas at the base of the skill set for a Data Scientist, even if it is not the most relevant in terms of expertise (see What is the right mix of competences for Data Scientists?). Considering the results of the survey, which has involved more than 200 Data Scientists worldwide to date, there isn’t a prevailing choice among the programming languages used for data science activities. However, the choice appears to be confined mainly to a limited set of alternatives: almost 96% of respondents say they use at least one of R, SQL, or Python.

These results don’t surprise me much.  R has slightly more traction than Python, but the percentage of people using both is likely to increase.  SQL, meanwhile, is vital for getting data, and as we’re seeing in the Hadoop space, as data platform products get more mature, they tend to gravitate toward a SQL or SQL-like language.  Cf. Hive, Spark SQL, Phoenix, etc.

Comments closed

Interactive Decision Trees

Longhow Lam describes the interactive decision tree in Microsoft R Server 9.0:

Despite all the more modern machine learning algorithms, a good old single decision tree can still be useful. Moreover, in a business analytics context they can still keep up in predictive power. In the last few months I have created different predictive response and churn models. I usually just try different learners: logistic regression models, single trees, boosted trees, several neural nets, random forests. In my experience a single decision tree is usually ‘not bad’, often with only slightly less predictive power than the fancier algorithms.

An important thing in analytics is that you can ‘sell’ your predictive model to the business. A single decision tree is a good way to do just that, and with an interactive decision tree (created by Microsoft R) this becomes even easier.

I’d like the labels in Longhow’s tree to be a little clearer, but I do like this from the perspective of giving end users something to experience.

Comments closed

Power BI Drillthrough

Ginger Grant explains how to create and use hierarchies in Power BI:

Finding where to create hierarchies is the hardest part of creating them in Power BI, especially if one has ever created hierarchies in Excel Power Pivot, as they are not in the same place. Hierarchies are not in the Relationships data view; instead, they are found in the Report view. Right-clicking on the ellipsis next to any field in a table displays a menu, and the second item on the menu is New hierarchy. Hierarchies can also be created by clicking and dragging a field on top of another field. Once a hierarchy has been created, to add another field to it, drag a new value on top of the value with the hierarchy icon. If the value is not added in the location you want, click on the ellipsis next to the field name and move the field up or down as you wish.

Ginger also shows how to create drillthrough reports once you have hierarchies in place.

Comments closed