This is logically equivalent to the first version of the code, but I find it makes for more readable code. It just looks cleaner.
For those of us who are
lazylooking to maximize efficiency, this could save a whole lot of key strokes.
This is true, but if you’re on SQL Server 2012 or later, check out CONCAT for concatenation, as it handles NULL values more elegantly.
Now since Power BI Custom Visualizations are not provided by Microsoft, they feel compelled to give you a warning message letting users know this. Here is the message box you get in Power BI Desktop when using a custom visualization. Notice that I clicked on the check box next to the text Don’t show this dialog again. As Words mean things, checking this box means the warning message never appears again. When you import the visualization into Power BI, no warning messages. Now I can use and propose custom visualizations to clients because they really are neat, and now they contain no warnings. Thanks so much to the Power BI Product team for fixing this major issue.
This is good news.
In every statistical analysis, the first thing one should do is try and visualise the data before any modeling. In microarray studies, a common visualisation is a heatmap of gene expression data.
In this post I simulate some gene expression data and visualise it using the
Rby Tal Galili. This package extends the plotly engine to heatmaps, allowing you to inspect certain values of the data matrix by hovering the mouse over a cell. You can also zoom into a region of the heatmap by drawing a rectangle over an area of your choice
This went way past my rudimentary heatmap skills, so it’s nice to see what an advanced user can do.
If you notice, in the above cmdlet the where-clause I’m selecting to use the Column1 property instead of a reasonable label. In my scenario the data in the CSV file contain variable columns fopr its different data types such as: Info, Error, and System. So, it was easy to identify the total number of columns to be 15 columns.
Now, using the cmdlet “Import-Csv” using the parameter “-Header”, you can define a list columns when you build the $Logdata object. We create the $header variable with the column-names separated by comma.
Keep an eye out for part 3. In the meantime, check out part 1 if you haven’t already.
I’ve made a few changes to the XQueryPlanPath project. The project parses query plans into xml and then using xpath to find the value of one or more nodes. This could then be used in testing to verify that any changes made to a query would retain a query plan that is considered optimal, and then if any changes break the test you can verify if the change causes sub-optimal effect on your query.
There was however one issue – query plans are like opinions; every SQL Server Instance has one, and none of them think that theirs stinks. So running a test on a dev box will potentially produce a different query plan from that on the build server, to that of production etc. This broadly because of 3 reasons:
Check it out, especially if your XML parsing skills aren’t top-notch.
Hadoop 3, as it currently stands (which is subject to change), won’t look significantly different from Hadoop 2, Ajisaka said. Made generally available in the fall of 2013, Hadoop 2 was a very big deal for the open source big data platform, as it introduced the YARN scheduler, which effectively decoupled the MapReduce processing framework from HDFS, and paved the way for other processing frameworks, such as Apache Spark, to process data on Hadoop simultaneously. That has been hugely successful for the entire Hadoop ecosystem.
It appears the list of new features in Hadoop 3 is slightly less ambitious than the Hadoop 2 undertaking. According to Ajisaka’s presentation, in addition to support for erasure coding and bug fixes, Hadoop 3 currently calls for new features like:
- shell script rewrite;
- task-level native optimization;
- the capability to derive heap size or MapReduce memory automatically;
- eliminating of old features;
- and support for more than two NameNodes.
The big benefit to erasure coding is that you can potentially cut data usage requirements in half, so that can help in very large environments. Alex also notes that the first non-beta version of Hadoop 3 is expected to release by the end of the year.
They didn’t give you parameter 26837 – I’m just giving you that so you can see an execution plan.
You don’t have to talk me through the query itself, or what you’d want to do to fix it. In fact, I want you to avoid that altogether.
Instead, tell me what things you need to know before you start tuning, and explain how you’re going to get them.
I think, based on the noise in the comments section, that this is a good question. Good interview questions are separating in equilibrium (as opposed to pooling). The question itself is straightforward, but people have such a tendency to jump the gun that they try to answer a question which isn’t being asked. Then, when reading the question, the set of steps and processes people have is interesting because of how much they differ.
Bonus question: take your interview answer (“I would do X and Y and Z and then A and B and C and maybe D.”) and apply it to the last time you had this scenario come up. How many of [A-DX-Z] did you actually do?
For those not familiar with the SQL Server Connector, it enables SQL Server to use Azure Key Vault as an Extensible Key Management (EKM) Provider for its SQL encryption keys. This means that you can use your own encryption keys and protect them in Azure Key Vault, a cloud-based external key management system which offers central key management, leverages hardware security modules (HSMs), and allows separation of management of keys and data, for additional security. This is available for the SQL encryption keys used in Transparent Data Encryption (TDE), Column Level Encryption (CLE), and Backup encryption.
When using these SQL encryption technologies, your data is encrypted with a symmetric key (called the database encryption key) stored in the database. Traditionally (without Azure Key Vault), a certificate that SQL Server manages would protect this data encryption key (DEK). With Azure Key Vault integration for SQL Server through the SQL Server Connector, you can protect the DEK with an asymmetric key that is stored in Azure Key Vault. This way, you can assume control over the key management, and have it be in a separate key management service outside of SQL Server.
Check it out, as it might be a solution to some key management issues.
The R data frame is a high level data structure which is equivalent to a table in database systems. It is highly useful to work with machine learning algorithms, and it’s very flexible and easy to use.
The standard definition of data frames are a “tightly coupled collections of variables which share many of the properties of matrices and of lists, used as the fundamental data structure by most of R‘s modeling software.”
Data frames are a powerful abstraction and make R a lot easier for database professionals than application developers who are used to thinking iteratively and with one object at a time.
The big reason that dimensional modeling increases clarity is that the dimensional model seeks to flatten data as much as possible. Let’s compare two examples. Both of these examples are for a fictional health clinic.
The first example is that we want a report on how many male patients were treated with electric shock therapy by provider, grouped monthly and spanning year to date range.
Those big Kimball-style warehouses do a great job of making it easier for people who are not database specialists to query data and get meaningful, consistent results to known business questions. The trick to understanding data platforms is that they tend to be complements rather than substitutes: introducing Spark-R in your environment does not replace your Kimball-style warehouse; it complements it by letting analysts find trends more easily. Similarly, a Hadoop cluster potentially lets you complement an existing data warehouse in a few ways: acting as a data aggregator (which allows you to push some ETL work off onto the cluster), a data collector (especially for information which is useful but doesn’t really fit in your conformed warehouse), and a data processor (particularly for those gigantic queries which are not time-sensitive).