I’m an R programmer. To me, R has been great for data exploration, transformation, statistical modeling, and visualizations. However, there is a huge community of Data Scientists and Analysts who turn to Python for these tasks. Moreover, both R and Python experts exist in most analytics organizations, and it is important for both languages to coexist.
Often, this means that R coders will develop a workflow in R but then have to redesign and recode it in Python for their production systems. If the coder is lucky, this is easy, and the R model can be exported as a serialized object and read into Python. There are packages that do this, such as pmml. Unfortunately, it is often more challenging, because the production system might demand that the entire end-to-end workflow be built exclusively in Python. That’s sometimes tough, because some aspects of statistical model building are more intuitive in R than in Python.
Python has many strengths, such as robust data structures like dictionaries, compatibility with Deep Learning and Spark, and its ability to serve as a multipurpose language. However, many scenarios in enterprise analytics require people to go back to basic statistics and Machine Learning, and for those tasks the classic Data Science packages in Python are not as intuitive as R. The key difference is that many statistical methods are built into R natively. As a result, there is a gap when R users must build workflows in Python. To try to bridge this gap, this post will discuss a relatively new package developed by Microsoft, revoscalepy.
Having worked with both, my loyalties tend to lie with R for a couple of reasons. But this might help some people bridge the gap.
Pro Tip: A quick test is to see if the log transformation increases the magnitude of the correlation between “TotalCharges” and “Churn”. We’ll use a few dplyr operations along with the corrr package to perform a quick correlation.
correlate(): Performs tidy correlations on numeric data
focus(): Similar to select(). Takes columns and focuses on only the rows/columns of importance.
fashion(): Makes the formatting aesthetically easier to read.
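The post runs this check in R with corrr. For readers working in Python, here is a rough equivalent of the same quick test, a minimal sketch assuming Churn has already been encoded as 0/1 (so the Pearson correlation is the point-biserial correlation), missing TotalCharges rows have been dropped, and every remaining value is positive so the log is defined. The function names and toy numbers are hypothetical, not taken from the original post.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; with a 0/1 outcome this is the point-biserial correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def log_helps(total_charges, churn):
    """Compare |r| before and after log-transforming total charges (all values must be > 0)."""
    raw = pearson(total_charges, churn)
    logged = pearson([math.log(x) for x in total_charges], churn)
    return raw, logged, abs(logged) > abs(raw)

# Toy numbers for illustration only, not the Telco data from the post.
raw, logged, helps = log_helps([1, 2, 4, 100, 1000], [1, 1, 1, 0, 0])
print(f"raw r = {raw:.3f}, log r = {logged:.3f}, log helps: {helps}")
```

If the magnitude of the logged correlation is larger, the log transform is a reasonable candidate for the feature; the sign matters less than the size here.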
This is a very useful tutorial.
Apart from catching up on news during my commute, I only really use notifications for a handful of hashtags, e.g. #SqlServer, #tsql2sday, #sqlhelp, and #PowerShell.
So during work, every so often a little notification will pop up on the bottom right of my window and I can quickly glance down and decide whether to ignore it or check it out.
That’s what happened with the following tweet:
Click through for Shane’s demo.
Sometimes when writing an ad hoc query you might want to take the results of one query and put them into an IN() statement of another query.
Sure, you can write a subquery to put into your IN() statement…but that’s too much work for a one-time use disposable query.
What you can do instead is:
Copy your values of interest
Paste them into your IN() statement
Hold down the ALT key while dragging the mouse down in front of all of your pasted values
Type a comma (see video above for an easier demonstration).
For SSMS speedrunning strats, you can also hold down ALT + SHIFT and use your keyboard arrow keys instead of using the mouse.
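Outside of SSMS, or with a list too long to ALT-drag comfortably, the same result can come from a few lines of script. This is a hypothetical Python helper, not an SSMS feature, that assumes one value per pasted line and doubles embedded single quotes the way T-SQL string literals expect. It is only meant for disposable ad hoc queries; anything fed by user input should use parameters instead.

```python
def to_in_clause(pasted, quote=True):
    """Turn pasted values (one per line) into an IN (...) list for an ad hoc query."""
    values = [line.strip() for line in pasted.splitlines() if line.strip()]
    if not values:
        raise ValueError("no values to put in the IN list")
    if quote:
        # Wrap each value as a T-SQL string literal, escaping ' as ''.
        values = ["'" + v.replace("'", "''") + "'" for v in values]
    return "IN (" + ", ".join(values) + ")"

print(to_in_clause("1\n2\n3\n", quote=False))
```

Numeric IDs can skip quoting with quote=False; everything else gets quoted.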
Just like in the root page and the intermediate pages, the FirstName and RowID columns are present.
Also in the leaf: CharCol, our included column appears! It was not in any of the other levels we inspected, because included columns only exist in the leaf of a nonclustered index.
Kendra does a great job of explaining the topic.
ASP.NET session state enables you to store and retrieve values for a user as the user navigates the different ASP.NET pages that make up a Web application. Currently, ASP.NET ships with three session state providers that provide the interface between Microsoft ASP.NET’s session state module and session state data sources:
- InProcSessionStateStore, which stores session state in memory in the ASP.NET worker process
- OutOfProcSessionStateStore, which stores session state in memory in an external state server process
- SqlSessionStateStore, which stores session state in Microsoft SQL Server database
This blog post focuses on the SqlSessionStateStore provider and describes how you can configure it to use SQL Server In-Memory OLTP as the storage option for session data. You can either use the latest ASP.NET async version of the SQL Session State provider (which is the recommended approach), or configure an earlier version of the provider to work with In-Memory OLTP by downloading and running the In-Memory OLTP SQL scripts from our SQL Server samples GitHub repo.
The me of seven years ago really needed this. But with the strong shift against session-based data collection and back to stateless or client-held state paradigms, I’m not sure how many people this helps.
I’m excited to share the news that we have added a new feature to Power BI Helper: Expression Tree. Expression Tree expands the expression tree for a measure or calculated column, so you can see which other measures are used to build that expression, and where those measures, calculated columns, or even normal columns are located (in which table). This feature joins the tool’s two previous features: showing the tables and fields used in the visualization pages of a Power BI report, and searching for a column or table used in the visualization pages of a report. In this post, I’ll explain how this new feature works.
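To make the idea concrete, here is a minimal sketch of expanding such a tree, assuming the references inside each DAX expression have already been extracted into a lookup table. The model metadata, measure names, and the expression_tree function are all hypothetical; the post does not show how Power BI Helper implements the feature.

```python
# Hypothetical model metadata: name -> (home table, kind, names referenced by its expression).
MODEL = {
    "Profit Margin": ("Sales", "measure", ["Profit", "Total Sales"]),
    "Profit": ("Sales", "measure", ["Total Sales", "Total Cost"]),
    "Total Sales": ("Sales", "measure", ["SalesAmount"]),
    "Total Cost": ("Sales", "measure", ["TotalProductCost"]),
    "SalesAmount": ("FactInternetSales", "column", []),
    "TotalProductCost": ("FactInternetSales", "column", []),
}

def expression_tree(name, model, depth=0, ancestors=frozenset()):
    """Return indented lines showing every measure/column an item depends on, and its table."""
    table, kind, refs = model[name]
    line = f"{'  ' * depth}{name} [{kind}] in {table}"
    if name in ancestors:
        # Stop on circular references instead of recursing forever.
        return [line + " (circular reference)"]
    lines = [line]
    for ref in refs:
        lines.extend(expression_tree(ref, model, depth + 1, ancestors | {name}))
    return lines

print("\n".join(expression_tree("Profit Margin", MODEL)))
```

A shared dependency such as Total Sales shows up under every measure that uses it, which is the point: you see the full lineage of one expression.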
Read on for the explanation. I can see this being quite useful.
No matter how hard the dbatools team tries, there’s always someone who wants to do things we’d never thought of. This is one of the great things about getting feedback directly from a great community. Unfortunately, a lot of these ideas are either too niche to implement or would require a lot of complex code for a single use case.
As part of the Restore-DbaDatabase stack rewrite, I wanted to make it easier for users to get their hands dirty within the Restore stack: not necessarily by diving into the core code and the world of GitHub pull requests, but by manipulating the data flowing through the pipeline with standard PowerShell techniques, while the stack still does the heavy lifting without extra code.
Click through for several examples.
So hooray! We have found word vectors again, a bit faster, with clearer and easier-to-understand code. I do argue that this is a real benefit of this approach; it’s based on counting, dividing, and matrix decomposition and is thus much easier to understand and implement than anything with a neural network. And the results?
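The counting, dividing, and decomposing steps can be sketched in plain Python. This sketch assumes the "dividing" step is a positive pointwise mutual information (PPMI) normalization and that the decomposition is a truncated SVD, computed here by power iteration so no third-party libraries are needed. The original post works in R, and all names below are hypothetical. It only suits toy vocabularies; a real corpus calls for a sparse matrix and a library SVD.

```python
import math
from collections import Counter

def cooccurrence(sentences, window=2):
    """Count: symmetric co-occurrence counts within a sliding window."""
    pairs = Counter()
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for c in tokens[i + 1 : i + 1 + window]:
                pairs[(w, c)] += 1
                pairs[(c, w)] += 1
    return pairs

def ppmi(pairs):
    """Divide: positive PMI, max(0, log(p(w,c) / (p(w) p(c))))."""
    vocab = sorted({w for w, _ in pairs})
    idx = {w: i for i, w in enumerate(vocab)}
    total = sum(pairs.values())
    marg = Counter()
    for (w, _), n in pairs.items():
        marg[w] += n  # counts are symmetric, so row and column totals match
    m = [[0.0] * len(vocab) for _ in vocab]
    for (w, c), n in pairs.items():
        m[idx[w]][idx[c]] = max(0.0, math.log(n * total / (marg[w] * marg[c])))
    return vocab, m

def top_right_singular_vectors(m, k, iters=300):
    """Decompose: top-k eigenvectors of M^T M via power iteration with deflation."""
    n = len(m[0])
    a = [[sum(row[i] * row[j] for row in m) for j in range(n)] for i in range(n)]
    vecs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * a[i][j] * v[j] for i in range(n) for j in range(n))
        vecs.append(v)
        for i in range(n):  # remove the found component before finding the next one
            for j in range(n):
                a[i][j] -= lam * v[i] * v[j]
    return vecs

def word_vectors(sentences, k=2, window=2):
    vocab, m = ppmi(cooccurrence(sentences, window))
    vs = top_right_singular_vectors(m, k)
    # M V_k = U_k Sigma_k: project each word's PPMI row onto the top-k right singular vectors.
    return {w: [sum(m[r][j] * v[j] for j in range(len(v))) for v in vs]
            for r, w in enumerate(vocab)}
```

Every step is arithmetic you can inspect, which is the readability argument the post makes against neural approaches.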
Click through to see the new method, as well as some basic analogy testing.
Data preparation for machine learning requires business domain expertise, bias awareness, and an experimental thought process. Before preparing your data, you’ll first define a business problem to solve. During that exercise, you’ll select an outcome metric and brainstorm, from many varied perspectives, potential input variables that influence it. From there you will begin identifying, collecting, cleaning, shaping, and sampling data to run through automated machine learning model processes.
Note that it is also not unusual for relevant machine learning input data to occur outside of existing transactional processes. If that is the case, you can still start creating a first-generation machine learning model with existing data and continue to build new model versions over time as supplementary data is acquired.
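The cleaning and sampling steps from that list can be sketched as follows. This is a minimal sketch, not something from the original tips, assuming rows are Python dicts with hashable values and a hypothetical outcome field. It drops rows with no outcome and exact duplicates, then takes a stratified sample so the outcome’s class balance is roughly preserved.

```python
import random
from collections import defaultdict

def clean(rows, outcome):
    """Drop rows missing the outcome, then drop exact duplicate rows."""
    seen, out = set(), []
    for r in rows:
        if r.get(outcome) is None:
            continue
        key = tuple(sorted(r.items()))  # assumes hashable field values
        if key in seen:
            continue
        seen.add(key)
        out.append(r)
    return out

def stratified_sample(rows, outcome, fraction, seed=42):
    """Sample the same fraction from each outcome class (at least one row per class)."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    groups = defaultdict(list)
    for r in rows:
        groups[r[outcome]].append(r)
    sample = []
    for label in sorted(groups):  # assumes outcome labels are sortable
        members = groups[label]
        k = min(len(members), max(1, round(len(members) * fraction)))
        sample.extend(rng.sample(members, k))
    return sample
```

Stratifying matters most when the outcome is rare, because a plain random sample can leave the minority class badly underrepresented.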
Click through for the ten tips.