Microsoft Graph is a comprehensive framework for accessing data in various online Microsoft services, including Azure Active Directory (AAD), Office 365, OneDrive, Teams, and more. AzureGraph is an R package that provides a simple R6-based interface to the Graph REST API, and is the companion package to AzureRMR and AzureAuth.
Currently, AzureGraph aims to provide an R interface only to the AAD part, with a view to supporting R interoperability with Azure: registered apps and service principals, users and groups. Like AzureRMR, it could potentially be extended to support other services.
Just to clarify, this is like Facebook Graph API for Azure components, not a graph database that you can store your own data in.
This may seem like a lot of steps, but it is only because we are taking the problems very slowly. The important point is that we want to minimize additional problem solving when applying the
cdatamethodology. Usually when you need to transform data you are in the middle of some other more important task, so you want to delegate the details of how the layout transform is implemented. With
cdatathe user is not asked to perform additional puzzle solving to guess a sequence of operators that may implement the desired data layout transform. The
cdatasolution pattern is always the same, which can help in mastering it.
cdata, record layout transforms are simple
Robjects with detailed
print()methods- so they are convenient to alter, save, and re-use later. The record layout transform also documents the expected columns and constants of the incoming data.
Check it out.
Boxplots for each quantitative variables are shown. We take advantage of the quantitative variable names (quantitative_vars) determined before to apply a ggplot2 package based boxplot function. The Y axis labeling and title are determined by the variable to be plot. Further, legend is not displayed and we adopt the coordinate flip option for improved readability.
Check it out to get an idea of how to do exploratory data analysis.
Naive Bayes is a Supervised Machine Learning algorithm based on the Bayes Theorem that is used to solve classification problems by following a probabilistic approach. It is based on the idea that the predictor variables in a Machine Learning model are independent of each other. Meaning that the outcome of a model depends on a set of independent variables that have nothing to do with each other.
Naive Bayes is one of the simplest algorithms available and yet it works pretty well most of the time. It’s almost never the best solution but it’s typically good enough to give you an idea of whether you can get a job done.
There is always a discussion on how to store back the data from Power BI to local computer or SQL Server Databases, in this short blog, I will show how to do it by writing R scripts inside Power Query.
Leila also describes a complication you may hit where writes happen twice.
In order to build the html table I have used a function
table_framewhich can be used as a container in
DT::renderdatatable. This function basically uses
htmltools. For more references on the basics of html tables please refer here
In the last post, we focused on high-level aggregates to gain a basic understanding of our data. We saw some suspicious results but couldn’t say much more than “This looks weird” due to our level of aggregation. In this post, I want to dig into data at a lower level of detail. My working conception is the cohort, a broad-based comparison of data sliced by some business-relevant or analysis-relevant component.
Those familiar with Kimball-style data warehousing already understand where I’m going with this. In the basic analysis, we essentially look at fact data with a little bit of disaggregation, such as looking at data by year. In this analysis, we introduce dimensions (sort of) and slice our data by dimensions.
Click through for some fraud-finding fun.
You will learn what key principles a tidy data set adheres to, why it is useful to follow them consequently, and how to clean the data you are given. Tidying is also a great way to get to know a new data set.
Finally, in this tutorial you will learn how to write a function that makes your analysis look much cleaner and allows you to execute repetitive elements in your analysis in a very reproducible way. The function will allow you to load the latest version of the data dynamically into a flexible data scheme, which means that large parts of the code will not have to change when new data is added.
Check it out. Bonus point: tidy data is Boyce-Codd Normal Form which is (potentially) subsequently widened back out to include dimensional information.
Growth analysis focuses on changes in ratios over time. For example, you may plot annual revenue, cost, and net margin by year. Doing this gives you an idea of how the company is doing: if costs are flat but revenue increases, you can assume economies of scale or economies of scope are in play and that’s a great thing. If revenue is going up but costs are increasing faster, that’s not good for the company’s long-term outlook.
For our data set, I’m going to use the following SQL query to retrieve bus counts on the first day of each year. To make the problem easier, I add and remove buses on that day, so we don’t need to look at every day or perform complicated analyses.
I get into quite a bit in this post, including a quick tour of multicollinearity, which is only my second-favorite of the three linear regression amigos (heteroskedasticity being my favorite and autocorrelation the hanger-on).
Since 2011, Stack Overflow has been surveying their users each year to answer questions about the technologies they use, their work experience, their compensation, and their satisfaction at work. Given Stack Overflow’s place in the broader programming world, they are able to draw quite the audience for their annual surveys.
This year, nearly 90,000 developers participated in the survey! There’s a lot in this survey, and I recommend reviewing it yourself, but I wanted to surface some of the key findings that I thought were particularly relevant to data professionals here.
Stack Overflow says they will be releasing the underlying data for this survey in the coming weeks, so I hope to return to this for a deeper analysis once that’s made available. For now, let’s get into the results!
Michael’s lede involves R versus Python in terms of salaries, but for me, the top line is that functional programmers make more money. Clojure, F#, Scala, Elixir, and Erlang make the top 10 on the global list, including positions 1, 2, 4, and 5. Within the US, Scala, Clojure, Erlang, Kotlin, F#, and Elixir make the top 10, including positions 1, 2, and 4. H/T R-Bloggers