A guide to Alteryx R Tool: configuring input

We have found that even if you have a deep understanding of R, the interface between Alteryx and R comes with its own challenges. To help you start overcoming these, we have put together a series of articles. This first one looks at how to configure input for the R Tool.

Published by Concentra

(This article was first published on October 19th 2015 and was updated on July 11th 2019 to reflect current practices.)

While Alteryx provides a wide range of out-of-the-box predictive analytics, savvy users can stretch this boundary using the R Tool for custom analysis. R is a statistical computing language that powers Alteryx’s built-in predictive analytics. To build your own predictive tool, you can download Alteryx’s predictive analytics package and use the R Tool (found in the Developer Toolbox) to author R code.

This article expects familiarity with coding terms like assignment statements, conditional statements, loops, and objects, although not necessarily in the R context. When we reference an R-specific function or object, we will link it to helpful materials for the non-R-programmer.

The R Tool is found in the Developer tab of the Alteryx toolbox. If you add this tool to an existing workflow and double-click it to open the Configuration pane, you will be greeted by a very blank (and for the non-R developer, very daunting) text box. Where do we start?

Alteryx R Configuration Screen

Fortunately, Alteryx provides some snippets of code that you can add using the “Insert Code” drop-down menu (top left). These snippets will help you input and output data, charts, and error messages.

The R Tool can accept many input datasets. Let’s start with handling a single dataset first. The R Tool numbers the datasets by default, so the first dataset you connect to the tool will be called “#1.” Want to change this? Rename the connector by double-clicking on it. In its Configuration pane, type your choice into the top input text box called “Name.”

1. The simplest way to input data

1) Drop down “Insert Code”

2) Choose “Read Input: #1”. Note that if you renamed your input, this will say “Read Input: [My new label]”. 

3) Choose “As Data Frame”. 

Result? You will get your first line of R code: 

read.Alteryx("#1", mode="data.frame")

This is excellent, except for the fact that it doesn’t actually save your data for later use! To actually make use of the data, you will need to supplement this line with an assignment statement: 

df <- read.Alteryx("#1", mode="data.frame")

Now you have asked the R tool to save your data as df and store it within the R script. In our experience, this is the easiest way to use data in R, as it is exactly how you see your data in Alteryx – rows and columns. Now, you can use R functions applied to df.
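As a quick sanity check after reading the input, you might inspect the data frame before doing anything else. The sketch below is only an illustration: read.Alteryx exists only inside the R Tool, so a small stand-in data frame with hypothetical columns is used here in its place.

```r
# Stand-in for: df <- read.Alteryx("#1", mode = "data.frame")
# (hypothetical columns, used only so the sketch runs outside Alteryx)
df <- data.frame(
  Age = c(25, 40, 31),
  Num_Loans = c(1, 3, 2)
)

str(df)              # column names and types
print(head(df))      # first few rows
n_rows <- nrow(df)   # number of records received
n_cols <- ncol(df)   # number of fields received
```

Checking the shape and types first makes it easier to spot a field that arrived as text when you expected a number.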

2. What about the other input options? 

1) “Read Input: #1” + “As Data Frame: Chunked”:
If you are using large datasets, this provides a while-loop that will input the data in chunks: 

data <- read.Alteryx.First("#1", 50000, mode="data.frame")

while (!is.null(a))
{
    #write.Alteryx(a, 2)
    data <- read.Alteryx.Next("#1", mode="data.frame")
}

If you try running this, it will break! Why? The loop condition tests a, but the chunks are stored in data, so a does not exist when the while condition is first evaluated. Treat !is.null(a) as a generic stopping rule supplied by Alteryx: you will need to set up looping logic of your own. The following example checks whether another chunk was returned and stops the process when records run out:

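data <- read.Alteryx.First("#1", 1000, mode="data.frame")

a <- 0
while (a == 0)
{
    # read the next chunk; process data2 here before the next iteration
    data2 <- read.Alteryx.Next("#1", mode="data.frame")
    rows <- nrow(data2)
    if (is.null(rows)) {
        a <- 1   # nrow() of NULL is NULL: no chunk came back, so stop
    } else {
        a <- 0   # another chunk was read: keep going
    }
}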

This is a basic example of how you would read a data stream in chunks. The first 1,000 rows are read, and then the next 1,000 rows are read iteratively until there are no more rows to read.
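The same read-first, loop-until-NULL pattern can be tried outside Alteryx. The sketch below is an illustration under stated assumptions: make_chunk_reader is a hypothetical stand-in for read.Alteryx.First and read.Alteryx.Next (it is not part of Alteryx), and it returns NULL once the rows run out, mirroring the !is.null() stopping rule used in the generated snippet.

```r
# Hypothetical stand-in for read.Alteryx.First / read.Alteryx.Next:
# returns successive chunks of `data`, then NULL when the rows run out.
make_chunk_reader <- function(data, chunk_size) {
  start <- 1
  function() {
    if (start > nrow(data)) return(NULL)
    end <- min(start + chunk_size - 1, nrow(data))
    chunk <- data[start:end, , drop = FALSE]
    start <<- end + 1
    chunk
  }
}

all_rows <- data.frame(id = 1:2500, value = (1:2500) * 2)
next_chunk <- make_chunk_reader(all_rows, 1000)

total_rows <- 0
n_chunks <- 0
chunk <- next_chunk()                      # plays the role of read.Alteryx.First
while (!is.null(chunk)) {
  total_rows <- total_rows + nrow(chunk)   # per-chunk work goes here
  n_chunks <- n_chunks + 1
  chunk <- next_chunk()                    # plays the role of read.Alteryx.Next
}
```

Using one variable for both the first read and every later read keeps the loop condition and the loop body in step, and it means the first chunk is processed along with the rest.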

2) “Read Input: #1” + “As List”:

This is just another way of storing and referencing the data, and it is the one to use when dealing with Spatial Objects.

3) “Read Input: #1” + “As List: Chunked”:

Again, this is identical to the “Data Frame: Chunked” method, but it is used when dealing with Spatial Objects.

4) “Read Input: #1” + “Input Metainfo”:

This method directly reads the metadata about your data frame. The resulting information is exactly the same as what you see in the “Metadata” pane of the Results window.

data <- read.AlteryxMetaInfo("#1")

write.Alteryx(data, 1)
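For intuition about what field-level metadata looks like, a rough base R analogue can be built from any data frame. This sketch does not reproduce the output of read.AlteryxMetaInfo, whose exact columns are not listed in this article; the stand-in data frame and the Name and Type column labels are illustrative assumptions.

```r
# Stand-in data frame (hypothetical columns), in place of an Alteryx input
df <- data.frame(
  Age = c(25L, 40L, 31L),
  Customer = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# One row per field: the field name and the R class of its values
meta <- data.frame(
  Name = names(df),
  Type = unname(vapply(df, function(col) class(col)[1], character(1))),
  stringsAsFactors = FALSE
)

print(meta)
```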

The bottom line is that the best way to handle data in R is to use data frames. We can now read Alteryx data into the R Tool in this format.

3. Using your data

1) Read in data

Now that we can read in data, we can apply some of the tools at R’s disposal to gain insight. In this example, we will be using some data contained within the Predictive Tools suite. To find it, open the “Help” tab and navigate to:

Help > Sample Workflows > Predictive tool samples > Predictive Analytics > 5 Plot of Means

You can then copy the Text Input tool and paste it into a new workflow. Then drag on the R Tool so that your workflow ends up looking like this:

Workflow in Alteryx R

2) Load a package

Alteryx R load package in configuration screen

Installing extra packages can be slightly tricky in the R tool but luckily there are lots of useful packages already pre-installed in Alteryx. With the packages pre-installed, loading them is as easy as it is in normal R.
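The screenshot for this step is not reproduced here. As a minimal sketch, and assuming the package in question is dplyr (an assumption based on the select() call used in the next step), loading it could look like this. The requireNamespace guard is only there so that the sketch fails gently where dplyr is not installed.

```r
# Assumption: the package being loaded is dplyr, which provides select()
if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))
  dplyr_loaded <- TRUE
} else {
  dplyr_loaded <- FALSE
  message("dplyr is not installed in this R library")
}
```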

Now that we have our data and our package, we can perform operations.

3) Finding correlations

Now, let’s say we want to find the correlations between Age, Number of Loans and Number of Dependents. First we need to select the columns of interest, and then apply a correlation function.

correlation_cols = select(df, Age, Num_Loans, Dependents)

cor_matrix = cor(correlation_cols)

This gives us the correlation matrix for age, number of loans and number of dependents in this dataset. However, Alteryx only works with data frames in the data stream. This means any results you want to write out and use in a different part of the workflow will have to be converted to data frames before moving forward.

corr_df = data.frame(cor_matrix)

This allows us to write our results out of the R tool and into new tools. We can do this using the write command.

write.Alteryx(corr_df, 1)

Alteryx R results screen
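The whole sequence can also be tried outside Alteryx as a rough sketch. Several things here are assumptions rather than the article’s exact method: the data frame holds made-up values in place of the sample workflow’s data, base R column selection is swapped in for select() so that no package is needed, and the Variable column is added on the assumption that row names are not carried into the Alteryx data stream.

```r
# Made-up stand-in for the data read with read.Alteryx("#1", mode = "data.frame")
df <- data.frame(
  Age        = c(23, 35, 47, 52, 61),
  Num_Loans  = c(1, 2, 2, 4, 3),
  Dependents = c(0, 1, 3, 2, 2)
)

# Base R equivalent of select(df, Age, Num_Loans, Dependents)
correlation_cols <- df[, c("Age", "Num_Loans", "Dependents")]

# Pairwise Pearson correlations, returned as a matrix
cor_matrix <- cor(correlation_cols)

# Convert to a data frame, keeping the variable names as an explicit column
corr_df <- data.frame(
  Variable = rownames(cor_matrix),
  cor_matrix,
  row.names = NULL
)

# Inside the R Tool, this is where write.Alteryx(corr_df, 1) would go
print(corr_df)
```

Keeping the variable names in their own column means each row of the output can still be identified further down the workflow.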

And it’s as easy as that to use R right within your Alteryx workflow, gaining new insights from your data.

*Header image of this blog is credited to Alteryx
