19 Oct 2015

A guide to the Alteryx R Tool: configuring input

Concentra Team


While Alteryx provides a wide range of out-of-the-box predictive analytics, savvy users can push past these limits by using the R Tool for custom analysis. R is a statistical computing language, and it is what powers Alteryx's built-in predictive tools. To build your own predictive tool, you can write R code in the R Tool.

We have found that even if you have a deep understanding of R, the interface between Alteryx and R comes with its own challenges. To help you overcome these, we have put together a series of blog posts; this first one looks at how to configure input for the R Tool.

This post assumes familiarity with coding terms such as assignment statements, conditional statements, loops, and objects, although not necessarily in the R context. When I reference an R-specific function or object, I will link to helpful material for the non-R programmer.

The R Tool is found in the Developer tab of the Alteryx toolbox. If you add this tool to an existing workflow and double-click it to open the Configuration pane, you will be greeted by a very blank (and for the non-R developer, very daunting) text box. Where do I start?

Fortunately, Alteryx provides snippets of code that you can add using the "Insert Code" drop-down menu (top left). These snippets help you input and output data, charts, and error messages.

The R Tool can accept several input datasets, but let's start with a single one. The R Tool numbers its inputs by default, so the first dataset you connect to the tool is called "#1". Want to change this? Double-click the connector and, in its Configuration pane, type your new label into the "Name" text box at the top.

The simplest way to input data 

1) Open the "Insert Code" drop-down menu.

2) Choose "Read Input: #1". If you renamed your input, this will say "Read Input: [your new label]".

3) Choose "As Data Frame". 

Result? You will get your first line of R code: 

read.Alteryx("#1", mode="data.frame")

This is a good start, except that it doesn't actually save your data for later use! To make use of the data, you need to add an assignment statement:

My.Dataset <- read.Alteryx("#1", mode="data.frame")

Now the R Tool saves your data in My.Dataset as a data frame object. In my experience, this is the easiest data format to work with in R, because it matches how you see your data in Alteryx: rows and columns. You can now apply R functions to My.Dataset.
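Outside Alteryx there is no read.Alteryx(), so the sketch below uses R's built-in mtcars data frame as a stand-in for My.Dataset. It shows a few base R functions you might apply to a freshly read data frame; the choice of functions and the model formula are illustrative only.

```r
# Stand-in for: My.Dataset <- read.Alteryx("#1", mode="data.frame")
# read.Alteryx only exists inside the Alteryx R Tool, so we borrow R's built-in mtcars.
My.Dataset <- mtcars

print(dim(My.Dataset))        # number of rows and columns
print(names(My.Dataset))      # field (column) names
print(head(My.Dataset, 3))    # first three records
print(mean(My.Dataset$mpg))   # summarise a single field by name

# Fit a simple linear model of mpg against weight (wt), purely as an illustration
fit <- lm(mpg ~ wt, data = My.Dataset)
print(coef(fit))
```

Each column of the data frame corresponds to one Alteryx field, so My.Dataset$mpg plays the role of a single field.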

What about the other input options? 

1) "Read Input: #1" + "As List":
Instead of a data frame, you now get a list object. This is just another way of storing and referencing the data. I find it more cumbersome, probably because I use it less often. As before, you will need to add an assignment statement to save the data.

2) "Read Input: #1" + "As Data Frame: Chunked":
For large datasets, this option inserts a while loop that reads the data in chunks:

data <- read.Alteryx.First("#1", 50000, mode="data.frame")
while (!is.null(a))
{
#write.Alteryx(a, 2) - a commented out output phrase
data <- read.Alteryx.Next("#1", mode="data.frame")
}

If you try running this as is, it will break! Why? The loop tests !is.null(a), a generic stopping rule that Alteryx has added, but nothing in the snippet ever assigns a, and the code has no way of knowing how many chunks of data you will have. You will need to set up looping logic of your own. The following example tests whether each chunk was a full batch and stops the process when the records run out:

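chunk.size <- 50000 # Records per chunk, as passed to read.Alteryx.First
data <- read.Alteryx.First("#1", chunk.size, mode="data.frame")
a <- 1 # Any non-NULL value keeps the loop running
while (!is.null(a))
{
  if (is.null(data) || nrow(data) == 0) {
    a <- NULL # Nothing was returned: the records have run out, so end the loop
  } else {
    # Process or output the current chunk here, e.g. write.Alteryx(data, 2)
    if (nrow(data) < chunk.size) {
      a <- NULL # A partial chunk must be the last one: end the loop
    } else {
      data <- read.Alteryx.Next("#1", mode="data.frame") # Fetch the next chunk
    }
  }
}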
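Because read.Alteryx.First and read.Alteryx.Next only exist inside the R Tool, the looping logic is hard to try out on its own. The sketch below swaps in a hypothetical reader, make_chunk_reader(), that hands out rows of an ordinary data frame in fixed-size chunks and returns NULL once nothing is left. That NULL behaviour is an assumption we built into the stand-in, not documented Alteryx behaviour. The point is that the loop needs two stopping conditions: a partial chunk, and no chunk at all when the row count is an exact multiple of the chunk size.

```r
# Hypothetical stand-in for read.Alteryx.First / read.Alteryx.Next:
# returns a function that yields successive chunks of `src`, then NULL.
make_chunk_reader <- function(src, chunk.size) {
  pos <- 0
  function() {
    if (pos >= nrow(src)) return(NULL)            # assumption: NULL when exhausted
    rows <- (pos + 1):min(pos + chunk.size, nrow(src))
    pos <<- max(rows)                             # remember where we stopped
    src[rows, , drop = FALSE]
  }
}

# Runs the chunked loop over `src` and reports how many chunks and rows it saw.
count_chunks <- function(src, chunk.size) {
  read.next <- make_chunk_reader(src, chunk.size)
  chunks <- 0
  total.rows <- 0
  data <- read.next()                             # plays the role of read.Alteryx.First
  while (!is.null(data)) {
    chunks <- chunks + 1                          # "process" the current chunk
    total.rows <- total.rows + nrow(data)
    if (nrow(data) < chunk.size) break            # partial chunk: this was the last one
    data <- read.next()                           # plays the role of read.Alteryx.Next
  }
  c(chunks = chunks, rows = total.rows)
}

print(count_chunks(data.frame(x = 1:120), 50))    # stops on the partial third chunk
print(count_chunks(data.frame(x = 1:100), 50))    # stops when the reader returns NULL
```

With 120 rows the loop ends on the partial third chunk of 20 rows; with 100 rows every chunk is full, so only the NULL check ends it.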

The bottom line: in our experience, the easiest way to handle data in R is as a data frame, and we can now read Alteryx data into the R Tool in this format.
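To see why data frames are usually the easier format, here is a small comparison in plain R. It does not call Alteryx at all: as.list() turns a toy data frame into a named list of columns, which is our assumption about what "As List" resembles, not something we have confirmed.

```r
# Toy data standing in for an Alteryx input with two fields
df <- data.frame(id = 1:3, sales = c(10.5, 20, 7.25))

# Assumption: "As List" gives something like a named list with one element per field
lst <- as.list(df)

# Reading one value: the data frame is indexed like a table (row, column) ...
print(df[2, "sales"])
# ... while the list is indexed element first, then position within that element
print(lst$sales[2])

# Filtering rows is one step on a data frame ...
big.df <- df[df$sales > 10, ]
print(big.df)
# ... but on a list every element has to be filtered separately
keep <- lst$sales > 10
big.lst <- lapply(lst, function(field) field[keep])
print(big.lst)
```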

 

*Header image of this blog is credited to Alteryx


About the author

Our team consists of consultants, analysts, and developers with in-depth expertise in various industries. Driven by the desire to innovate and disrupt current ways of working, we are passionate about helping organisations transform through the use of intuitive data analytics, management and visualisation.