Python-Tableau Integration

Tableau released the beta version of TabPy in December 2016, which enables the evaluation of Python code from within a Tableau workbook. Thus, we can leverage a large number of machine learning libraries to generate predictions and visualize them in Tableau.

TabPy runs on an Anaconda environment. Hence, we can use any Python libraries in our scripts, such as scipy, scikit-learn, keras, and tensorflow.

In a nutshell, you can create a calculation that contains a Python script. A simple initial setup is required to point Tableau to the Python instance; then, when the view is rendered in Tableau, the script is passed to Python and the returned data is displayed in Tableau.

Prerequisites for setting up TabPy —

  1. Windows/Mac/Linux system.
  2. Tableau Desktop 10.1 or above (Windows/Mac)
  3. Python v2.6 or above
  4. Tableau-Python server (TabPy)

Steps —

  • Go to the TabPy repository on GitHub by Tableau.
  • Click the Clone or download button in the upper right corner.
  • Download the ZIP and extract it.
  • Run setup.bat if you are using Windows, or setup.sh if you are using Linux or Mac.

Now sit back and relax as the command prompt/terminal downloads and installs the Anaconda environment and creates the Tableau-Python server.

  • Once installed, you’ll get a confirmation message. At this point, TabPy is running on localhost and listening on port 9004.

You can also start TabPy later by going to the respective Anaconda installation directory and running the startup.bat (or startup.sh) file.

Now, we need to configure Tableau to connect to the TabPy server.

  • Go to Tableau Desktop > Help > Settings and Performance > Manage External Service Connection. Enter the server name and the port number where your TabPy server is running (localhost and 9004 by default). Click OK.
  • A success message will be displayed and your Tableau is now connected to TabPy Server.

Using TabPy to Run Python in Tableau —

Following are the steps to run a basic Python script in Tableau.

  • Import your data to Tableau (We will be using IRIS Dataset in our example).
  • Create a calculated field.
  • For now, let us create a Naive Bayes model from the input data and predict the categories of that same data using the fitted model. Write the following code in the calculated field.
SCRIPT_REAL("
import numpy as np
from sklearn.naive_bayes import GaussianNB

# create the model
model = GaussianNB()

# transform input data
data_x = np.transpose(np.array([_arg1, _arg2, _arg3, _arg4]))
data_y = np.array(_arg5)

# fit the model
model.fit(data_x, data_y)

# predict the category for input data
predicted_category = model.predict(data_x)

# transform output
return list(np.round(predicted_category, decimals=2))
", ATTR([Petal Length]),
 ATTR([Petal Width]),
 ATTR([Sepal Length]),
 ATTR([Sepal Width]),
 ATTR([Category]))

_argN defines the individual input arguments (columns in the original data); in this example, all of the input arguments are vectors. We have to wrap each field in ATTR() because the SCRIPT_* functions require an aggregation, even though we are not working with aggregated data. Also, for a call to Python to succeed, the script must contain a return statement.

To visualize the output, we will compare the original categories with the predicted categories from the model.

Source : https://blog.alookanalytics.com/2017/02/14/advanced-analytics-with-python-and-tableau/

However, this method is only for testing and playing around; for production use, you should deploy functions as described in the TabPy client documentation and expose them as endpoints.

Once deployed, all it takes to run a machine-learning model is a single line of Python code in Tableau regardless of model type or complexity. You can estimate the probability of customer churn using logistic regression, multi-layer perceptron neural network, or gradient boosted trees just as easily by simply passing new data to the model.
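As a sketch of that workflow, a function can be deployed once with the TabPy client and then queried from a one-line calculated field. The model, the training data, and the `PredictChurn` endpoint name below are all invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: [usage, tenure] per customer,
# with 0 = stays and 1 = churns.
_train_x = np.array([[1.0, 12], [0.2, 1], [0.9, 24], [0.1, 2]])
_train_y = np.array([0, 1, 0, 1])

_model = LogisticRegression().fit(_train_x, _train_y)

def predict_churn(usage, tenure):
    """Return a churn probability for each row Tableau passes in."""
    data = np.column_stack([usage, tenure])
    return _model.predict_proba(data)[:, 1].tolist()

# Deploying publishes the function as an endpoint on the TabPy server.
# This requires a running server, so it is commented out here:
# from tabpy.tabpy_tools.client import Client
# client = Client('http://localhost:9004/')
# client.deploy('PredictChurn', predict_churn,
#               'Returns churn probability per row', override=True)
```

The calculated field then reduces to a single query, e.g. `SCRIPT_REAL("return tabpy.query('PredictChurn', _arg1, _arg2)['response']", ATTR([Usage]), ATTR([Tenure]))`, and swapping in a different model only changes the deployed function, not the workbook.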

Using published models has several benefits. Complex functions become easier to maintain, share, and reuse as deployed methods in the predictive-service environment. You can improve and update the model and code behind the endpoint while the calculated field keeps working without any change. And a dashboard author does not need to know or worry about the complexities of the model behind this endpoint.

There is a plethora of other capabilities. Tableau can also connect to additional data sources and create real-time dashboards that update continuously.
