Concepts in Sympathy for Data

Workflows

Workflow is the common name for the visual data analysis processes that are constructed in Sympathy for Data. In general, the visual workflows consist of a number of visual building blocks which are connected graphically with wires. The building blocks in Sympathy for Data are called nodes and are visual shells connected to underlying Python code that defines the functionality of the node. It is only the nodes in the workflows that perform operations on the actual data. The graphical wires represent the “transportation” of data between the nodes.

A workflow can be saved to a file, which by default will have the extension .syx. The syx-files include the graphical structure of both the workflow and any subflows, as well as all the parameter settings for each node. To save a workflow, click Save or Save as... in either the toolbar or the File menu.

In Sympathy data always flows from left to right. This means that the right-most node is also the “last” node in the workflow. By double-clicking on the last node, you will start execution of any nodes to the left of that node. This might be used to execute an entire workflow (or at least everything that is connected to that node). Another way to execute an entire workflow is to simply push the “Execute” button in the toolbar.
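The rule that executing a node first runs everything to its left can be sketched as a small dependency walk. This is only an illustration of the idea, not Sympathy's actual scheduler: the function `execution_order`, the `upstream` mapping, and the node names Import, Select and Export are hypothetical.

```python
from typing import Dict, List


def execution_order(upstream: Dict[str, List[str]], target: str) -> List[str]:
    """Return the target node and every node to its left, dependencies first."""
    order: List[str] = []
    visited = set()

    def visit(node: str) -> None:
        if node in visited:
            return
        visited.add(node)
        for parent in upstream.get(node, []):
            visit(parent)  # nodes further to the left must run first
        order.append(node)

    visit(target)
    return order


# Hypothetical workflow: Datasource -> Import -> Select -> Export,
# plus a Random Table node that is not connected to the others.
upstream = {
    "Import": ["Datasource"],
    "Select": ["Import"],
    "Export": ["Select"],
    "Random Table": [],
}

print(execution_order(upstream, "Export"))
# ['Datasource', 'Import', 'Select', 'Export']
```

The unconnected Random Table node is not part of the result, which matches the remark that double-clicking the last node only runs what is connected to it.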

Apart from nodes, you can also place text fields in the workflow. This is useful if you want to add a comment or description to your workflow. These text fields become a part of the workflow and are saved together with all other elements in the workflow file. To create a text field, click the button named “Insert text field” in the toolbar, then draw a rectangle on the workspace. An empty text field will appear, and by clicking in it you will be able to add some text.

Nodes

The nodes in Sympathy can be added to the workflow from the node library window, where the nodes are categorized by their functionality. Simply grab a node and drop it on the workspace.

The name of a node is located below the node. You can edit the name of a node simply by clicking on its current name. This can be used as a documentation tool to make your workflow easier to understand.

Double-clicking on a node will execute it. If other nodes need to run first, your node will be queued while waiting for them. When a node is queued or executing, you can right-click on it and choose Abort to cancel the execution. If a node has already been executed and you want to run it again, you must first reload the node by right-clicking on it and choosing Reload. After that you can run it again.

Many nodes can be configured to perform their task in different ways. Right-clicking on a node and choosing Configure will bring up the configuration GUI for that node. Some nodes have very simple configuration GUIs, whereas others have very complex ones. You can read the help text for any specific node by right-clicking on it and choosing Help.

Node states

The background color of a node indicates its state. The table below lists the different states together with their corresponding colors and icons.

State    Color         State icon        Explanation
Armed    Beige         None              The node is ready for execution.
Error    Red           Warning triangle  An error occurred during the last execution of the node.
Invalid  Light gray    Wrench            The node’s configuration is invalid or an input port has not been connected.
Done     Green         Check mark        Successfully executed.
Queued   Blueish gray  Analog clock      The node is queued for execution.
Sympathy node states.

A sample of nodes in different states. The nodes in the first row have not yet been executed, but while the Random Table node can be executed right now, the Datasource node requires some kind of configuration before it can be executed. The nodes in the second row are being executed right now. The node to the left (Example1) is currently executing, and Example2 is queued and will be executed as soon as Example1 is done. The nodes in the final row have both been executed, but while the Hello world Example node was executed successfully, the Error Test node encountered an error during execution (as it is designed to do).

Ports

On the sides of the nodes are small symbols representing the node’s ports for incoming and outgoing data. Since the workflows are directed from left to right, the inputs are located on the left side and the outputs are on the right side.

The ports can have different symbols representing different data types. It is only possible to connect an output port with an input port of the same type. The type system in Sympathy thus ensures that only compatible nodes can be connected.
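The connection rule can be illustrated with a minimal sketch. The `PortType` class and `can_connect` function are hypothetical names made up for this example and are not part of Sympathy's API. The sketch assumes that a list port (see Data types) counts as a different type than a single-item port, and it ignores generic ports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortType:
    """Hypothetical description of a port: a type name and a list flag."""
    name: str              # for example "table", "adaf", "text", "datasource"
    is_list: bool = False  # True for ports drawn inside square brackets

    def __str__(self) -> str:
        return f"[{self.name}]" if self.is_list else self.name


def can_connect(output: PortType, input_: PortType) -> bool:
    """Allow a wire only between an output and an input of the same type."""
    return output == input_


table = PortType("table")
table_list = PortType("table", is_list=True)
adaf = PortType("adaf")

print(can_connect(table, PortType("table")))  # True
print(can_connect(table, adaf))               # False
print(can_connect(table, table_list))         # False
```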

The connections are represented by wires between the nodes and are established by drag and drop. Click on an output port and drag to an input port on another node, or vice versa. Nodes can be disconnected by right-clicking the wire and choosing Delete, or by selecting the connection and pressing Delete on your keyboard.

No real data is transferred between the nodes; instead, paths to temporary files are exchanged. It is these temporary files on disk that contain the actual data. Double-clicking on an output port will open the data on that port in an internal data viewer.
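The idea of exchanging file paths rather than the data itself can be sketched as follows. This is an illustration only: the functions `producer` and `consumer` are hypothetical, and JSON is used purely as a stand-in, since the text does not say which file format Sympathy uses for its temporary files.

```python
import json
import os
import tempfile


def producer() -> str:
    """Write the output data to a temporary file and return only its path."""
    data = {"x": [1, 2, 3]}
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as handle:
        json.dump(data, handle)
    return path


def consumer(path: str) -> int:
    """Read the data back from the path received from the upstream node."""
    with open(path) as handle:
        data = json.load(handle)
    return sum(data["x"])


path = producer()       # only this string travels along the "wire"
total = consumer(path)  # the downstream node loads the data from disk
os.remove(path)         # clean up the temporary file
print(total)            # 6
```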

Data types

The four different port types that are currently supported in Sympathy are Datasource, Table, ADAF, and Text. Apart from these, any port symbol can also be enclosed in square brackets, indicating that the port handles a list of arbitrary length of the corresponding data type.

Input and output ports.

A sample of nodes showing the different types of input and output ports in Sympathy for Data. The nodes in the upper row all have single-item ports, whereas the nodes in the bottom row have list ports, as shown by the square brackets enclosing those ports. From left to right, the types of the output ports are Datasource, Table, ADAF, and Text, respectively.

Datasource

The Datasource format is only used as a pointer to files or to databases. It is often used at the start of a workflow to pinpoint the data that the workflow will be working with.

See also the nodes Datasource and Datasources.

Table

Table is the most common data type in data analysis. Tables are typically found in CSV files (comma-separated values), Excel files and databases. Even matrices and vectors are, in some sense, tables. Most computations map very naturally to tables. A table in Sympathy is much like a database table: a collection of columns that each have a name and contain a single kind of data (numbers, strings, dates etc.). Ports which accept or output data with the Table type are represented by a gray square.
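As a rough mental model, a table can be pictured as a mapping from column names to columns. The sketch below uses plain Python dictionaries and is not Sympathy's table API; `check_table` is a hypothetical helper, and the requirement that all columns have the same length is an assumption borrowed from database tables.

```python
from typing import Any, Dict, List


def check_table(table: Dict[str, List[Any]]) -> bool:
    """Return True if each column holds one kind of data and all columns
    have the same length (the equal-length rule is an assumption)."""
    lengths = {len(column) for column in table.values()}
    if len(lengths) > 1:
        return False
    for column in table.values():
        kinds = {type(value) for value in column}
        if len(kinds) > 1:
            return False
    return True


good = {"name": ["a", "b"], "speed": [1.0, 2.5]}
mixed = {"name": ["a", 2]}        # one column mixes strings and numbers
ragged = {"a": [1, 2], "b": [1]}  # columns of different length

print(check_table(good))    # True
print(check_table(mixed))   # False
print(check_table(ragged))  # False
```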

ADAF

ADAF is the data analysis format used in Sympathy when working with more complicated data. The strength of this format is that it enables the user to work with meta data (data about the data content), results (aggregated/calculated data) and time series (measured data) together, making advanced analysis possible in a structured way. Ports which accept or output data with the ADAF type are represented by a gray “steering wheel”.

See also Working with ADAF.
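The three kinds of content that ADAF keeps together can be pictured with a small container. This is a loose illustration and not the real ADAF structure or API: the class `AdafLike`, its field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AdafLike:
    """Hypothetical stand-in that groups the three kinds of ADAF content."""
    meta: Dict[str, str] = field(default_factory=dict)                # data about the data
    results: Dict[str, float] = field(default_factory=dict)           # aggregated/calculated data
    time_series: Dict[str, List[float]] = field(default_factory=dict) # measured data


record = AdafLike(meta={"vehicle": "test car 7"})
record.time_series["speed"] = [0.0, 12.5, 25.0]

# A calculated value is stored next to the measurement it was derived from.
record.results["max_speed"] = max(record.time_series["speed"])
print(record.results)  # {'max_speed': 25.0}
```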

Text

The Text data type allows you to work with arbitrary text strings in Sympathy. Ports which accept or output data with the Text type are represented by a number of horizontal lines.

Lists

Lists make it possible to handle multiple data items together in a flow. They are the purest way to implement looping constructs in a platform like Sympathy. A good example of when lists are useful is when there are many files on disk with test data and the user wants to select all the files and analyze them in a single workflow.

Generic types

Generic types are types that can change depending on what you connect them to. This is especially useful for list operations that can be performed independently of the types of the elements in the list. Examples: ‘Item to List’ and ‘Get Item List’.

Currently, generic types are visualized by a question mark on the port. To see the actual type, hover over the port until a tooltip appears containing a textual representation of the actual type.
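Generic types behave much like type variables in programming languages. The sketch below uses Python's `typing.TypeVar` to show the idea. The functions `item_to_list` and `get_item` are hypothetical analogues named after the ‘Item to List’ and ‘Get Item List’ nodes; their behavior here is an assumption and not the nodes' actual implementation.

```python
from typing import List, TypeVar

T = TypeVar("T")  # stands for whatever type ends up being connected


def item_to_list(item: T) -> List[T]:
    """Wrap a single item in a list, whatever its type is."""
    return [item]


def get_item(items: List[T], index: int) -> T:
    """Pick one item out of a list, whatever the element type is."""
    return items[index]


print(item_to_list("some text"))  # ['some text']
print(get_item([10, 20, 30], 1))  # 20
```

The same two functions work for text, numbers or any other element type, which is the property the question mark on a generic port expresses.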

Function types (Lambda function)

Function is a new data type that represents a function that can be executed. The type is shown as a question mark on the port, in the same way that generic types are shown. When hovering over the port, the tooltip will show something like: ‘table -> table’, ‘a -> a’, ‘a -> b -> b’. This representation can be interpreted in the following way: the rightmost type is the result type, and every other type is an argument, starting with the leftmost one for the first argument.
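The reading rule for these tooltips can be expressed in a few lines. `split_signature` is a hypothetical helper written for this illustration. It only handles flat signatures like the ones quoted above and makes no attempt to handle other notations.

```python
from typing import List, Tuple


def split_signature(signature: str) -> Tuple[List[str], str]:
    """Split 'a -> b -> c' into its argument types and its result type."""
    parts = [part.strip() for part in signature.split("->")]
    # The rightmost type is the result; everything before it is an argument.
    return parts[:-1], parts[-1]


print(split_signature("table -> table"))  # (['table'], 'table')
print(split_signature("a -> b -> b"))     # (['a', 'b'], 'b')
```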

Control structures

Perhaps you have noticed that some common programming control structures are missing in Sympathy. Things like loops and if-statements are instead implemented in a more data-centric way.

Conditional execution

There is currently no way to branch a flow and only execute a single branch. Instead you can use filters and selectors to guide the data into different branches.
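The filter-based approach can be sketched like this: both branches always exist and run, but each one only receives the rows that match its condition. The function `split_rows` and the sample data are hypothetical and do not correspond to a specific Sympathy node.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def split_rows(rows: List[Row],
               predicate: Callable[[Row], bool]) -> Tuple[List[Row], List[Row]]:
    """Send rows that match the predicate to one branch and the rest to another."""
    matching = [row for row in rows if predicate(row)]
    others = [row for row in rows if not predicate(row)]
    return matching, others


rows = [
    {"sensor": "A", "value": 3},
    {"sensor": "B", "value": 12},
    {"sensor": "C", "value": 7},
]

high, low = split_rows(rows, lambda row: row["value"] > 5)
print([row["sensor"] for row in high])  # ['B', 'C']
print([row["sensor"] for row in low])   # ['A']
```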

Looping

There is also no explicit way to loop in Sympathy. What you can do, though, is use lists. Most list nodes implicitly loop over all the incoming data. For example, Select columns in Tables will loop over all the tables in the input and do the selection for each of them.
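The implicit loop performed by a list node can be pictured as applying a single-table operation to every element of the list. The sketch below models tables as plain dictionaries of columns. The functions `select_columns` and `select_columns_in_tables` are hypothetical analogues of the node mentioned above, not its actual implementation.

```python
from typing import Dict, List

Table = Dict[str, List[object]]


def select_columns(table: Table, names: List[str]) -> Table:
    """Keep only the named columns of a single table."""
    return {name: table[name] for name in names}


def select_columns_in_tables(tables: List[Table], names: List[str]) -> List[Table]:
    """Apply the same selection to every table in the list (the implicit loop)."""
    return [select_columns(table, names) for table in tables]


tables = [
    {"time": [0, 1], "speed": [10, 20], "note": ["a", "b"]},
    {"time": [0, 1], "speed": [30, 40], "note": ["c", "d"]},
]

selected = select_columns_in_tables(tables, ["time", "speed"])
print(selected[0])  # {'time': [0, 1], 'speed': [10, 20]}
print(selected[1])  # {'time': [0, 1], 'speed': [30, 40]}
```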