Compare Datasets

The Compare Datasets node helps you compare data from two input streams.

Usage

  1. Decide which fields to compare. In Input A Field , enter the name of the field you want to use from input stream A. In Input B Field , enter the name of the field you want to use from input stream B.
  2. Optional : you can compare by multiple fields. Select Add Fields to Match to set up more comparisons.
  3. Choose how to handle differences between the datasets. In When There Are Differences , select one of the following:
    • Use Input A Version
    • Use Input B Version
    • Use a Mix of Versions
    • Include Both Versions

Understand item comparison

Item comparison is a two stage process:

  1. Mosaic Workflows checks if the values of the fields you selected to compare match across both inputs.
  2. If the fields to compare match, Mosaic Workflows then compares all fields within the items, to determine if the items are the same or different.

Options

You can use additional options to refine your comparison or modify comparison behavior.

Select Add Option, then choose the option you want to use.

Fields to Skip Comparing

Enter field names that you want to ignore.

For example, if you compare the two datasets below using person.language as the Fields to Match, Mosaic Workflows returns them as different. If you add person.name to Fields to Skip Comparing, Mosaic Workflows returns them as matching.

Copy
Copied
	// Input 1
	[
		{
			"person":
			{
				"name":	"Stefan",
				"language":	"de"
			}
		},
		{
			"person":
			{
				"name":	"Jim",
				"language":	"en"
			}
		},
		{
			"person":
			{
				"name":	"Hans",
				"language":	"de"
			}
		}
	]
	// Input 2
		[
		{
			"person":
			{
				"name":	"Sara",
				"language":	"de"
			}
		},
		{
			"person":
			{
				"name":	"Jane",
				"language":	"en"
			}
		},
		{
			"person":
			{
				"name":	"Harriet",
				"language":	"de"
			}
		}
	]

Fuzzy Compare

Whether to tolerate type differences when comparing fields (enabled), or not (disabled, default). For example, when you enable this, Mosaic Workflows treats "3" and 3 as the same.

Disable Dot Notation

Whether to disallow referencing child fields using parent.child in the field name (enabled), or allow it (disabled, default).

Multiple Matches

Choose how to handle duplicate data. The default is Include All Matches. You can choose Include First Match Only.

For example, given these two datasets:

Copy
Copied
	// Input 1
	[
		{
			"fruit": {
				"type": "apple",
				"color": "red"
			}
		},
				{
			"fruit": {
				"type": "apple",
				"color": "red"
			}
		},
				{
			"fruit": {
				"type": "banana",
				"color": "yellow"
			}
		}
	]
	// Input 2
	[
		{
			"fruit": {
				"type": "apple",
				"color": "red"
			}
		},
				{
			"fruit": {
				"type": "apple",
				"color": "red"
			}
		},
				{
			"fruit": {
				"type": "banana",
				"color": "yellow"
			}
		}
	]

Mosaic Workflows returns three items, in the Same Branch tab. The data is the same in both branches.

If you select Include First Match Only, Mosaic Workflows returns two items, in the Same Branch tab. The data is the same in both branches, but Mosaic Workflows only returns the first occurrence of the matching "apple" items.

Understand the output

There are four output options:

  • In A only Branch : data that occurs only in the first input.
  • Same Branch : data that's the same in both inputs.
  • Different Branch : data that's different between inputs.
  • In B only Branch : data that occurs only in the second output.

Related resources