join
Join data extracts from multiple datastreams.
Introduction
Use the join enrichment to combine the data extracts from multiple datastreams into a single extract. The data extracts must share a common column. You can combine datastreams of the same type or datastreams from different sources.
For a thorough, worked example of the join enrichment, see Example of join custom script.
Creating the enrichment
For more information on creating an enrichment, see Using custom scripts.
Configuring the enrichment
To configure the enrichment, fill in the following fields. Required fields are marked with an asterisk (*).
-
How*
-
Select how the datastreams are to be joined. Choose from one of the following options:
-
Left - Select this option to join two datastreams and keep all the data in the data extract of the datastream to which the join enrichment is applied (the left datastream). Any rows with non-matching key columns in the data extract of the joining datastream (the right datastream) are removed.
-
Right - This is the opposite of Left. Select this option to join two datastreams and keep all the data in the data extract of the joining datastream (the right datastream). Any rows with non-matching key columns are removed from the data extract of the datastream to which the join enrichment is applied (the left datastream).
-
Inner - Select this option to keep only the columns with matching rows from both data extracts. These are the columns defined in the Key field.
-
Outer - Select this option to keep all the rows from both of the data extracts to join.
-
Interval - Select this option to join only a selection of the data extracts of the joining datastreams. Enter the range for both the left and right datastreams. Any matching rows that are present in both of these ranges are joined using this enrichment. Enter the start and end of the range in the Left/Right Start and Left/Right Stop fields.
-
Leftinterval - Select this option to perform the same join as the Interval join described above. However, this option will keep all the non-matching rows in the data extract of the left datastream.
-
-
Keys*
-
Enter the name of the columns found in both data extracts from which to match the data. To add multiple keys to match, click
.
Select a column in one of the following ways:
-
Select String and enter the name of the column.
-
Select Integer and enter the position of the column in the data extract as an index value. Counting starts at 0. To select the first column, enter
0
. To select the second column, enter1
. To select the last column in the data extract, enter-1
.
If the columns in the data extracts to join have different names, enter the names of the matching columns in the Left Key and Right Key fields below.
-
-
Left Key*
-
If the columns in the data extracts to join have different names, enter the names of the columns to match in the left datastream (this is the datastream to which the enrichment is applied). To add multiple keys to match, click
. Use this field in conjunction with the Right Key field.
-
Right Key*
-
If the columns in the data extracts to join have different names, enter the names of the columns to match in the right datastream (this is the joining datastream). To add multiple keys to match, click
. Use this field in conjunction with the Left Key field.
-
Streams*
-
Enter the name of the datastream to join. To join multiple datastreams, click
and enter the names of the datastreams to join.
-
Columns
-
Enter the name of the columns in the right datastream (this is the joining datastream) to be joined. To join every column from the right datastream, leave this field blank and select the Join all field.
-
Distinct Join
-
Select this field to prevent multiple identical rows in the key columns from being joined. When selected, if multiple rows in the right datastream (this is the joining datastream) match a single row in the left datastream, then only the first row from the right datastream is joined.
-
Join all
-
Select this field to join all the completed data extracts from the selected datastreams.
-
Concatenate
-
If you are joining multiple datastreams, select this field to remove columns that have the same name.
-
Right Select
-
Enter a python expression to find and pre-select rows in the right datastream before the join starts.
-
Right Prefix
-
Enter a value to prefix on to the column names that are to be joined. For example, if you enter the value
joined-
then all the names of the columns joined into the left datastream (such asclicks
) would be renamed (such as tojoined-clicks
). -
Right Presorted
-
Select this field if the data extract of the right datastream (this is the joining datastream) has been sorted. This will speed up the enrichment.
-
Include Tags
-
Enter the names of the tags used to identify which data extracts are to be joined. If you populate this field, only the data extracts with the selected tags are merged. Leave this field blank to join only the most recent data extracts from the selected datastreams.
-
To add tags to a data extract, use the set_tags enrichment.
-
Exclude Tags
-
Enter the names of the tags used to identify which data extracts should not be joined. If you enter tags in this field, they will have priority over the tags entered in the Include Tags field.
-
Convert Keys
-
Select this field to convert keys to unicode.
-
Ignore Join Date
-
Leave this field unselected to join only the data extracts with a scheduled_date that is earlier than the current_extract_date. The Include Tags and Exclude Tags fields have priority over the Ignore Join Date field.
-
Left Start
-
Populate this field if you are using an Interval join. Enter the name of the column in the left datastream (the datastream to which the enrichment is applied) used to identify the start of the interval.
-
Left Stop
-
Populate this field if you are using an Interval join. Enter the name of the column in the left datastream (the datastream to which the enrichment is applied) used to identify the end of the interval.
-
Right Start
-
Populate this field if you are using an Interval join. Enter the name of the column in the right datastream (this is the joining datastream) used to identify the start of the interval.
-
Right Stop
-
Populate this field if you are using an Interval join. Enter the name of the column in the right datastream (this is the joining datastream) used to identify the end of the interval.
-
Subtable
-
Enter the name for a subtable that you want to contain the enriched data. The enrichment is applied to the whole data extract, then the enriched data is output into the subtable you have named here.
This subtable is a temporary table, which means it only exists for this custom script. You can apply additional instructions within the same custom script to the subtable. However, the subtable cannot be used in any other custom scripts.