Collecting data from Amazon Athena

Follow this guide to collect data from Amazon Athena and preview the data.

Introduction

This guide explains how to collect data from Amazon Athena and preview the data.

This guide does not cover the full functionality of the Amazon Athena connector. For more information on all features of the Amazon Athena connector, see Setting up an authorization to Amazon Athena.

Understanding datastreams and authorizations

When you connect to Amazon Athena through Adverity, you authorize Adverity to access your account. Adverity stores this authorization information in an authorization, and stores the data collection configuration in a datastream.

You can create several datastreams using the same authorization. These datastreams collect information from the same source with the same login credentials, but each has its own configuration.
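The relationship between an authorization and its datastreams can be pictured as a one-to-many link. The following Python sketch is purely illustrative: the class and field names (`Authorization`, `Datastream`, `query`) and the example values are hypothetical and do not reflect Adverity's internal data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Authorization:
    """Hypothetical stand-in for the stored access details of one account."""
    name: str


@dataclass
class Datastream:
    """Hypothetical stand-in for one data collection configuration."""
    name: str
    authorization: Authorization  # several datastreams may share one authorization
    query: str


# One authorization, reused by two datastreams with different configurations.
athena_auth = Authorization(name="Athena (analytics account)")
datastreams = [
    Datastream("Daily orders", athena_auth, "SELECT * FROM orders"),
    Datastream("Daily sessions", athena_auth, "SELECT * FROM sessions"),
]

# Both datastreams point to the same authorization object.
shared = all(ds.authorization is athena_auth for ds in datastreams)
print(shared)
```

In this picture, changing the login credentials would mean updating one authorization, while each datastream keeps its own query and settings.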

Creating a datastream using Amazon Athena

Connecting to Amazon Athena

To connect to Amazon Athena, follow these steps:

  1. Select the workspace you work with in Adverity and then, in the platform navigation menu, click Datastreams.

  2. In the top right corner, click + Create datastream.

  3. Search for and click Amazon Athena.

  4. Choose one of the following options:

    • Click Setup a new authorization and set up the new authorization with your own login credentials. For more information on setting up a new authorization to Amazon Athena, see Setting up an authorization to Amazon Athena.

    • Click Send an access request to ask someone else to set up the new authorization. In the Email field, enter the email address of the person you want to set up the authorization.

    • If you already have an authorization to Amazon Athena, choose an existing authorization.

  5. Click Next.

Choosing what data to collect

To choose what data to collect and customize the Amazon Athena datastream configuration, follow these steps:

  1. Click Custom configuration.

  2. Click Next.

  3. (Optional) Rename your datastream.

  4. In Query Type, select one of the following options:

    • Select Custom to collect data from Amazon Athena using an SQL query. Enter the SQL query in the Query field at the bottom of the page.

    • Select Saved to collect data using an SQL query created in the Amazon Athena user interface. Select the query in the Named Query field at the bottom of the page.

  5. In AWS Region, select the region associated with the data to collect.

  6. In Work Group, select the work group from which to collect data. The list of available work groups is determined by the account used in the authorization to Amazon Athena.

  7. In Amazon S3 Bucket, enter the file path to the S3 bucket from which to collect data. For more information on finding the S3 bucket file path, see the Amazon documentation.

  8. (Optional) In Add Path to Folder, enter the file path to the directory that contains the data to collect.

  9. In Schema Name, enter the name of the table schema to use when collecting data from Amazon Athena. If you leave this field empty, the default table schema is used. For more information on table schemas, see the Amazon documentation.

  10. Choose one of the following options:

    • In Query, enter an SQL query. This option is only available if you selected Custom in Query Type.

    • In Named Query, select a saved query created in the Amazon Athena user interface. This option is only available if you selected Saved in Query Type. For more information on creating a query in Amazon Athena, see the Amazon documentation.

  11. Click Next.
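The way the configuration fields depend on each other can be summarized in a short Python sketch. It is only an illustration of the steps above, not Adverity's implementation: the class `AthenaDatastreamConfig`, its field names, and all example values (region, work group, bucket path, query) are hypothetical placeholders. The sketch also assumes that the query field matching the selected Query Type must be filled in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AthenaDatastreamConfig:
    """Hypothetical mirror of the configuration fields described above."""
    query_type: str                        # "Custom" or "Saved"
    aws_region: str
    work_group: str
    s3_bucket: str
    path_to_folder: Optional[str] = None   # optional field
    schema_name: Optional[str] = None      # empty means the default schema is used
    query: Optional[str] = None            # used only when query_type is "Custom"
    named_query: Optional[str] = None      # used only when query_type is "Saved"

    def validate(self) -> None:
        """Raise ValueError if the query fields do not match the query type."""
        if self.query_type == "Custom":
            if not self.query:
                raise ValueError("Query is required when Query Type is Custom")
        elif self.query_type == "Saved":
            if not self.named_query:
                raise ValueError("Named Query is required when Query Type is Saved")
        else:
            raise ValueError(f"Unknown Query Type: {self.query_type!r}")


# Example values are placeholders, not real resources.
config = AthenaDatastreamConfig(
    query_type="Custom",
    aws_region="eu-west-1",
    work_group="primary",
    s3_bucket="s3://example-bucket",
    query="SELECT campaign, SUM(clicks) AS clicks FROM ad_stats GROUP BY campaign",
)
config.validate()  # passes: a Custom configuration that includes a query
```

A configuration with Query Type set to Saved but no named query would fail the same check, which mirrors the rule that each query field is only available for its matching Query Type.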

Choosing where to transfer your data

To choose where to transfer your data, follow these steps:

  1. To assign destinations to your datastream, select their checkboxes. For more information on destinations and their configuration settings, see Introduction to transferring data.

  2. Click Next.

Collecting initial data

To collect the initial data, follow these steps:

  1. Choose the time period for which data is collected.

  2. Click Run fetch.

Previewing data collected from Amazon Athena

The fetch collects data from Amazon Athena, which can take some time. The Overview page of the newly created datastream is displayed. To preview the collected data, follow these steps:

  1. In the All tasks tab, find the task at the top of the list, and click Show extracts.

  2. Click the top hyperlink.

  3. The data extract is displayed in a table containing the data that you have fetched.

What’s next?

After collecting data from Amazon Athena, harmonize your data so that it conforms with Adverity's unified naming and formatting conventions.