Preparing a CSV File for Import

Omnichannel imports require that your account be enabled for AudienceStream. The following topics will help you prepare your files for a successful import.

What Data to Import

The best data to import is data that helps create a fuller, richer visitor profile of your customers. To get the most out of an omnichannel import, each row of data should include a visitor identifier. This ensures that the corresponding visitor within the Universal Data Hub is enriched.

When multiple files are uploaded at the same time using SFTP or S3, the files are processed in the order of the upload timestamp.

The following data sets are recommended for use with omnichannel imports:

  • Demographic Data
  • Prospect/Lead Data
  • Historical Purchase/Return Data
  • Offline Interactions (Support, Direct Marketing, Event Attendance, etc.)

File Format

To ensure a successful import, it is critical to understand the expected file format. Omnichannel supports CSV (comma-separated values) files where the first line of the file must be a header line that names the columns of the file. Each line after that represents an event or a visitor record and must contain at least one visitor identifier attribute.

Column names may not contain "#", "^", or whitespace characters.

Files can be compressed into zip files to minimize upload time. The system automatically detects and handles zip files.
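
For illustration, here is a minimal sketch of a valid file. The file name and column names (email_address as the visitor identifier, plus two purchase columns) are hypothetical; your actual columns are defined by your omnichannel configuration.

$ head -n 3 storepurchases_20170319-a.csv   # hypothetical file and columns, for illustration only
email_address,order_id,order_total
jane.doe@example.com,1001,39.75
sam.lee@example.com,1002,120.00

Note that the header names contain no "#", "^", or whitespace characters, and every row includes the visitor identifier.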

File Naming

Your CSV files must be named using the following format: {prefix}_{version}.csv
This format consists of the following two (2) parts:

Part      Description
prefix    A unique identifier for a group of files that share the same CSV column names.
version   A unique identifier for a file within a prefix, usually a timestamp and (optionally) a version number.

File Naming Examples

  • Example 1
    The following example is for in-store purchases dated March 19, 2017 and separated into two (2) files, "a" and "b".

    Prefix          Version      File Name
    storepurchases  20170319-a   storepurchases_20170319-a.csv
    storepurchases  20170319-b   storepurchases_20170319-b.csv
  • Example 2
    The following example is for in-store returns from March 14, 2017 and March 15, 2017, one file per day.

    Prefix        Version    File Name
    storereturns  20170314   storereturns_20170314.csv
    storereturns  20170315   storereturns_20170315.csv

The prefix of one set of files must not overlap with the prefix of another set of files. For example, maintaining the prefixes store-transactions and store-transactions-returns will cause unexpected results because the second begins with the first: "store-transactions".
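
The hypothetical listing below shows how such an overlap looks on disk; "store-transactions-returns" begins with "store-transactions", so those two sets would be grouped together, while a distinct prefix such as "store-returns" avoids the problem.

$ ls   # hypothetical file names, for illustration only
store-transactions_20170319.csv
store-transactions-returns_20170319.csv
store-returns_20170319.csv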

File Size and Import Limits

You can import up to 1,000,000 rows of data per day. The maximum number of rows per file is 100,000. Therefore, you can import the maximum amount of data by creating 10 files each with 100,000 rows for a total of 1,000,000 rows.

Smaller files, no bigger than 50MB, provide optimal import performance.
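
Before uploading, you can check a file against these limits from the command line. This is a quick sketch using standard Unix tools; the file name is hypothetical.

$ wc -l storepurchases_20170319-a.csv   # line count (subtract 1 for the header to get the row count)
$ du -h storepurchases_20170319-a.csv   # on-disk size; files of 50MB or less perform best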

Splitting Large Files Into Smaller Files

If a CSV file has more than 100,000 rows, it must be split into smaller files, each containing the header line from the original. In the following example, a file named master_purchases.csv has 325,000 rows of purchase data and is split into smaller files, each with no more than 100,000 rows.

The tail command first strips the header line, then the split command creates the smaller files with a lexically ordered suffix using the characters a-z.

$ tail -n +2 master_purchases.csv | split -l 100000 -a 1 - purchases-
$ ls
master_purchases.csv  purchases-a  purchases-b  purchases-c  purchases-d

In this example, the resulting four (4) new files contain the following number of rows to total 325,000:

File Number   File Name     Number of Rows
1             purchases-a   100,000
2             purchases-b   100,000
3             purchases-c   100,000
4             purchases-d   25,000
Total Number of Rows        325,000

Then the head and cat commands are used to add the header line from the original file to the smaller files.

$ for file in purchases-*
> do
>   head -n 1 master_purchases.csv > tmp   # copy the header line from the original file
>   cat "$file" >> tmp                     # append the data rows from the split file
>   mv -f tmp "$file"                      # replace the split file with the version that includes the header
> done
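
You can then confirm that every split file begins with the header line and rename the files to match the required {prefix}_{version}.csv format. The target names below are only an illustration; choose a prefix and version appropriate to your data.

$ head -n 1 purchases-*   # each file should now start with the original header line
$ for file in purchases-*
> do
>   mv "$file" "${file/purchases-/purchases_20170319-}.csv"   # hypothetical {prefix}_{version}.csv naming
> done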

Data Format

The data within your files should follow these guidelines to ensure a successful import:

  • Group Rows by Visitor ID
    Each row must include a visitor ID field, and the file may contain multiple rows for the same visitor. Rows that share the same visitor ID should be grouped together to optimize the import process.
  • File encoding
    We suggest UTF-8.
  • Field Ordering
    The ordering of the fields does not matter, apart from the visitor ID column, which we strongly suggest placing leftmost.
  • Extra Columns
    If a file contains extra fields that are not configured in AudienceStream, those extra fields are ignored and the configured fields are processed. Note that field names must match the configuration exactly, including case.
  • Missing Columns
    If a file is missing fields that AudienceStream has been configured to process, the missing fields are ignored and the configured fields that are present are processed. The only mandatory field is the visitor ID field. If you wish, you may use rules in AudienceStream when enriching other attributes to check for the presence or absence of omnichannel fields, effectively making other fields mandatory.
  • Normalize "Empty" Values
    Some rows might not contain a value for each column. Omnichannel expects these columns to be empty and not to contain placeholder values such as "null", "empty", "blank", etc.
  • Normalize Date/Time Values
    Date/time values must be consistent throughout a file for a given field. Each date/time value must match the format specified in your omnichannel configuration for that field. For example, if the expected format is "yyyy-MM-dd", the values must contain 4-digit years and 2-digit month and day values, e.g. "2016-02-09".
    The following date components are supported: Year, Month, Day, Hour, Minutes, Seconds, Milliseconds, AM/PM, ISO, RFC, Era. (Learn more about Java SimpleDateFormat Patterns.)
  • Double-Quote Values with Commas
    If a value contains a comma (,), it must be surrounded by double-quotes to preserve the integrity of the CSV columns. Putting all values in double-quotes is highly recommended. Use a doubled double-quote ("") to represent a double-quote character within a value, as shown in the combined example after this list.
  • Omit Special Characters from Currency Amounts
    Columns that represent currency amounts, such as "OrderTotal" or "ProductPrice", cannot contain currency symbols or commas.
    Valid Values   Invalid Values
    39.75          $39.75
    0              zero
    1399.00        €1.399,00
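
As a combined illustration of the guidelines above, the following sketch shows a few hypothetical rows (the file and column names are invented for this example): values containing commas or double-quote characters are quoted, empty values are left empty rather than filled with placeholders, dates share a single format, and currency amounts contain no symbols or thousands separators.

$ head -n 4 storepurchases_20170319-a.csv   # hypothetical data, for illustration only
email_address,purchase_date,order_total,product_name,coupon_code
jane.doe@example.com,2017-03-19,39.75,"Shirt, Blue, Large",SAVE10
sam.lee@example.com,2017-03-19,120.00,"The ""Everyday"" Jacket",
pat.kim@example.com,2017-03-19,0,Gift Box,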

Next Steps

Once your files are ready, click one of the following links to proceed with configuring a file import or an omnichannel import.