
API Introduction

If you're looking to interact with the Namara API, you've come to the right place. This guide will cover instructions and conventions for using our API, with language-specific examples.

API Fundamentals

The Namara API is built on an RPC Framework called Twirp. Twirp runs on Protobuf and offers JSON compatibility. It generates server files and client code in almost any language, making it quick and easy to develop on. This allows us to launch with a selection of client libraries, and will allow us to create additional language options over time.

Client Library and Language Offerings

We primarily support Python, Ruby and Go libraries, but we will continue to expand our offering. If you are using a programming language that we don't already support, please don't hesitate to reach out to us.

Pointing to a Namara API Domain

An API provides all functionality available to the app itself. This is available on the api subdomain of whichever Namara deployment you are using.

Whichever domain you use to access Namara, the API is available at the api subdomain of that domain. This applies to custom domains as well.

Tips for accessing through cURL

Your API Key

Your individual API key is required to make requests to the Namara API and is available in the account section of the app. To access this, click on the user icon in the top corner of the application at any time, and navigate to Account Settings > Token. Please copy this token and store it in a secure location where it won't be shared with other users. We do not recommend committing this value to any publicly available repo.

Your API key must be used when initializing the client for our supported languages, or passed in the HTTP header X-API-Key: YOUR_KEY if you are communicating with the API directly or using the legacy API.
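If you are calling the HTTP API directly, setting the header can be sketched in Python as follows. The URL is a placeholder (use your deployment's api subdomain); the supported client libraries handle this for you.

```python
from urllib import request

API_KEY = "YOUR_API_KEY"  # substitute your own token

# Build a POST request carrying the X-API-Key header. The URL below is a
# placeholder for your deployment's api subdomain.
req = request.Request(
    "https://api.example.com/query",
    data=b'{"statement": "SELECT ... FROM ..."}',
    headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
    method="POST",
)

# urllib stores header names in capitalized form
print(req.get_header("X-api-key"))  # YOUR_API_KEY
```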

Dataset Updates

Dataset Versions

Unless you are using static datasets that do not update, you will want to become familiar with dataset versions. A dataset version is a point-in-time snapshot of the dataset's schema (or structure). If this schema changes from one update to the next, the dataset id will not change, but the version will increment.

For example, if you are using a dataset with three columns "A", "B", and "C", the version of that dataset will not change as long as those columns remain the same. But if one update adds column "D", the dataset will increment from version 1 to version 2. Any schema change of this kind results in a new dataset version.

Dataset versions help you successfully keep track of the data, even if it changes structure. In order to integrate successfully with a dataset, you must include both the dataset ID and the dataset version in your queries. Both are found under the dataset's API tab.

Tagged Revisions

Many dataset updates will result in an increase or decrease in the dataset row count, or a change in its cell values. These updates, called revisions, are different from dataset versions because they highlight a change in the dataset values, but not its underlying structure.

For example, if you are using a dataset that updates every day and has a column called "Date_Retrieved", you would expect the dataset to gain one row for each day it updates (2020-01-01, 2020-01-02, etc.). This is a revision of the dataset and not a new version, because the schema of the dataset has not changed.

If, however, you are using a dataset that adds a new column to the schema for every date on which the data is retrieved, every update will increment the dataset version.

You may create dataset revision "tags" based on point-in-time views of the dataset. Tagged revisions let you query stable or desired dataframes, even if the data has since been revised. You may create and copy these tagged revisions in the "Versions" tab of a dataset view. By default, datasets will include a tagged revision "latest" on the current view of the dataset. This tag, as its name suggests, will update alongside the dataset, so if the data increments in either version or revision, the "latest" tag will update accordingly.

Revision tags are useful because they let you pinpoint a precise revision of the dataset and return to that dataframe as needed. If you are building an application that relies on a specific dataset structure, you will want to ensure that the application runs successfully, regardless of how the underlying data may have changed across updates. By creating a revision tag on a stable snapshot of the data and using this revision ID in your application, you will ensure that the application does not break, even if the dataset schema changes.

Adding Packages

pip install --extra-index-url namara-python==0.9.5
# Gemfile

# Please use this private gem source
gem 'namara', source: ''
// Go modules don't provide great support for private packages; please
// download the package and place it in your application's pkg/ directory

To the right you will see the various entries that will need to be added in the language specific package managers.

Making a Request

curl -XPOST \
    -H 'X-API-Key: <key>' \
    -H 'Content-type: application/json' \
    -d '{"statement": "SELECT ... FROM ..."}' \
from namara_python import Client

client = Client(api_key='YOUR_API_KEY')

resp = client.query(statement='SELECT col1, col2 FROM ...')
for row in resp:
    print(row)
require "namara"

namara = "")

resp = namara.query("SELECT col1, col2 FROM ...")
puts resp.metadata.col1.type # => e.g. text
resp.rows.each do |row|
  puts row
end
package main

import (
    "context"
)

func main() {
    client := namara.NewClient("", "YOUR_API_KEY")

    resp, err := client.Query(context.Background(), "SELECT ... FROM ...")
    if err != nil {
        panic(err)
    }
    _ = resp
}

The examples in the code column show how to make a request using our Data API. Please change the examples to the right to include your API key as well as the specific query you are trying to make. For more information about querying the Data API, please see the Data API section.

Handling Pagination

Every request that returns a list of results will have pagination parameters that should be provided. 1000 results is both the default and maximum per request. For limits exceeding 1000, either an error will be returned or a forced 1000-result limit will be applied. This may vary depending on the service. In any case, it is recommended that all listing requests have a pagination strategy.
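One such strategy is a limit/offset loop that stops when a short page signals the end of the results. In this runnable sketch, `fetch_page` is a stand-in for any Namara listing call, backed by an in-memory list so the example executes on its own.

```python
PAGE_SIZE = 1000  # default and maximum per request

ALL_ROWS = list(range(2500))  # stand-in for server-side results

def fetch_page(limit, offset):
    # stand-in for a real listing request with limit/offset parameters
    return ALL_ROWS[offset:offset + limit]

def fetch_all(page_size=PAGE_SIZE):
    results, offset = [], 0
    while True:
        page = fetch_page(page_size, offset)
        results.extend(page)
        if len(page) < page_size:  # a short page means no more results
            break
        offset += page_size
    return results

print(len(fetch_all()))  # 2500
```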

Query API (NiQL)

To make querying our data catalog easier, we have created a query language called NiQL. If you have used SQL in the past, you should find it very familiar, as it is based on standard SQL. The largest difference between NiQL and standard SQL is that the only supported queries start with the keyword SELECT. Keywords that mutate the underlying data, such as INSERT, UPDATE, and DELETE, are not supported.

Handling Pagination

Every request that returns a list of results will have pagination parameters that should be provided. 1000 results is both the default and maximum per request. For limits exceeding 1000, either an error will be returned or a forced 1000-result limit will be applied. This may vary depending on the service. In any case, it is recommended that all listing requests have a pagination strategy.

Creating a Query

A query follows the basic form one would use in SQL.

SELECT {columns} FROM {dataset_id} WHERE {condition} LIMIT {limit}

When using a dataset ID, it is possible to pass in either the dataset UUID on its own, or to include a dataset version or a revision tag along with it. For more information on dataset versions, please see the Dataset Versions documentation, and for more information on dataset revision tags, please see the Tagged Revisions documentation. To ensure that your integration is compatible with live data, we recommend that you lock your dataset queries to a version.

Using a dataset id or including a version/tag would look like so:

SELECT ... FROM "de1049a3-e356-4251-b8f9-7a628b8b3b97"
SELECT ... FROM "de1049a3-e356-4251-b8f9-7a628b8b3b97/0"
SELECT ... FROM "de1049a3-e356-4251-b8f9-7a628b8b3b97@tag"

SELECT Features


  FROM "data-set-id"

  SELECT COUNT(column1)
  FROM "data-set-id"


  SELECT DISTINCT column1, column2
  FROM "data-set-id"


  FROM "data-set-id"


  SELECT MIN(column1) AS minColumn1, MAX(column2) AS maxColumn2
  FROM "data-set-id"


  SELECT AVG(column1) AS avgColumn1, SUM(column1) AS sumColumn1
  FROM "data-set-id"

FROM Features


  FROM "data-set-id" AS DataSet1 INNER JOIN "data-set-id2" AS DataSet2
  ON DataSet1.foreign_id = DataSet2.external_id



  FROM "data-set-id"
  SELECT objectid
  FROM "data-set-id2"

WHERE Features


  SELECT id, address, city, province, country
  FROM "data-set-id"
  WHERE (country = 'Canada' AND province = 'Manitoba' AND NOT city = 'Winnipeg') OR country = 'Mexico'


  FROM "data-set-id"
  WHERE country LIKE 'C_%'


  FROM "data-set-id"
  ORDER BY country, province, ... [ASC|DESC]


  FROM "data-set-id"
  WHERE country IN ('Mexico', 'Canada', ...)


  FROM "data-set-id"
  WHERE liquidation_date BETWEEN '2016-01-01' AND '2018-01-01'


  SELECT COUNT(customer_id), country
  FROM "data-set-id"
  GROUP BY country
  HAVING COUNT(customer_id) > 100 


  FROM "data-set-id"
  WHERE total_count = [ANY|ALL] (SELECT COUNT(customer_id) FROM "data-set-id2")
  FROM (SELECT customer_id, parent_account_id, purchase_total FROM "data-set-id2")
  AS subSelect
  WHERE purchase_total > 1500

Geospatial Features

Geometry properties for datasets are stored as GeoJSON and will be returned that way unless instructed otherwise. Use the transformation function ST_GeomFromText to create geometry objects, which can then be manipulated and transformed, and use ST_AsGeoJSON or ST_AsText to turn the final binary result back into text.

Here's an example in which geometry_property is a property from the dataset of type geometry (this information can be obtained in the API Info tab when viewing a dataset):

  SELECT ST_AsGeoJSON(ST_GeomFromText(geometry_property))
  FROM "data-set-id"

Supported Functions


We are very interested in expanding the geospatial capabilities of NiQL. If there is additional functionality you need, or there are any issues with the implementations, please do not hesitate to reach out to us.

Pagination with NiQL

Like the Data API, a maximum number of rows will be returned for each query. If the query string does not contain LIMIT X OFFSET Y, the parser will append a LIMIT clause with the maximum number of allowable rows in order to enforce the limit.

For results larger than the allowed amount, manual pagination in subsequent requests will have to be used.

The default limit is 1000 rows, but this may vary depending on which deployment of Namara you are interacting with. Refer to the Meta endpoint for instructions on how to obtain this information.
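Client-side, you can mirror the parser's behaviour by appending LIMIT/OFFSET yourself when a statement lacks one. `with_paging` below is an illustrative sketch, and the 1000-row default is only an assumption; check your deployment's actual limit as described above.

```python
import re

DEFAULT_LIMIT = 1000  # may vary by deployment; check the Meta endpoint

# Sketch: append LIMIT/OFFSET only when the statement does not already
# carry a LIMIT clause, mirroring what the NiQL parser enforces.
def with_paging(statement, limit=DEFAULT_LIMIT, offset=0):
    if re.search(r"\bLIMIT\b", statement, re.IGNORECASE):
        return statement
    return f"{statement} LIMIT {limit} OFFSET {offset}"

print(with_paging('SELECT col1 FROM "ds-1"'))
# SELECT col1 FROM "ds-1" LIMIT 1000 OFFSET 0
```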

Catalog API

We have gone through the process of getting a client set up and querying the Data API, but it doesn't stop there. All functions of the Namara Catalog are available through the API, and a complete list of the supported functions is provided here.

Handling Pagination

Every request that returns a list of results will have pagination parameters that should be provided. 1000 results is both the default and maximum per request. For limits exceeding 1000, either an error will be returned or a forced 1000-result limit will be applied. This may vary depending on the service. In any case, it is recommended that all listing requests have a pagination strategy.


curl -XPOST
    -H 'X-API-Key: <key>' \
    -H 'Content-Type: application/json' \
    -d '{"filter": {"limit": {"value": 1}}}' \
organization = {title: 'Some Org'}

organizations = namara.list_organizations
organization = namara.create_organization(organization)
organization = namara.get_organization('org-1')
organization = namara.update_organization(organization)
_ = namara.delete_organization(organization)
org = {"title": "Some Org"}

orgs = client.list_organizations()
org = client.create_organization(organization=org)
org = client.get_organization(id="org-1")
org = client.update_organization(organization=org)
org = client.delete_organization(organization=org)
org := &namara.Organization{Title: "Some Org"}

orgs, _ := client.ListOrganizations(ctx, namara.OrganizationFilter{})
org, _ = client.CreateOrganization(ctx, org)
org, _ = client.GetOrganization(ctx, "org-1")
org, _ = client.UpdateOrganization(ctx, org)
_ = client.DeleteOrganization(ctx, org)

The organization is the highest unit. A user can be a member of many organizations and an organization has many groups.

For list/get/create/update/delete operations on Organizations.

Organization Members

curl -XPOST
    -H 'X-API-Key: <key>' \
    -H 'Content-Type: application/json' \
    -d '{"filter": {"limit": {"value": 1}}}' \
member = {organization_id: 'org-1', user_id: 'u-1', permission: 1}

members = namara.list_organization_members('org-1')
member = namara.add_organization_member(member)
member = namara.get_organization_member('org-1', 'u-1')
member = namara.update_organization_member(member)
member = namara.remove_organization_member(member)
member = {"organization_id": "org-1", "user_id": "user-1", "permission": 1}

members = client.list_organization_members(organization_id="org-1")
member = client.add_organization_member(member=member)
member = client.get_organization_member(organization_id="org-1", user_id="user-1")
member = client.update_organization_member(member=member)
member = client.remove_organization_member(member=member)
mem := &namara.OrganizationMember{OrganizationId: "org-1", UserId: "u-1", Permission: 1}

mems, _ := client.ListOrganizationMembers(ctx, "org-1", &namara.OrganizationMembersFilter{})
mem, _ = client.AddOrganizationMember(ctx, mem)
mem, _ = client.GetOrganizationMember(ctx, "org-1", "u-1")
mem, _ = client.UpdateOrganizationMember(ctx, mem)
_ = client.RemoveOrganizationMember(ctx, mem)

For list/get/add/update/remove operations on Organization Members.

Note: A user must be a member of the organization to perform read operations on its users, and must be able to manage members to perform add/remove/update operations on its members.


curl -XPOST
    -H 'X-API-Key: <key>' \
    -H 'Content-Type: application/json' \
    -d '{"filter": {"limit": {"value": 1}}}' \
group = {organization_id: 'org-1', title: 'Some Group'}

groups = namara.list_groups('org-1')
group  = namara.create_group(group)
group = namara.get_group('grp-1', 'org-1')
group = namara.update_group(group)
response = namara.delete_group(group)
group = {"organization_id": "org-1", "title": "Some Group"}
filter = {"limit": 10, "offset": 20}

groups = client.list_groups(organization_ids=["org-1"], filter=filter)
group = client.create_group(group=group)
group = client.get_group(id="group-1", organization_id="org-1")
group = client.update_group(group=group)
_ = client.delete_group(group=group)

grp := &namara.Group{OrganizationId: "org-1", Title: "Some Group"}

grps, _ := client.ListGroups(ctx, []string{"org-1"}, &namara.GroupsFilter{})
grp, _ = client.CreateGroup(ctx, grp)
grp, _ = client.GetGroup(ctx, "org-1", "grp-1")
grp, _ = client.UpdateGroup(ctx, grp)
_ = client.DeleteGroup(ctx, grp)

For list/get/create/update/delete operations on Groups.

Group Members

curl -XPOST
    -H 'X-API-Key: <key>' \
    -H 'Content-Type: application/json' \
    -d '{"filter": {"limit": {"value": 1}}}' \
member = {group_id: 'grp-1', user_id: 'u-1', permission: 1}

members = namara.list_group_members('grp-1')
member = namara.add_group_member(member)
member = namara.get_group_member('grp-1', 'u-1')
member = namara.update_group_member(member)
response = namara.remove_group_member(member)

member = {"organization_id": "org-1", "user_id": "user-1", "permission": 1}

members = client.list_group_members(group_id="grp-1")
member = client.add_group_member(member=member)
member = client.get_group_member(group_id="grp-1", user_id="u-1")
member = client.update_group_member(member=member)
_ = client.remove_group_member(member=member)

grpMem := &namara.GroupMember{GroupId: "grp-1", UserId: "u-1", Permission: 1}

grpMems, _ := client.ListGroupMembers(ctx, &namara.GroupMembersFilter{}, "grp-1")
grpMem, _ = client.AddGroupMember(ctx, grpMem)
grpMem, _ = client.GetGroupMember(ctx, "u-1", "grp-1")
grpMem, _ = client.UpdateGroupMember(ctx, grpMem)
_ = client.RemoveGroupMember(ctx, grpMem)

For list/get/add/update/remove operations on Group Members.

Note: A user must be a member of the group to perform read operations on its users and be an admin of the group to perform add/remove/update operations.


curl -XPOST
    -H 'X-API-Key: <key>' \
    -H 'Content-Type: application/json' \
    -d '{"filter": {"limit": {"value": 1000}}}' \
filter = {query: 'my search', limit: 10, offset: 0}

datasets = namara.list_all_datasets(filter)
datasets = namara.list_organization_datasets('org-1', filter)
dataset = namara.get_dataset('ds-1')
filter = {"query": "my search", "limit": 10, "offset": 0}

datasets = client.list_datasets(filter=filter)
datasets = client.list_organization_datasets(organization_id="org-1", filter=filter) 
dataset = client.get_dataset(id='ds-1')
datasets, _ := client.ListDatasets(ctx, namara.DatasetFilter{Limit: 10, Offset: 0, Query: "my search"})
datasets, _ = client.ListOrganizationDatasets(ctx, "org-1", 
    &namara.OrganizationDatasetFilter{Limit: 10, Offset: 0, Query: "my search"})
dataset, _ := client.GetDataset(ctx, "ds-1")

For search/get/update operations on Datasets.


curl -XPOST 
    -H 'X-API-Key: <key>' \
    -H 'Content-type: application/json' \
    -d '{"statement": "SELECT ... FROM ..."}' \
results = namara.query('SELECT ... FROM ...')
results = namara.query(statement='SELECT col1 FROM ...')
for row in results:

results, _ := client.Query(ctx, "SELECT col1 FROM ...")
for results.Next() {
    row, _ := results.Result()

For query operations on datasets.


This does not include all available API methods; more will be added to the documentation soon.


Pandas (Alpha)

import namara_pandas as npd
from namara_python.client import Client

namara = Client(api_key='YOUR_API_KEY')

df = npd.DataFrame(data_set_id='DATASET_ID', client=namara.query_client())

# pandas internal `count` will say 50, because that's how many rows we load by default
# custom field for the actual count of the whole dataset
# or

# this will iterate over the whole dataset as tuples
for row_tuple in df.itertuples():
    print(row_tuple) # these are unnamed tuples
    print(row_tuple[0]) # to get first column of row

group1 = df.groupby(by=['col1'])
group2 = df.groupby(by=['col1', 'col2'])

# will iterate over the whole dataset
for name, group in group1:
    print(name)

# to see the difference between 1 vs 2 column group bys
for name, group in group2:
    print(name)

df2 = df.apply(npd.Series.eq, args=[True])
df3 = df.fillna(0)

Namara now has a Pandas package for working with the catalog. The functionality is still very limited but it includes some of the most common functions and may provide a start for working with the data. The benefit of connecting to the platform directly with Pandas is that the data stays up-to-date over time.

Integration with Pandas is only supported in Python.

Note: The Pandas API is in Alpha and still very limited in functionality. If you need functionality beyond what it provides, please download a copy of the data from the platform and use Pandas' read_csv method.

Legacy API

If you have previously been plugging into Namara data through our REST API, have no fear: we have created a backwards-compatibility layer that ports over support for our v0 API.

Pointing to a Namara Domain

All Namara domains make this service available on the api subdomain; simply add /v0/ to the URL in order to point to this service.

Supported Endpoints

If you need other legacy endpoints supported, please contact us and we will see what we can do.

GET Data Query

This endpoint is used for creating selection and aggregation views on a single dataset.

    -H 'X-API-Key: <key>' \
    -H 'Content-type: application/json' \
    <id>/data
Path Parameters Type Description
data_set_id (required) string UUID for accessing the dataset
Query Parameters Type Description
result_format string Query response format: csv, json, or geojson (default is json)
geometry_format string Either wkt or geojson for all geometry values (default is geojson)
geojson_feature_key string Property name to use as geometry when rendering geojson
limit integer Number of rows to return - the default value is also the maximum: 1000 (see Pagination)
offset integer Results will be returned starting at the row number specified (see Pagination)
select string Comma-separated list of column names to return
order string Specify the order of the returned results (see Ordering)
where string Conditions for performing query (see Conditions)

Formats, Pagination, & Ordering

The Namara Data API produces results in different formats (json, csv, or geojson), depending on the value you pass to the result_format parameter in your query. In the example results, you'll see three buttons above the code block that show the results in your preferred format. Here's how they look:

          "coordinates": [ -79.4, 43.7 ]


Each query response is limited to 1000 results. To view the entire response, either use the export endpoint to render the results of the query, or use limit and offset arguments to paginate over results, until no more values are found.


...&order=p0 ASC

Pass in either ASC or DESC after specifying a column to see results in ascending or descending order, respectively.


The where argument supports a number of comparison operators and geospatial functions:

Symbol Alias Description Use
= eq Returns an exact, case-sensitive match of the value p0=100 p3 eq '2015-01-06'
!= neq Excludes the exact, case-sensitive match of the value p0!=50 p3 neq '2016-07-16'
> gt Works for numerical, date and datetime values p0>100 p0 gt '2010-01-01'
>= gte Works for numerical, date and datetime values p0>=75 p0 gte '2010-01-01'
< lt Works for numerical, date and datetime values p0<200 p0 lt '2018-04-01'
<= lte Works for numerical, date and datetime values p0<=150 p0 lte '2017-11-01'
IS Works for boolean and NULL values p1 IS true p1 IS NULL
IS NOT Works for boolean and NULL values p1 IS NOT true p1 IS NOT NULL
LIKE % = wildcard, case-insensitive p2 LIKE '%foo%'
NOT LIKE % = wildcard, case-insensitive p2 NOT LIKE '%foo%'
IN Works for values in a specified list of items p0 IN (100, 'foo', true)
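Since the where parameter is just a string, conditions can be assembled programmatically. `build_where` below is a hypothetical helper shown in Python for illustration; the operator aliases come from the table above.

```python
# Hypothetical helper: joins legacy-API where conditions with AND,
# parenthesizing each one so evaluation order stays explicit.
def build_where(*conditions):
    return " AND ".join(f"({c})" for c in conditions)

where = build_where("p0 gte 75", "p2 LIKE '%foo%'")
print(where)  # (p0 gte 75) AND (p2 LIKE '%foo%')
```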

Operator Examples

or{YOUR_API_KEY}&where=co2_emissions_g_km lt 200


2b){YOUR_API_KEY}&where=make="CHEVROLET" OR make="CADILLAC"

3){YOUR_API_KEY}&where=(make="CHEVROLET" OR make="CADILLAC") AND (fuel_consumption_city_l_100km<=12 AND fuel_consumption_hwy_l_100km<=9)

  1. List all vehicles with CO2 emissions less than 200g/km

  2. a) Get fuel consumption ratings for all Cadillac and Chevrolet vehicles

    b) The same operation with boolean operators

  3. List all Cadillac and Chevrolet vehicles with good city and highway mileage

Example 3 is a more complex query with multiple conditions while explicitly specifying the evaluation order.

Geospatial Operators

Datasets will commonly contain latitude and longitude as properties.

The where condition query parameter supports some geospatial functions for querying datasets.

Geospatial Operator Examples

1) ...&where=nearby(p3, 43.653226, -79.3831843, 10km)

2) ...&where=bbox(p3, 43.5810245, -79.639219, 43.8554579, -79.11689699)
  1. Returns all rows in which the value in the specified column is within radius distance of the point specified by latitude and longitude.

  2. Returns all rows in which the value in the specified column lies within the bounding box created by the two coordinates.
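Because these conditions are also plain strings, the two functions can be formatted with trivial helpers. `nearby` and `bbox` below are illustrative Python wrappers, not client-library functions.

```python
# Illustrative formatters for the geospatial conditions shown above.
def nearby(column, lat, lng, radius):
    # radius is passed through verbatim, e.g. "10km"
    return f"nearby({column}, {lat}, {lng}, {radius})"

def bbox(column, lat1, lng1, lat2, lng2):
    return f"bbox({column}, {lat1}, {lng1}, {lat2}, {lng2})"

print(nearby("p3", 43.653226, -79.3831843, "10km"))
# nearby(p3, 43.653226, -79.3831843, 10km)
```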

GET Export

Exporting is almost identical to the Data Query endpoint, with the difference being that the complete result of the query will be saved to a file, and that file will be served up.


Export path and query parameters look a lot like the parameters for accessing the dataset. Let's look at the requests you can make:

Path Parameters Type Description
data_set_id (required) string UUID for accessing the dataset
Query Parameters Type Description
result_format string Query response format: csv, json, or geojson (default is json)
geometry_format string Either wkt or geojson for all geometry values (default is geojson)
geojson_feature_key string Property name to use as geometry when rendering geojson
compress_with string Compression options for final export
limit integer Number of rows to export
offset integer Results will be returned starting at the row number specified (see Pagination)
select string Comma-separated list of column names to return
order string Specify the order of the returned results (see Ordering)
where string Conditions for performing query (see Conditions)
redirect boolean When set to true, the response will redirect the user to the resource URL rather than return the URL itself


    "message": "Exported",
    "url": "<url to file>",
    "compressed_with": "none"

  { "message": "Pending" }

    "message": "Failed",
    "error_message": "<reason for error>"
Response Description
200: OK When the export has finished and redirect is not set (example 1)
202: Accepted When the export has begun (example 2)
422: Unprocessable Entity Failed to export (example 3)

GET Aggregate

Use aggregation functions to retrieve dataset-level information.


Parameter Type Description
data_set_id (required) string UUID for accessing the dataset
operation (required) string Operation function to perform (see Operations)
where string Conditions for performing query (see Conditions)


Function Description Use
count The number of rows of data. Using * will count all rows - specifying a property will only count non-null rows for that property. count(*) count(p0)
sum The sum of all values in a column sum(p0)
avg The average of all values in a column avg(p0)
min The minimum value in a column min(p0)
max The maximum value in a column max(p0)

Operator Examples




  1. Reveals the average CO2 emissions of Cadillac vehicles

  2. Reveals the Cadillac vehicle with the least CO2 emission

  3. Reveals all the Cadillac vehicles


Ingester configs describe all the assumptions made when processing a supplied list of files. These assumptions cover everything from header scraping to property value semantics. Sometimes these assumptions are wrong and must be corrected; that is why many values of the config are overwritable. This document outlines the config structure and parameters.


Ingester configs are written in YAML. There are four primary sections at the top level. None of these sections is required; missing sections are populated automatically.

convert: {}
files: {}
props: {}
meta: {}


The convert top level option (aka global convert config) groups default format and data normalizing options for all files when unspecified on the files level.


The files top level option pairs urls with access parameters and local convert configurations. Single files can have many paths when they are archives or folders.


The props top level option pairs column keys with kinds and relevant kind-aware normalization options.


The meta top level option defines options related to the pipeline. For example, a callback url for status updates and config returns is specified here.


The most basic config looks like this:

files:
  - url: path_to_file

There are no other top level options required for this ingestion therefore their headings can be omitted.

Configs can have multiple files:

files:
  - url: http://file-server/jan.csv
  - url: http://file-server/feb.csv
  - url: http://file-server/mar.csv

The top level convert can set default options for these files

convert:
  header_rows: 1
  skip_before_header: 1
files:
  - url: http://file-server/jan.csv
  - url: http://file-server/feb.csv
  - url: http://file-server/mar.csv

A local convert can override default options

convert:
  header_rows: 1
  skip_before_header: 1
files:
  - url: http://file-server/jan.csv
  - url: http://file-server/feb.csv
    convert_configs:
      - path: http://file-server/feb.csv
        header_rows: 1
        skip_before_header: 1
        skip_after_header: 1
  - url: http://file-server/mar.csv

File Config

There can be one or many file configs in a single Ingester config. They are listed in the files top level option, and are indexed by urls. For a successful ingest, all specified files which aren't ultimately ignored must share the same header names. Multiple files are concatenated in the order that they appear in the config. Below is the list of properties in a file config:

Property YAML Description
url url: http://file-server/mydata1.csv Specifies the url of the file
convert_configs convert_configs:   - path: path_to_file ... Specifies all the convert configs. See Convert Config
ftp_user ftp_user: username Specifies the username of the ftp user
ftp_pass ftp_pass: 1234 Specifies the password for the ftp user
iterators iterators:   - param: page     increment_by: 1     begin: 1     end: 20 Allows for paginating over api. This is an advanced feature called Iterators, and is covered in the iterators section

Convert Config

Options specified in either the top level convert or any local converts modify how the selected files are parsed, and what they output. These changes apply before Ingester guesses any property kinds, and when cells are normalized.

These options, among others, allow users to skip tedious data formatting tasks each time a dataset is ingested. Here is the exhaustive list of options, including examples. Some of these options don't apply to certain formats and are ignored when used inappropriately. Ingester does a good job of guessing these values when they are needed but absent.

Property YAML Description
path path: my_dataset.csv Selects a file inside an archive, or selects the single file included by the url. Paths can include wildcards for grouping and predicting files as they change over time.

See the Path Querying section for additional details.
ignore ignore: true When set to true, will ignore the file
format format: xlsx Sets the file format of the file
encoding encoding: ISO8859_1 Specifies the input encoding to be converted. See the Encodings section for a list of possible encodings.
col_sep col_sep: "," Specifies the column separator of the file
row_sep row_sep: "\n" Specifies the row separator of the file
quote_char quote_char: "\"" Specifies the character that escapes file format rules used in the file. This is only relevant for the csv, psv, and tsv file formats.
skip_before_header skip_before_header: 10 Specifies the amount of rows you are skipping from the start of the dataset, up until (but not including) the start of `header_rows`.
header_rows header_rows: 10 Specifies the amount of header rows in the data set, contiguous with `skip_before_header`. Also note that if you specify multi-line `header_rows`, the headers will be concatenated column-wise.
skip_after_header skip_after_header: 10 Specifies the amount of rows you are skipping from the end of `header_rows`. Note that the contiguous options are `skip_before_header`, then `header_rows`, then `skip_after_header`.
offset offset: 3 Ignores the first n columns of the dataset
columns columns: 3 Reads up to n columns, starting relative to the offset
sheet sheet: 1
sheet: Sheet3
Selects the sheet to read from. Supports both the index and the sheet name. Only applicable for xlsx and xls files.
path_header path_header: ["people[:].first_name"] Specifies the header of a nested format type. Only applicable for JSON and XML. Please see JSON or XML in the Examples section for details.
widths widths: [0, 1, 2, 3] Only applicable for fwf files. Specifies the width of each cell of the file. Please see the Fixed Width example in the Examples section for details.
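To make the interplay of skip_before_header, header_rows, and skip_after_header concrete, here is a runnable Python sketch of how they partition the top of a file. This is my reading of the table above, not Ingester's actual code.

```python
# Sketch: skip_before_header rows come first, then header_rows, then
# skip_after_header, then data. Multi-line headers concatenate column-wise.
def split_rows(rows, skip_before_header=0, header_rows=1, skip_after_header=0):
    start = skip_before_header
    header_block = rows[start:start + header_rows]
    data_start = start + header_rows + skip_after_header
    # concatenate multi-line headers column-wise, skipping empty cells
    header = [" ".join(p for p in col if p) for col in zip(*header_block)]
    return header, rows[data_start:]

rows = [
    ["quarterly report", "", ""],   # skipped before the header
    ["Name", "Height", "Opened"],   # first header row
    ["", "ft", "date"],             # second header row (concatenated)
    ["units note", "", ""],         # skipped after the header
    ["CN Tower", "1815.3", "1976-06-26"],
]
header, data = split_rows(rows, skip_before_header=1, header_rows=2,
                          skip_after_header=1)
print(header)  # ['Name', 'Height ft', 'Opened date']
```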


Ingester outputs a typed binary format called Parquet, which allows for high-performance querying through Namara. Before data is written to Parquet, each column must be classified with a kind and normalized into that kind. The props top-level option lets the user define which columns have which kinds. Ingester will guess the kind of any column whose key is not listed. See the properties example below.

Property YAML Description
key key: name The column name
kind kind: INTEGER The kind of column.

format format: "%y-%m-%d" Deprecated; see formats. Specifies the date format.

Only used for DATE, and DATETIME kinds.
formats formats: ["%y-%m-%d"] Specifies a list of possible date formats.

Only used for DATE, and DATETIME kinds.
thousands_separator thousands_separator: "," Specifies the thousands separator.

radix radix: "." Specifies the radix (decimal separator).

Only used for DECIMAL, PERCENT, CURRENCY kinds.
filldown filldown: true If set to true, blank fields in a column will be filled with the last meaningful value
rename rename: new_column_name Renames the column.


This sample dataset contains three columns with different kinds:

Building, Height ft, Date Opened
CN Tower, "1,815.3", "June 26, 1976"

The CSV file type does not specify what kind of data is present in which columns. Ingester is responsible for guessing these kinds. Here is the input config:

files:
  - url: http://server/buildings.csv

And here is what the props option would look like after ingest:

props:
  - key: building
    kind: TEXT
  - key: height_ft
    kind: DECIMAL
    thousands_separator: ","
    radix: "."
  - key: date_opened
    kind: DATE
    formats: ["%B %-d, %Y"]

Meta Config

The meta top level option defines the following options related to the pipeline:

Property YAML Description
callback_url callback_url: http_callback_url Specifies the callback url for status updates and config returns
row_column row_column: column_name Adds a row-count column, with the given name, to the resulting Parquet.
strict strict: true When this flag is set to true, Ingester will fail the ingestion process whenever it loses data, or makes destructive assumptions.

When this flag is set to false (the default), Ingester will nullify a cell value when it can't parse the cell, and ingestion will succeed (if no other errors are present).

job_id job_id: jobID A unique identifier that corresponds to the ingest job. Every ingest has a corresponding unique identifier attached to it, known as a Job ID.

By default, Ingester will create a Job ID for you; however, you can override this with an identifier of your choice.
file_priority_off file_priority_off: true It is possible to ingest multiple files with Ingester. If the files are not all of the same type (e.g. all csvs), then by default Ingester will choose the file type with the greatest priority.

Priority list:
  1. geojson
  2. shp
  3. separated-value (csv, tsv, psv, txt)
  4. xlsx
  5. xls
  6. gml
  7. kml
  8. fixed-width
  9. json
  10. xml
You can disable this behaviour by setting file_priority_off to true.
rewrite rewrite:
  rule: [a-zA-Z]+
  replace: Column_{index}
Ingester allows you to rewrite column names. This can be done by specifying the following:
  - rule: a regular expression defining your matching criteria
  - replace: a string of replacement placeholders. You can add any text you want here. It also supports the following two special placeholders:
    - {index}: the index of the column
    - {#}: a capture group matched by your rule
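Putting the options above together, a hypothetical meta section might look like the following sketch (the callback url and column names are placeholders):

```yaml
meta:
  callback_url: http://my-server/ingest-status  # placeholder url
  row_column: row_number
  strict: true              # fail instead of nullifying unparsable cells
  rewrite:
    rule: field_([0-9]+)    # match column names like field_7
    replace: Column_{1}     # {1} is the first capture group from rule
```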

Path Querying

All convert configurations require a path. In cases of many files (archives with hundreds of files and folders), path queries help the user select the wanted files and exclude the unwanted ones.

Consider this simplified config:

files:
  - url: /

The archive at this url has these sub-files and folders:


Wildcard (*)

Wildcards allow for arbitrary matching for file names, file extensions, and directory names.

Example: Match all root files with a csv extension

files:
  - url: /
    - path: /*.csv

Example: Match all files in the 2000s with a csv extension

files:
  - url: /
    - path: /20*/*.csv

Recursive Wildcard (**)

Recursive wildcards allow for completely arbitrary matching

Example: Match all csv files in all folders

files:
  - url: /
    - path: /**.csv

Example: Match all csv files in 2019

files:
  - url: /
    - path: /2019/**.csv

Example: Match all csv files in fall and winter 2019

files:
  - url: /
    - path: /2019/**/*.csv

Example: Match all csv files in winter

files:
  - url: /
    - path: /**/winter/*.csv

Example: Match all csv files that are in seasons

files:
  - url: /
    - path: /**/**/*.csv
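The two wildcards differ in whether they can cross a directory boundary: * stays within one path segment, while ** can span many. The following Python sketch illustrates these matching semantics (it is not Ingester's actual matcher):

```python
import re

def path_matches(pattern: str, path: str) -> bool:
    """Glob-style matching where * stays within one path segment
    and ** may span any number of segments."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")       # ** crosses directory boundaries
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")    # * stops at the next /
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.fullmatch("".join(out), path) is not None

# /*.csv matches only root-level csv files
print(path_matches("/*.csv", "/a.csv"))               # True
print(path_matches("/*.csv", "/2019/a.csv"))          # False
# /**.csv matches csv files at any depth
print(path_matches("/**.csv", "/2019/winter/a.csv"))  # True
```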

Workflow Tips

Ingester is fast: if there is a mistake in your config, Ingester will tell you right away. Ingester's initial setup phase takes around 5 seconds to complete, and this time scales with the number of files/archives. If Ingester begins ingesting, the config that was passed to it almost always has no issues, so Ingester's quick response is a great tool for verifying tweaks.

Blind ingests are useful for exploring unknown archive structures. Ingester will generate a list of ignored and defaulted files. Wildcards can then be inserted to group the ignored and defaulted files for easier tweaking.

Always remember to check the result data. Valid configs do not always result in the correct data.


Iterators are a mechanism for grouping many files from a single source into one file config. Whether the source is a list of files on a file server or calls to a GET-based API, grouping is as simple as templating the file url and specifying a list of iterators like so:

files:
  - url:{{.page}}
    convert_configs:
      - path: books.json
        format: json
        path_header:
          - books[:].title
          - books[:].price
    iterators:
      - param: page
        begin: 1
        end: 20

Specifying multiple iterators nests them. So if I wanted to search for every book from 2010 to 2020, it would look like this:

files:
  - url:{{.page}}&year={{.year}}
    convert_configs:
      - path: books.json
        format: json
        path_header:
          - books[:].title
          - books[:].price
    iterators:
      - param: year
        begin: 2010
        end: 2020
      - param: page
        begin: 1
        end: 20
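To make the nesting concrete, here is a Python sketch that expands {{.param}} placeholders over each iterator's range in the way described above (illustrative only; the url and the expand_urls helper are hypothetical, not part of Ingester):

```python
from itertools import product

def expand_urls(template: str, iterators: list[dict]) -> list[str]:
    """Expand {{.param}} placeholders for every combination of iterator
    values. Later iterators vary fastest, i.e. they nest inside."""
    ranges = [
        range(it["begin"], it["end"] + 1, it.get("increment_by", 1))
        for it in iterators
    ]
    urls = []
    for values in product(*ranges):
        url = template
        for it, v in zip(iterators, values):
            url = url.replace("{{." + it["param"] + "}}", str(v))
        urls.append(url)
    return urls

urls = expand_urls(
    "http://server/books?page={{.page}}&year={{.year}}",  # placeholder url
    [{"param": "year", "begin": 2010, "end": 2012},
     {"param": "page", "begin": 1, "end": 2}],
)
print(urls[0])    # http://server/books?page=1&year=2010
print(len(urls))  # 6
```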

Iterators can be incremented by any value, specified using increment_by.

files:
  - url:{{.year}}.csv
    convert_configs:
      - path: "*.csv"
        format: csv
    iterators:
      - param: year
        begin: 1988
        end: 2018
        increment_by: 4

Iterators have an optional end field.

If end is not specified, iteration will gracefully terminate when an error (e.g. 404) or an empty row is returned (as long as it is not the first iteration, which would raise an error).

files:
  - url:{{.year}}.csv
    convert_configs:
      - path: "*.csv"
        format: csv
    iterators:
      - param: year
        begin: 1988
        increment_by: 4


CodePage037, CodePage1047, CodePage1140, CodePage437, CodePage850, CodePage852, CodePage855, CodePage858, CodePage860, CodePage862, CodePage863, CodePage865, CodePage866

ISO8859_1, ISO8859_2, ISO8859_3, ISO8859_4, ISO8859_5, ISO8859_6, ISO8859_7, ISO8859_8, ISO8859_9, ISO8859_10, ISO8859_13, ISO8859_14, ISO8859_15, ISO8859_16


Macintosh, MacintoshCyrillic

Windows1250, Windows1251, Windows1252, Windows1253, Windows1254, Windows1255, Windows1256, Windows1257, Windows1258, Windows874


Ingesting a CSV while offsetting

Often datasets come with additional human-readability formatting like multi-line headers or column descriptions. Two of the most commonly used options are skip_before_header and header_rows.

My Dataset Title,,,,,,

Ingester helps you remove formatting by specifying where desired data exists.

files:
  - url: https://server/my_dataset.csv
    convert_configs:
      - path: my_dataset.csv
        skip_before_header: 1
        header_rows: 1
        skip_after_header: 1
        offset: 1
        columns: 5

By applying this config to My Dataset, Ingester outputs a dataset looking like this:


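The row options can be pictured as contiguous windows over the raw rows. This Python sketch (a hypothetical helper, not Ingester's code) shows the slicing semantics of a config like the one above, on rows that are already parsed into lists:

```python
def window_rows(rows, skip_before_header=0, header_rows=1,
                skip_after_header=0, offset=0, columns=None):
    """Return (headers, data) after applying the contiguous row options
    skip_before_header -> header_rows -> skip_after_header, plus the
    column offset/limit."""
    h_start = skip_before_header
    h_end = h_start + header_rows
    d_start = h_end + skip_after_header
    col_end = None if columns is None else offset + columns
    headers = [row[offset:col_end] for row in rows[h_start:h_end]]
    data = [row[offset:col_end] for row in rows[d_start:]]
    return headers, data

rows = [
    ["My Dataset Title", "", ""],      # skipped before the header
    ["id", "name", "height"],          # the single header row
    ["(units)", "", "ft"],             # skipped after the header
    ["1", "CN Tower", "1815.3"],       # data
]
headers, data = window_rows(rows, skip_before_header=1, header_rows=1,
                            skip_after_header=1, offset=1, columns=2)
print(headers)  # [['name', 'height']]
print(data)     # [['CN Tower', '1815.3']]
```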
Ingesting multiple files

You can ingest files from multiple urls as long as the headers of each dataset match.

files:
  - url: https://my_server/my_dataset.csv
  - url: https://my_server/my_dataset2.csv

Ingesting a JSON file

Ingester supports streaming JSON from APIs or from a simply hosted JSON file. In order for Ingester to select columns, it requires a list of JSON paths. For those unfamiliar, see JsonPath.

Ingester JSON paths don't require the initial `$.`. JSON paths are specified in the appropriate convert config like so:

files:
  - url: https://file-server/my_dataset.json
    convert_configs:
      - path: my_dataset.json
        format: json
        path_header:
          - people[:].fname
          - people[:].lname
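A path like people[:].fname means "for each element of the people array, take its fname". The following Python sketch (a hypothetical resolver, not part of Ingester) shows that resolution on a small document:

```python
def resolve_path(doc, path):
    """Resolve a simple 'array[:].field' style path against parsed JSON."""
    values = [doc]
    for part in path.split("."):
        if part.endswith("[:]"):
            key = part[:-3]
            # fan out over every element of the array under key
            values = [item for v in values for item in v[key]]
        else:
            values = [v[part] for v in values]
    return values

doc = {
    "people": [
        {"fname": "Ada", "lname": "Lovelace"},
        {"fname": "Alan", "lname": "Turing"},
    ]
}
print(resolve_path(doc, "people[:].fname"))  # ['Ada', 'Alan']
print(resolve_path(doc, "people[:].lname"))  # ['Lovelace', 'Turing']
```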

Ingesting an XML file

Ingester supports streaming XML. Just like JSON, Ingester requires that columns are selected in XPath format. For those unfamiliar, see XPath.

XPaths are specified in the convert config like so:

files:
  - url: https://file-server/my_dataset.xml
    convert_configs:
      - path: my_dataset.xml
        format: xml
        path_header:
          - bookstore/book/title/text()
          - bookstore/book/price/text()
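Python's standard xml.etree library supports a limited XPath subset, which makes it easy to preview what a path_header entry such as bookstore/book/title/text() will select. A standalone sketch with sample data (not Ingester's code):

```python
import xml.etree.ElementTree as ET

xml_doc = """<bookstore>
  <book><title>Dune</title><price>9.99</price></book>
  <book><title>Neuromancer</title><price>7.50</price></book>
</bookstore>"""

root = ET.fromstring(xml_doc)  # root element is <bookstore>
# bookstore/book/title/text() corresponds to reading .text on each match
titles = [el.text for el in root.findall("book/title")]
prices = [el.text for el in root.findall("book/price")]
print(titles)  # ['Dune', 'Neuromancer']
print(prices)  # ['9.99', '7.50']
```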

Ingesting a Fixed Width file

Ingester supports streaming fixed-width (.fwf) formatted files. Ingester requires that column widths are specified under the widths convert config option as a list of integers. Space is currently the only supported padding character.

files:
  - url: https://file-server/my_dataset.fwf
    convert_configs:
      - path: my_dataset.fwf
        format: fixed-width
        widths:
          - 10
          - 20
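Fixed-width parsing amounts to slicing each line at cumulative offsets and stripping the space padding. A Python sketch of how widths: [10, 20] splits a line (illustrative only, not Ingester's parser):

```python
def split_fixed_width(line: str, widths: list[int]) -> list[str]:
    """Cut one line into cells of the given widths, stripping the
    space padding (the only padding character supported)."""
    cells, start = [], 0
    for w in widths:
        cells.append(line[start:start + w].strip())
        start += w
    return cells

line = "CN Tower  Toronto             "
print(split_fixed_width(line, [10, 20]))  # ['CN Tower', 'Toronto']
```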

Ingesting a ZIP file

Ingester supports ingesting ZIP files. A zip can contain multiple files, and so you can ingest multiple files by specifying another convert config.

files:
  - url: http://file-server/
    convert_configs:
      - path: /one.csv
      - path: /two.csv

Ingesting file over SFTP/FTP

Ingester also supports ingesting over the SFTP and FTP protocols.

files:
  - url: sftp://file-server/mydata.csv
    ftp_user: username
    ftp_pass: password

Namara ER

Namara also offers a powerful entity resolution service that makes the task of data cleaning and data enrichment easier for data professionals. Entity resolution is the task of identifying different representations of the same real-world entities across datasets.

Getting Started with Namara ER

pip3 install --extra-index-url namara-er==1.0.7

Getting started with Namara ER is easy: in your terminal, simply install the namara-er package on your device.

Authentication/Creating ER Client

from namara_er.client import Client

client = Client(api_key="YOUR_API_KEY")

The api_key for Namara ER is the same as your NAMARA_API_KEY.

Once you've installed the namara-er package, to use Namara ER, open up your favourite text editor, import the Namara ER library, and create your ER client.


There are two main methods in Namara ER: get_entities() and append_codes(). We will go through each of these individually.

res = client.get_entities(
    entity_type="business",
    entities=["thinkdata", "thinkdata inc"],
)

The get_entities() method performs entity resolution on the list of strings provided. It takes two arguments: entity_type and entities.

Parameter Type Description
entity_type string Type of entity; Currently supported entities: ["business", "people", "address"]
entities list A list of strings to be resolved through Namara ER.

Understanding the Example Response
Below is an example response. The following table describes each element of the response.

{'dataset_id': 'ER_01960f16-f0a1-4ab6-b663-e35b464f1c88',
 'entities': [{'code': '3e39bd2406', 'input_value': 'thinkdata'},
  {'code': '3e39bd2406', 'input_value': 'thinkdata inc'}],
 'entity_type': 'business'}
Namara ER JSON Response Description
dataset_id Namara dataset id; if not given, an auto-generated id is assigned
entities A list of dictionaries containing code and input_value for each input string
code A unique identifier used for clustering like entities
input_value The original string provided at input
entity_type Entity type specified at input
import pandas as pd

df = pd.DataFrame({"company": ["Thinkdata", "thinkdata inc"]})

res_df = client.append_codes(
    df=df,
    col_name="company",
    entity_type="business",
)
The append_codes() method performs entity resolution on the specified column of a data frame and returns the original data frame with a <col_name>_id column appended as the rightmost column.

It takes three arguments: df, col_name, and entity_type.

Parameter Type Description
df object Data frame object
col_name string Column name of entities to be resolved
entity_type string Type of entity; Currently supported entities: ["business", "people", "address"]

Example Response

company company_id
thinkdata 3e39bd2406
thinkdata inc 3e39bd2406

Additional (Optional) Parameters

Parameter Type Description
datasetId string Namara dataset id; if not given, an auto-generated id is assigned
threshold float Threshold for ER operations [0.0, 1.0]
use_master_db boolean Set to False to bypass fetching entities from the master db; defaults to True


Code Error Description
200 OK
401 Unauthorized The API_KEY associated with this request is not valid
422 Invalid Entry The entity is not processable by Namara ER
500 Internal Something went wrong inside Namara ER

Get In Touch

We're easy to reach – visit our contact page and get in touch with us.

We would love to hear from you!