Live import

You can import data into a running Dgraph instance (which may already hold data) using the Dgraph CLI command dgraph live, referred to as Live Loader. Live Loader sends mutations to a Dgraph cluster and has options to handle the assignment of unique IDs and to update existing data.

Note Live Loader accepts RDF N-Quad/Triple data or JSON in plain or gzipped format. Refer to data migration to see how to convert other data formats.
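For example, a minimal RDF N-Quad file and a JSON file carrying the same data (predicate names are illustrative) could look like this:

# data.rdf
<_:alice> <name> "Alice" .
<_:alice> <age> "29" .

# data.json
[
  { "uid": "_:alice", "name": "Alice", "age": "29" }
]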

Before you begin

Verify that you have a local folder <local-path-to-data> containing:

  • at least one data file in RDF or JSON format, plain or gzipped, containing the data to import
  • an optional schema file

These files are typically generated by an export or by a data migration tool.
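For example, a folder produced by an export typically contains files similar to these (names are illustrative):

<local-path-to-data>
  g01.rdf.gz       # data file in gzipped RDF
  g01.schema.gz    # schema file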

Importing data on Dgraph Cloud

  1. Obtain the dgraph binary or the latest Docker image by following the installation instructions. This is required to run the Dgraph CLI command dgraph live.

  2. Obtain the gRPC endpoint of your Dgraph Cloud backend and a valid Client API key.

    An administrator gets this information with the following steps:

    1. Log into the Dgraph Cloud account and select the backend.
    2. In the Admin section of the Dgraph Cloud console, go to Settings and copy the value of the gRPC Endpoint from the General tab.
    3. Access the API Keys tab to generate a Client API Key.
Note The gRPC endpoint is different from the GraphQL endpoint found in the Overview section. The gRPC endpoint looks like frozen-mango.grpc.us-west-1.aws.cloud.dgraph.io:443.
  3. Run the live loader as follows:
docker run -it --rm -v <local-path-to-data>:/tmp dgraph/dgraph:latest \
  dgraph live --slash_grpc_endpoint <grpc-endpoint> -f /tmp/<data-file> -s /tmp/<schema-file> -t <api-key>

Load multiple data files by passing the directory that contains them:

docker run -it --rm -v <local-path-to-data>:/tmp dgraph/dgraph:latest \
  dgraph live --slash_grpc_endpoint <grpc-endpoint> -f /tmp -s /tmp/<schema-file> -t <api-key>

When the path provided with the -f, --files option is a directory, all files ending in .rdf, .rdf.gz, .json, and .json.gz are loaded. Make sure that your schema file has a different extension (for example, .txt or .schema).

If you are running the dgraph binary directly instead of Docker, use local paths:

dgraph live --slash_grpc_endpoint <grpc-endpoint> -f <local-path-to-data>/<data-file> -s <local-path-to-data>/<schema-file> -t <api-key>

Load multiple data files by passing the directory that contains them:

dgraph live --slash_grpc_endpoint <grpc-endpoint> -f <local-path-to-data> -s <local-path-to-data>/<schema-file> -t <api-key>

Batch upserts

You can use Live Loader to update existing data, either to modify existing predicates or to add new predicates to existing nodes.

To do so, use the -U, --upsertPredicate flag or the -x, --xidmap flag.

upsertPredicate flag

Use the -U, --upsertPredicate flag to specify the predicate name in your data that serves as a unique identifier.

For example:

dgraph live --files <directory-with-data-files> --schema <path-to-schema-file> --upsertPredicate xid

The upsert predicate must be present in the Dgraph instance or in the schema file, and it must be indexed.
For each node, Live Loader uses the node name provided in the data file as the upsert predicate value.
For example, if your data file contains

<_:my.org/customer/1>       <firstName>  "John"     .

The previous command creates or updates the node whose xid predicate equals my.org/customer/1 and sets its firstName predicate to the value John.
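A later run with a data file that references the same node name updates that node rather than creating a new one. For example, a file containing the (hypothetical) triple

<_:my.org/customer/1>       <lastName>   "Doe"      .

adds the lastName predicate to the existing node whose xid is my.org/customer/1.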

xidmap flag

dgraph live --files <directory-with-data-files> --schema <path-to-schema-file> --xidmap <local-directory>

Live Loader uses the -x, --xidmap directory to look up the UID of each node name used in the data file, and to store the mapping between node names and the UIDs generated for new nodes.
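As a sketch, two successive loads that share the same mapping directory (paths are hypothetical) resolve the same node names to the same UIDs:

# first load: writes the node-name-to-UID mapping into ./xidmap
dgraph live --files ./initial-data --schema ./schema.txt --xidmap ./xidmap

# later load: reuses ./xidmap, so node names already seen keep their original UIDs
dgraph live --files ./incremental-data --xidmap ./xidmap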

Import data on Dgraph self-hosted

Run the live loader using the -a, --alpha flag as follows:

docker run -it --rm -v <local-path-to-data>:/tmp dgraph/dgraph:latest \
  dgraph live --alpha <Dgraph Alpha gRPC endpoint> -f /tmp/<data-file> -s /tmp/<schema-file>

Load multiple data files by passing the directory that contains them:

docker run -it --rm -v <local-path-to-data>:/tmp dgraph/dgraph:latest \
  dgraph live --alpha <Dgraph Alpha gRPC endpoint> -f /tmp -s /tmp/<schema-file>

--alpha defaults to localhost:9080. You can specify a comma-separated list of Alpha addresses in the same cluster to distribute the load.

When the path provided with the -f, --files option is a directory, all files ending in .rdf, .rdf.gz, .json, and .json.gz are loaded. Make sure that your schema file has a different extension (for example, .txt or .schema).

If you are running the dgraph binary directly instead of Docker, use local paths:

  dgraph live --alpha <grpc-endpoints> -f <local-path-to-data>/<data-file> -s <local-path-to-data>/<schema-file>
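As noted above, --alpha accepts a comma-separated list of Alpha addresses in the same cluster. For example, to spread the load across three Alpha nodes (addresses are illustrative):

  dgraph live --alpha alpha1:9080,alpha2:9080,alpha3:9080 \
    -f <local-path-to-data> -s <local-path-to-data>/<schema-file>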

Load from S3

To live load from Amazon S3 (Simple Storage Service), you must either have permissions to access the S3 bucket from the system performing the live load (see IAM setup below) or explicitly set the following AWS credentials via environment variables:

Environment Variable                        Description
AWS_ACCESS_KEY_ID or AWS_ACCESS_KEY         AWS access key with permissions to write to the destination bucket.
AWS_SECRET_ACCESS_KEY or AWS_SECRET_KEY     AWS secret key with permissions to write to the destination bucket.
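For example, you can set these credentials in the shell before running the loader (values are placeholders):

export AWS_ACCESS_KEY_ID=<aws-access-key-id>
export AWS_SECRET_ACCESS_KEY=<aws-secret-access-key>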

IAM setup

In AWS, you can accomplish this by doing the following:

  1. Create an IAM Role with an IAM Policy that grants access to the S3 bucket.
  2. Attach the IAM Role to the workload performing the live load: use an instance profile for an EC2 instance, or IAM Roles for Service Accounts for a pod running on EKS.

Once your setup is ready, you can execute the live load from S3. For example:

## short form of S3 URL
dgraph live \
  --files s3:///<bucket-name>/<directory-with-data-files> \
  --schema s3:///<bucket-name>/<directory-with-data-files>/schema.txt

## long form of S3 URL
dgraph live \
  --files s3://s3.<region>.amazonaws.com/<bucket>/<directory-with-data-files> \
  --schema s3://s3.<region>.amazonaws.com/<bucket>/<directory-with-data-files>/schema.txt
Note The short form of the S3 URL requires that the URL be prefixed with s3:/// (note the triple slash ///). The long form for S3 buckets requires a double slash, for example s3://.

Load from MinIO

To live load from MinIO, you must have the following MinIO credentials set via environment variables:

Environment Variable    Description
MINIO_ACCESS_KEY        MinIO access key with permissions to write to the destination bucket.
MINIO_SECRET_KEY        MinIO secret key with permissions to write to the destination bucket.
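For example, set the credentials in the shell before running the loader (values are placeholders):

export MINIO_ACCESS_KEY=<minio-access-key>
export MINIO_SECRET_KEY=<minio-secret-key>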

Once your setup is ready, you can execute the live load from MinIO:

dgraph live \
  --files minio://minio-server:port/<bucket-name>/<directory-with-data-files> \
  --schema minio://minio-server:port/<bucket-name>/<directory-with-data-files>/schema.txt

Enterprise Features

Multi-tenancy (Enterprise Feature)

Since multi-tenancy requires ACL, when using the Live loader you must provide the login credentials using the --creds flag. By default, Live loader loads the data into the user’s namespace.

Guardians of the Galaxy can load the data into multiple namespaces. Using --force-namespace, a Guardian can load the data into the namespace specified in the data and schema files.

Note The Live loader requires that the namespace from the data and schema files exist before loading the data.

For example, to preserve the namespaces present in the data while loading, first create the namespace(s) and then run the live loader command:

dgraph live \
  --schema /tmp/data/1million.schema \
  --files /tmp/data/1million.rdf.gz --creds="user=groot;password=password;namespace=0" \
  --force-namespace -1

A Guardian of the Galaxy can also load data into a specific namespace. For example, to force the data loading into namespace 123:

dgraph live \
  --schema /tmp/data/1million.schema \
  --files /tmp/data/1million.rdf.gz \
  --creds="user=groot;password=password;namespace=0" \
  --force-namespace 123

Encrypted imports (Enterprise Feature)

The Live Loader accepts the --encryption key-file=value option, which is required to decrypt encrypted export data and schema files. Once the export files are decrypted, the Live Loader streams the data to a live Alpha instance. Alternatively, starting with v20.07.0, the vault_* options can be used to decrypt the encrypted export and schema files.

Note If the live Alpha instance has encryption turned on, the p directory will be encrypted. Otherwise, the p directory is unencrypted.

For example, to load an encrypted RDF/JSON file and schema via Live Loader:

dgraph live \
 --files <path-containing-encrypted-data-files> \
 --schema <path-to-encrypted-schema> \
 --encryption key-file=<path-to-keyfile-to-decrypt-files>

You can also import your encrypted data into a new Dgraph Alpha node that does not have encryption enabled.

# Encryption Key from the file path
dgraph live --files "<path-to-gzipped-RDF-or-JSON-file>" --schema "<path-to-schema>"  \
  --alpha "<dgraph-alpha-address:grpc_port>" --zero "<dgraph-zero-address:grpc_port>" \
  --encryption key-file="<path-to-enc_key_file>"

# Encryption Key from HashiCorp Vault
dgraph live --files "<path-to-gzipped-RDF-or-JSON-file>" --schema "<path-to-schema>"  \
  --alpha "<dgraph-alpha-address:grpc_port>" --zero "<dgraph-zero-address:grpc_port>" \
  --vault addr="http://localhost:8200";enc-field="enc_key";enc-format="raw";path="secret/data/dgraph/alpha";role-id-file="./role_id";secret-id-file="./secret_id"

Other Live Loader options

--new_uids (default: false): Assign new UIDs instead of using the existing UIDs in data files. This is useful to avoid overwriting data in a DB already in operation.

--format: Specify file format (rdf or json) instead of getting it from filenames. This is useful if you need to define a strict format manually.

-b, --batch (default: 1000): Number of N-Quads to send as part of a mutation.

-c, --conc (default: 10): Number of concurrent requests to make to Dgraph. Do not confuse with -C.

-C, --use_compression (default: false): Enable compression for connections to and from the Alpha server.

--vault superflag’s options specify the Vault server address, role id, secret id, and field that contains the encryption key required to decrypt the encrypted export.
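As a sketch, a hypothetical invocation that combines several of these options for a larger RDF import could look like this (the values shown are illustrative, not tuning recommendations):

dgraph live --alpha localhost:9080 \
  --files <local-path-to-data> --schema <local-path-to-data>/<schema-file> \
  --format rdf --batch 2500 --conc 20 --use_compression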