Quick Loader¶
Use Cases¶
The Quick Loader is used to populate your database with data for the first time or for periodically updating data. Currently, the only supported database is PostgreSQL. There are three source system options for loading data from CSV files: - Files located in the local file system, - Files located in an Amazon S3 bucket, and - Files located in an Azure BLOB storage container.
Getting Started¶
Below are the steps to running quickloader for the first time.
Run PostgreSQL¶
This can be accomplished by either downloading and installing PostgreSQL to your machine or running it in a docker container.
Get Data Ready¶
Quick Loader will parse any CSV files in its source location named in this format: table-<yourTableNameHere>[-optionalSuffix].csv. For example, table-ctx_business_unit.csv or table-ctx_business_unit-20240131.csv. Quick Loader also supports mapping rows of CSV data into multiple tables
using an entity mapping. Files that contain data associated with an entity map are named in this format: entity-<yourEntityNameHere>[-optionalSuffix].csv. For example, entity-item.csv or entity-item-20240131.csv
Command Line Interface¶
The Quick Loader Command Line interface is available through a Java executable jar file.
Command Line Arguments for Running Quick Loader¶
sourceType: The type of the source system. Current supported source systems are: local file system, Amazon S3 Bucket, Azure BLOB storage Example:-sourceType FILE_SYSTEMstagingLocation: The location where Quick Loader will put artifacts after parsing data. If omitted, Quick Loader will create a location injava.io.tmpdir. Example:-stagingLocation "/Users/__Your User__/tmp/quickloader/quickloader-stage"sourceLocation: The location where Quick Loader will look for csv files to process. If your source system is an Amazon S3 bucket, this should be set to the S3 bucket name. If your source system is Microsoft Azure BLOB storage, this should be set to the storage container name. Sub folders within a bucket/container are permitted. Example:-sourceLocation "/Users/__Your User__/tmp/quickloader/quickloader-source"archiveLocation: The location where Quick Loader will move processed files to. If your source system is an Amazon S3 bucket, this should be set to the S3 bucket name that you want to archive files to. If your source system is Microsoft Azure BLOB storage, this should be set to the storage container name of the container you want to archive files to. Sub folders within a bucket/container are supported. Example:-archiveLocation "/Users/__Your User__/tmp/quickloader/archive"failedFilesLocation: The location where Quick Loader will move input files which fail to be read. If your source system is an Amazon S3 bucket, this should be set to the S3 bucket name that you want to move files which fail to read. If your source system is Microsoft Azure BLOB storage, this should be set to the storage container name that you want to move files which fail to read. Sub folders within a bucket/container are supported. Example:-failedFilesLocation "/Users/__Your User__/tmp/quickloader/failed"driver: The database driver Quick Loader will use. Currently only PostgreSQL is supported. Example:-driver org.postgresql.Driverurl: The JDBC connection URL used for connecting to the database. Example:-url jdbc:postgresql://localhost:5432/jmc?sslmode=disableuser: Username in database for login. Example:-user postgrespassword: Password in database for login. Example:-password admin
Execution¶
The following are examples of how to execute the Quick Loader from the command line. The name of the Quick Loader command line jar file may have a version number in it. For example, if the version is 24.3.0-SNAPSHOT, name of the jar file will be quickloader-cli-24.3.0-SNAPSHOT.jar. The examples below show what the commands would be for a Quick Loader jar file with that name.
Assumptions:
- You have Java 17 or greater installed on the computer you run the following examples from.
- Java is properly installed such that when you type java -version at a command line prompt, you receive output which shows the correct java version.
- You have some basic knowledge of how Spring Application Properties work
Example: Show usage¶
Assuming the quickloader-cli jar file is located in the current directory, the command to show usage is:
> ./quickloader-cli-24.3.0-SNAPSHOT.jar
Output:
Missing required options: s, url, user, password
usage: quickloader
-a,--archiveLocation <arg> If omitted, archiving of processed files
is bypassed. Otherwise, If sourceType is
FILE_SYSTEM, this is expected to be the
directory in in which the source CSV
files should be archived to. If
sourceType is AMAZON_S3, this is the name
of the Amazon S3 bucket where the source
CSV files will be archived. If sourceType
is AZURE_BLOB, this is the name of the
Azure storage container where the CSV
files will be archived.
-driver,--db_driver <arg> Database Driver
-f,--failedFilesLocation <arg> If omitted, files in the source location
which fail to be read will remain in the
source location. Otherwise, If
sourceType is FILE_SYSTEM, this is
expected to be the directory in which
the source CSV files should be moved to
if the file fails to be read. If
sourceType is AMAZON_S3, this is the
name of the Amazon S3 bucket where the
source CSV file will be moved to if the
file fails to be read. If sourceType is
AZURE_BLOB, this is the name of the
Azure storage container where the source
CSV file will be moved to if the file
fails to be read.
-g,--stagingLocation <arg> The name of a directory where temporary
files created by the Quick Loader can be
written. If ommitted,
java.io.tmpdir/quickloader will be used.
-password,--db_password <arg> Database Password
-s,--sourceLocation <arg> If sourceType is FILE_SYSTEM, this is
expected to be the directory in in which
the source CSV files are located. If
sourceType is AMAZON_S3, this is the name
of the Amazon S3 bucket where the source
CSV files are located. If sourceType is
AZURE_BLOB, this is the name of the
storage container where the source CSV
files are located.
-t,--sourceType <arg> Source type. Supported values:
FILE_SYSTEM | AMAZON_S3 | AZURE_BLOB
-url,--db_url <arg> Database URL
-user,--db_user <arg> Database User
Example: Load CSV Files from Local File System with Local PostgreSQL Database¶
> ./quickloader-cli-24.3.0-SNAPSHOT.jar -sourceType FILE_SYSTEM -sourceLocation quickloader-dev -driver org.postgresql.Driver -url jdbc:postgresql://localhost:5432/jmcdb -user admin -password admin
Example: Load CSV Files from Amazon S3 Bucket with Local PostgreSQL Database¶
> ./quickloader-cli-24.3.0-SNAPSHOT.jar -sourceType AMAZON_S3 -sourceLocation my-aws-bucket -driver org.postgresql.Driver -url jdbc:postgresql://localhost:5432/jmcdb -user admin -password admin
Example: Load CSV Files from Amazon S3 Bucket (located in a sub folder) with Local PostgreSQL Database¶
> ./quickloader-cli-24.3.0-SNAPSHOT.jar -sourceType AMAZON_S3 -sourceLocation my-aws-bucket/my-subfolder -driver org.postgresql.Driver -url jdbc:postgresql://localhost:5432/jmcdb -user admin -password admin
Example: Load CSV Files from Microsoft Azure BLOB storage container with Local PostgreSQL Database¶
> ./quickloader-cli-24.3.0-SNAPSHOT.jar -sourceType AZURE_BLOB -sourceLocation my-azure-blob -driver org.postgresql.Driver -url jdbc:postgresql://localhost:5432/jmcdb -user admin -password admin
Example: Load CSV Files from Microsoft Azure BLOB storage container (located in a sub folder) with Local PostgreSQL Database¶
> ./quickloader-cli-24.3.0-SNAPSHOT.jar -sourceType AZURE_BLOB -sourceLocation my-azure-blob/my-subfolder -driver org.postgresql.Driver -url jdbc:postgresql://localhost:5432/jmcdb -user admin -password admin
Example: Override the Default Quick Loader configuration¶
The default Quick Loader configuration is found in the quickloader-cli jar file at: application-quickloader.yml:
application-quickloader.yml
---
quickLoader:
entityMappings:
user:
name: user
documentation: 'Mapping of a user of the system assigned to a particular store.'
example: 'entity,user
USERNAME,LAST_NAME,FIRST_NAME,NICKNAME,LAST_LOGIN,LOCKED_OUT_FLAG,PASSWORD_EXPIRED_FLAG,PASSWORD_FAILED_ATTEMPTS,LAST_PASSWORD_ATTEMPT,LOCALE,WORKGROUP_ID,USER_ACTIVE_FLAG,BIOMETRIC_DATA,WORKGROUP_1,WORKGROUP_2,USER_LOCATION,HASHED_PASSWORD
"5555","Doe","Cookie","",,,,,,"en_US","1","1",,2,,05243,0ffe1abd1a08215353c233d6e009613e95eec4253832a761af28ff37ac5a150c'
mappings:
- table: usr_user
- table: usr_password_history
columnMappings:
password_sequence: 1
hashed_password: '0ffe1abd1a08215353c233d6e009613e95eec4253832a761af28ff37ac5a150c'
- table: usr_user_location
columnMappings:
business_unit_id: ${USER_LOCATION}
primary_location_flag: 1
- table: usr_user_workgroup
item:
name: item
mappings:
- table: itm_item
- table: itm_selling_rule
columnMappings:
minimum_seller_age: 0
minimum_buyer_age: 0
- table: itm_assigned_item
- table: itm_product
- table: itm_item_image
columnMappings:
IMAGE_ID: '${ITEM_ID}_1'
LINE_SEQUENCE_NUMBER: '1'
ENABLED: '1'
- table: itm_item_group_member
columnMappings:
ITEM_GROUP_ID: '${ITEM_GROUP_ID_1}'
# - table: itm_item_group_member
# columnMappings:
# ITEM_GROUP_ID: '${ITEM_GROUP_ID_2}'
# - table: itm_item_group_member
# ITEM_GROUP_ID: '${ITEM_GROUP_ID_3}'
# - table: itm_item_group_member
# columnMappings:
# ITEM_GROUP_ID: '${ITEM_GROUP_ID_4}'
item entity configuration. Whenever you need to change the configuration for an entity,
you need to override the entire section for the entity. In this case, we need to override the entire
quickloader.entityMappings.item section in another file in your local file system. You can name the file anything that matches this pattern:
application-<yourChoiceOfSuffix>.yml, where "yourChoiceOfSuffix" can be any string except 'quickloader'. For this example, let's call it application-ql-overrides.yml.
application-ql-overrides.yml
---
quickLoader:
entityMappings:
item:
name: item
mappings:
- table: itm_item
- table: itm_selling_rule
columnMappings:
minimum_seller_age: 0
minimum_buyer_age: 0
raincheck_eligible: '0' # I added this value
- table: itm_assigned_item
- table: itm_product
- table: itm_item_image
columnMappings:
IMAGE_ID: '${ITEM_ID}' # I modified this value, removed the '_1' suffix
LINE_SEQUENCE_NUMBER: '1'
ENABLED: '1'
- table: itm_item_group_member
columnMappings:
ITEM_GROUP_ID: '${ITEM_GROUP_ID_1}'
quickloader.entityMappings.item
while the rest of the base Quick Loader configuration defined in the bundled application-quickloader.yml will remain unchanged.
To run using this configuration from the command line, we need to change slightly how we execute the command:
NOTE: The application-ql-overrides.yml needs to be located in the directory that you execute the command from.
> java -Dspring.profiles.active=ql-overrides -jar ./quickloader-cli-24.3.0-SNAPSHOT.jar -sourceType FILE_SYSTEM -sourceLocation quickloader-dev -driver org.postgresql.Driver -url jdbc:postgresql://localhost:5432/jmcdb -user admin -password admin
Notice that:
1. We used the java command to launch using the Java Virtual Machine installed on the device instead of just directly executing the quickloader-cli-24.3.0-SNAPSHOT.jar.
2. We added the -Dspring.profiles.active=ql-overrides argument so our application-ql-overrides.yml file will be loaded.
3. The first argument after #2 above, is -jar ./quickloader-cli-24.3.0-SNAPSHOT.jar, which tells java to execute the Quick Loader code found in that jar.
4. The arguments that follow are all Quick Loader arguments needed to be passed to the Quick Loader itself.
Example: Using Microsoft Azure BLOB Storage and How to Authenticate via a Configuration File¶
Using Microsoft Azure BLOB storage as your source system for command line usage requires providing additional configuration for authentication with Azure. Similar to the prior example, Override the Default Quick Loader configuration, you will need to execute Quick Loader using java in order to pass in Azure authentication properties.
First you will need to create a configuration file to hold your Azure storage account properties. You can name the file anything that matches this pattern:
application-<yourChoiceOfSuffix>.yml, where "yourChoiceOfSuffix" can be any string except 'quickloader'. For this example, let's call it application-azure.yml.
application-azure.yml
---
spring:
cloud:
azure:
storage:
blob:
account-name: <your Azure Storage Account Name> # e.g., retailer-quickloader
endpoint: <your Azure Storage Account endpoint url> # e.g., https://retailer-quickloader.blob.core.windows.net/
account-key: <your Azure Storage Account key> # e.g., JEEPEl+l2KtGit...==
NOTES:
- account-name and endpoint are required configuration properties.
- The account-key value can be obtained through the Microsoft Azure portal. Using this value for authentication may not be approved by your corporate Security group. See Microsoft's Azure Storage Blob configuration properties page
for alternative properties that can be used for authentication.
- The application-azure.yml needs to be located in the directory that you execute the following command from.
> java -Dspring.profiles.active=azure -jar ./quickloader-cli-24.3.0-SNAPSHOT.jar -sourceType AZURE_BLOB -sourceLocation quickloader-dev -driver org.postgresql.Driver -url jdbc:postgresql://localhost:5432/jmcdb -user admin -password admin
Notice that:
1. We used the java command to launch using the Java Virtual Machine installed on the device instead of just directly executing the quickloader-cli-24.3.0-SNAPSHOT.jar.
2. We added the -Dspring.profiles.active=azure argument so our application-azure.yml file will be loaded.
3. The first argument after #2 above, is -jar ./quickloader-cli-24.3.0-SNAPSHOT.jar, which tells java to execute the Quick Loader code found in that jar.
4. The arguments that follow are all Quick Loader arguments needed to be passed to the Quick Loader itself.
The Quick Loader Service¶
Quick Loader REST web services are available, allowing you to invoke the Quick Loader from within a Commerce container. There are two endpoints:
- [POST]
rest/quickloader/run: Invokes a single execution of the Quick Loader - [GET]
rest/quickloader/status: Reports status of the current and/or previous Quick Loader execution
run Endpoint¶
Expects a RunQuickLoaderRequest JSON payload with the following structure:
{
// Required params
"sourceType": "AMAZON_S3", // See Command Line parameter with the same name
"sourceLocation": "my_s3_bucket_name", // See Command Line parameter with the same name
// Optional params
"stagingLocation": "", // See Command Line parameter with the same name
"archiveLocation": "my_s3_archive_bucket_name", // See Command Line parameter with the same name
"failedFilesLocation": "failed", // See Command Line parameter with the same name
"runAsync": "true", // If false, will not return a response until execution is complete
"suppressStatLoggingOnNoData": "false" // If true, will not log any statistics when no files were found to load
}
The RunQuickLoaderResponse response will also be in JSON format with this structure:
{
"resultCode": "RUNNING", // STARTED, RUNNING, ERROR, SUCCESS
"message": "", // Optional message associated with the result
"stats": // QuickLoaderStats accumulated during the execution
{
"startTime": 1708792344177,
"filesToLoadCount": 1,
"parsedFileCount": 1,
"loadedRowCount": 4,
"failedRowCount": 0,
"averagePerRowMs": 257.25,
"totalTimeMs": 1029,
"loadedRowCountsByTable": {
"ctx_phone": 4
},
"errors": {}
}
}
status Endpoint¶
This endpoint will return a JSON response with the following structure.
Example of status returned when Quick Loader is currently running:
{
"currentlyRunning": true, // Is the Quick Loader running right now?
"stats": {
"startTime": 1708863523859, // Realtime statistics
"filesToLoadCount": 1,
"parsedFileCount": 1,
"loadedRowCount": 792,
"failedRowCount": 1,
"averagePerRowMs": 0,
"totalTimeMs": 0,
"loadedRowCountsByTable": {
"itm_selling_price": 792
},
"errors": { // Errors encountered thus far. Up to 5 errors logged per entry.
"itm_selling_price": [
{
"index": null,
"error": "Can't load row: {selling_price_id=10118051011805101180510118051011805101180510118051011805101180510118051011805101180510118051011805101180510118051011805101180510118051011805101180510118051011805, effective_start_time=2023-12-04T00:00:01, item_id=1011805, product_id=1011805, price=7.65, list_price=7.65, cost=, quantity=1, effective_end_time=, tag_country=*, tag_state=*, tag_business_unit_id=6501, tag_store_type=*, create_time=2024-02-25T07:18:43.854, create_by=quickloader, last_update_time=2024-02-25T07:18:43.854, last_update_by=quickloader}"
}
]
}
},
"previousStatus": null,
"previousMessage": null,
"previousStats": null
}
Here the stats from the previous execution are available, until the server is restarted.
{
"currentlyRunning": false,
"stats": null,
"previousStatus": "ERROR", // Result status of last execution
"previousMessage": "1 rows failed to load.", // Any associated message summarizing the result
"previousStats": { // Final stats of the previous execution
"startTime": 1708863523859,
"filesToLoadCount": 1,
"parsedFileCount": 1,
"loadedRowCount": 404007,
"failedRowCount": 1,
"averagePerRowMs": 0.0910107993178336,
"totalTimeMs": 36769,
"loadedRowCountsByTable": {
"itm_selling_price": 404007
},
"errors": { // Errors accumulated on the last run. Up to 5 errors logged per entry.
"itm_selling_price": [
{
"index": null,
"error": "Can't load row: {selling_price_id=10118051011805101180510118051011805101180510118051011805101180510118051011805101180510118051011805101180510118051011805101180510118051011805101180510118051011805, effective_start_time=2023-12-04T00:00:01, item_id=1011805, product_id=1011805, price=7.65, list_price=7.65, cost=, quantity=1, effective_end_time=, tag_country=*, tag_state=*, tag_business_unit_id=6501, tag_store_type=*, create_time=2024-02-25T07:18:43.854, create_by=quickloader, last_update_time=2024-02-25T07:18:43.854, last_update_by=quickloader}"
}
]
}
}
}
Running Quick Loader as a Job or Jobs¶
The Quick Loader server is configured to run one or more Quick Loader jobs in a background thread using the commerce Job execution
framework. Each job periodically invokes the Quick Loader run REST endpoint, which, in turn, executes the Quick Loader
to check a configured source location for CSV files to load. It is possible to check multiple different source types using
multiple Quick Loader jobs. When one or more new CSV files exist in a source location, the Quick Loader job detects them and loads them.
The out-of-the-box configuration includes one job with configuration as follows. The job configuration can be easily overridden
to suit your specific needs.
openpos:
jobs:
enabled: true # Enables or disables running of all jobs. If true, running of jobs is enabled.
# If false, no jobs will run. Default value is true."
schedule:
quickLoader:
beanName: quickLoaderJob
enabled: true
type: periodic # Execute the job on a fixed period schedule, every X milliseconds
# as given by expression.
initialDelay: 5000 # How many milliseconds to wait after the server is started before
# executing the first time.
expression: 60000 # How long to wait between executions.
clusterEnabled: true # For clustered commerce installs, ensures only one instance of
# Commerce can execute the job at a time.
config:
quickLoader:
sourceType: FILE_SYSTEM # The source of where to look for files to consume. Currently
# supported values are FILE_SYSTEM, AMAZON_S3.
sourceLocation: "./work" # If sourceType is FILE_SYSTEM, this is the path to the directory
# to scan for files to process. If sourceType is AMAZON_S3, this
# is the name of the S3 bucket to scan for files.
stagingLocation: # This is a directory on the host running the QuickLoader where
# temporary files are created during input file processing. If no
# value is set here, the stagingLocation will default to
# java.io.tmpdir/quickloader
DOC__archiveLocation: "If omitted, each source CSV file will remain in the sourceLocation after processing. If sourceType is FILE_SYSTEM, this is the path to the directory to move processed files to.
If sourceType is AMAZON_S3, this is the name of an S3 bucket to move the file to."
archiveLocation: "./archive" # If omitted, each source CSV file will remain in the sourceLocation
# after processing. If sourceType is FILE_SYSTEM, this is the path to
# the directory to move processed files to. If sourceType is AMAZON_S3,
# this is the name of an S3 bucket to move the file to.
DOC__failedFilesLocation: "If omitted, files in the source location which fail to be read will remain in the source location. Otherwise, If sourceType is FILE_SYSTEM, this is expected to be the directory
in which the source CSV files should be moved to if the file fails to be read. If sourceType is AMAZON_S3, this is the name of the Amazon S3 bucket where the source CSV file will be moved to if the file fails to be read.
If sourceType is AZURE_BLOB, this is the name of the Azure storage container where the source CSV file will be moved to if the file fails to be read."
failedFilesLocation: "./failed"
Persistent Logging¶
The Quick Loader Job will log results to the ctx_job_log database table. Entries are logged to the database only if the Quick Loader processed one or more files on a given execution. The full response from the Quick Loader run REST endpoint is stored in the payload column as JSON.
Example: Yaml Configuration for Running a Single Quick Loader Job with Azure¶
If you only have one BLOB storage container to read CSV files from, you can simply override the base quickLoader job configuration as follows.
openpos:
jobs:
config:
quickLoader:
sourceLocation: quickloader-dev # Azure source BLOB container name
sourceType: AZURE_BLOB
archiveLocation: quickloader-archive # Azure archive BLOB container name
failedFilesLocation: quickloader-failed # Azure failed files BLOB container name
quickloader-dev BLOB container. Files can be read
from the root of a container for from a container sub folder. Once Quick Loader has processed a file in source BLOB storage container, the file will be moved to the BLOB storage container
specified by archiveLocation, which is quickloader-archive in this example. If any files fail to be read, they will be moved to the failedFilesLocatoin, which can be a container or sub
folder in a container.
Example: Yaml Configuration for Quick Loader Job with Azure and CSV Files Located in Container Sub Folders¶
If you only have one BLOB storage container to read CSV files from and archive files to, you can simply override the base quickLoader job configuration as follows.
openpos:
jobs:
config:
quickLoader:
sourceLocation: quickloader-dev/work # Azure source BLOB container name with 'work' sub folder
sourceType: AZURE_BLOB
archiveLocation: quickloader-dev/archive # Azure archive BLOB container is same the same as the source container with 'archive' sub folder
failedFilesLocation: quickloader-dev/failed # Azure failed files BLOB container is the same as the source container with 'failed' sub folder
Example: Yaml Configuration for Quick Loader Job with AWS and CSV Files Located in Bucket Sub Folders¶
If you only have one AWS storage container to read CSV files from and archive files to, you can simply override the base quickLoader job configuration as follows.
openpos:
jobs:
config:
quickLoader:
sourceLocation: quickloader-dev/work # AWS source bucket name with 'work' sub folder
sourceType: AMAZON_S3
archiveLocation: quickloader-dev/archive # AWS archive bucket is the same as the source container with 'archive' sub folder
failedFilesLocation: quickloader-dev/failed # Azure failed bucket is the same as the source container with 'failed' sub folder
Example: Yaml Configuration for Running Multiple Quick Loader jobs¶
The Quick Loader server supports multiple jobs of a given sourceType that each check different source locations. For example, if you have multiple Amazon S3 buckets that you want to monitor, you can set up a seperate Quick Loader job to monitor each bucket.
openpos:
jobs:
config:
quickLoader1:
sourceLocation: quickloader-dev # S3 source bucket name
sourceType: AMAZON_S3
archiveLocation: quickloader-archive # S3 archive bucket name
failedFilesLocation: quickloader-failed # S3 failed files bucket name
quickLoader2:
sourceLocation: quickloader-dev # Azure source BLOB container name
sourceType: AZURE_BLOB
archiveLocation: quickloader-archive # Azure archive BLOB container name
failedFilesLocation: quickloader-failed # Azure failed files BLOB container name
schedule:
# Disable the base quickLoader Job, which uses the local file system
quickLoader:
enabled: false
# Set up schedule for running S3-based Quick Loader Job
quickLoader1:
beanName: quickLoaderJob
enabled: true
type: periodic
initialDelay: 5000
expression: 60000
clusterEnabled: true
# Set up schedule for running Azure-based Quick Loader Job
quickLoader2:
beanName: quickLoaderJob
enabled: true
type: periodic
initialDelay: 15000
expression: 60000
clusterEnabled: true
Example: Azure BLOB Storage Authentication Configuration for Quick Loader Jobs¶
---
openpos:
# ...
spring:
cloud:
azure:
storage:
blob:
account-name: <your Azure Storage Account Name> # e.g., retailer-quickloader
endpoint: <your Azure Storage Account endpoint url> # e.g., https://retailer-quickloader.blob.core.windows.net/
account-key: <your Azure Storage Account key> # e.g., JEEPEl+l2KtGit...==
account-name and endpoint are required configuration properties.
- The account-key value can be obtained through the Microsoft Azure portal. Using this value for authentication may not be approved by your corporate Security group. See Microsoft's Azure Storage Blob configuration properties page
for alternative properties that can be used for authentication.