Dashboard displays options for data onboarding tasks.
User selects the desired target database from a pre-configured list (e.g. MySQL, PostgreSQL, Google BigQuery).
Depending on the chosen database, the UI dynamically adapts to display the data fields specific to that database's schema.
Two data input options are available: filling in a form or uploading a file.
Data Preprocessing
Upon user selection of the target database and data input method, the data is sent securely to the chosen LLM API.
If form input is chosen in the previous step:
LLM analyses each form field and attempts to identify its data type (e.g. text, number, date).
LLM extracts key information from text fields (e.g. parsing names, identifying cities).
If file upload is chosen in the previous step:
LLM analyses the uploaded file's structure (columns, data types) and attempts to identify the headers, if present.
LLM performs basic data cleaning, such as removing leading/trailing spaces and converting text to lowercase.
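The cleaning and type-identification steps above can be illustrated with a small rule-based sketch. This is not the LLM call itself; it only shows the kind of output the preprocessing stage is expected to produce. The function names (clean_value, infer_type, preprocess_row), the accepted date formats, and the three coarse type labels are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical date formats accepted by this sketch; a real deployment
# would take these from configuration or from the LLM's analysis.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def clean_value(value: str) -> str:
    """Remove leading/trailing spaces and convert the text to lowercase."""
    return value.strip().lower()


def infer_type(value: str) -> str:
    """Guess a coarse data type: 'number', 'date', or 'text'."""
    text = value.strip()
    try:
        float(text)
        return "number"
    except ValueError:
        pass
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return "date"
        except ValueError:
            continue
    return "text"


def preprocess_row(row: dict[str, str]) -> dict[str, dict[str, str]]:
    """Clean each header and value, and attach the inferred type."""
    return {
        clean_value(header): {
            "value": clean_value(value),
            "type": infer_type(value),
        }
        for header, value in row.items()
    }
```

Calling preprocess_row({" Name ": "  Alice ", "Joined": "2024-01-15"}) yields one cleaned entry per field, each carrying the inferred type that the mapping screen can display.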
Data Mapping
Based on the LLM analysis, the UI displays a table with two sections:
User Data: shows either the pre-populated form fields (with extracted data types) or the first few rows of the uploaded file with identified data types.
Target Database: shows the pre-configured schema for the selected database, displaying each column name and data type.
LLM suggests potential mappings between user data fields and target database columns based on data type analysis and semantic understanding. These suggestions appear as color-coded lines connecting user and database fields (e.g. green for a confident match, yellow for a potential match needing confirmation).
User can review and confirm mappings via drag and drop. LLM highlights any inconsistencies or conflicts (e.g. trying to map a text field to a date column).
User has the option to manually adjust mappings if LLM suggestions are inaccurate.
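A minimal sketch of how mapping suggestions and conflict checks might be represented is shown below. It uses plain string similarity from the standard library as a stand-in for the LLM's semantic matching, so it is only an approximation of the behaviour described above. The function names, the dictionary shapes, and the 0.8 / 0.5 thresholds for green and yellow are all assumptions.

```python
from difflib import SequenceMatcher


def suggest_mappings(user_fields, target_columns, green=0.8, yellow=0.5):
    """Suggest one target column per user field.

    Both arguments map a name to a coarse type, e.g. {"signup": "date"}.
    Only columns of the same type are considered; name similarity decides
    the confidence level (hypothetical thresholds).
    """
    suggestions = []
    for field, field_type in user_fields.items():
        best_column, best_score = None, 0.0
        for column, column_type in target_columns.items():
            if column_type != field_type:
                continue  # never suggest a type-incompatible column
            score = SequenceMatcher(None, field.lower(), column.lower()).ratio()
            if score > best_score:
                best_column, best_score = column, score
        if best_score >= green:
            level = "green"      # confident match
        elif best_score >= yellow:
            level = "yellow"     # potential match, needs confirmation
        else:
            best_column, level = None, "unmapped"
        suggestions.append(
            {"field": field, "column": best_column, "confidence": level}
        )
    return suggestions


def find_conflicts(mappings, user_fields, target_columns):
    """Return (field, column) pairs in user-edited mappings whose types differ."""
    return [
        (field, column)
        for field, column in mappings.items()
        if user_fields[field] != target_columns[column]
    ]
```

After the user adjusts mappings manually, find_conflicts can be re-run on the edited mapping to flag cases such as a text field connected to a date column.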
Data Transformation
The transformation is applied based on the user-confirmed mappings.
Transformation rules are pre-defined based on data types (e.g. converting dates to a specific format, standardising phone number formats).
LLM can be used for complex transformations:
User can flag specific data points with missing values. LLM can suggest potential values based on existing data patterns (subject to user confirmation).
LLM can suggest corrections for specific validation errors based on context.
Users receive clear error messages with details about any data validation issues encountered. Users can then choose to correct the data or skip specific entries.
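The pre-defined, type-based rules and the per-entry error reporting described above could look roughly like the following sketch. The rule names, the DD/MM/YYYY input format, the 10-digit phone rule, and the shape of the error records are assumptions chosen for illustration, not requirements from this document.

```python
import re
from datetime import datetime


def to_iso_date(value: str) -> str:
    """Convert a DD/MM/YYYY date (assumed input format) to YYYY-MM-DD."""
    return datetime.strptime(value.strip(), "%d/%m/%Y").strftime("%Y-%m-%d")


def normalise_phone(value: str) -> str:
    """Keep digits only; this sketch assumes 10-digit phone numbers."""
    digits = re.sub(r"\D", "", value)
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {len(digits)}")
    return digits


# Rules keyed by the target column's data type (hypothetical type labels).
RULES = {"date": to_iso_date, "phone": normalise_phone}


def transform_rows(rows, column_types):
    """Apply the rule for each column's type.

    Rows with any failing value are skipped and reported, so the user can
    correct them or leave them out.
    """
    transformed, errors = [], []
    for index, row in enumerate(rows):
        output, failed = {}, False
        for column, value in row.items():
            # Columns without a specific rule are only stripped of whitespace.
            rule = RULES.get(column_types.get(column), str.strip)
            try:
                output[column] = rule(value)
            except ValueError as exc:
                errors.append({"row": index, "column": column, "error": str(exc)})
                failed = True
        if not failed:
            transformed.append(output)
    return transformed, errors
```

Each error record names the row, the column, and the reason, which is the level of detail a clear validation message needs.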
Database Integration
The tool connects to the target database based on pre-configured user credentials.
Transformed and validated data is uploaded in batches to the designated table within the user’s database.
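Batched upload can be sketched as follows. The example uses Python's built-in sqlite3 module purely as a stand-in so that it runs without external services; the actual targets (MySQL, PostgreSQL, BigQuery) would use their own drivers, and MySQL and PostgreSQL drivers typically use %s rather than ? as the placeholder. The function name, the batch size of 500, and the customers table in the usage note are assumptions.

```python
import sqlite3


def upload_in_batches(connection, table, columns, rows, batch_size=500):
    """Insert rows in fixed-size batches, committing after each batch.

    The table and column names are interpolated into the SQL text, so they
    must come from the pre-configured schema, never from user input.
    """
    placeholders = ", ".join("?" for _ in columns)
    statement = (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    )
    uploaded = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # The connection context manager commits on success and rolls
        # back the current batch if an exception is raised.
        with connection:
            connection.executemany(statement, batch)
        uploaded += len(batch)
    return uploaded
```

For example, with conn = sqlite3.connect(":memory:") and a customers table with name and city columns, upload_in_batches(conn, "customers", ["name", "city"], rows, batch_size=2) inserts the rows two at a time and returns the number uploaded.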
Success or Error Notification
After data upload, the system performs a final validation check on the database side.
User receives a notification based on the outcome:
Success: A confirmation message reports the successfully uploaded records. Users can access the data directly within their database.
Error: The notification details any errors encountered during the upload process (e.g. data format issues). Users can then download an error report with specific details for correction and retry the upload.
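One possible shape for the downloadable error report is a CSV file with one line per failed entry. The sketch below builds such a report in memory; the column names (row, column, error) and the function name are assumptions, since the report format is not specified here.

```python
import csv
import io


def build_error_report(errors):
    """Render error records as CSV text suitable for download.

    Each record is a dict with 'row', 'column', and 'error' keys
    (a hypothetical layout for this sketch).
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["row", "column", "error"])
    writer.writeheader()
    writer.writerows(errors)
    return buffer.getvalue()
```

The returned text can be served as a file download, corrected by the user, and used as a checklist when retrying the upload.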
Logging and Auditing
All user actions, data transformations, and error messages are logged within the application for audit purposes.
User can access detailed logs to track their data onboarding tasks and identify any recurring issues.
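Audit logging of this kind is often done with structured, one-line-per-event records. The sketch below uses Python's standard logging and json modules; the logger name, the log_event helper, and the record fields are assumptions, and a real deployment would attach a persistent handler (file or database) instead of the console handler used here.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical logger name; the console handler is only for demonstration.
audit_logger = logging.getLogger("onboarding.audit")
audit_logger.setLevel(logging.INFO)
audit_logger.addHandler(logging.StreamHandler())


def log_event(user, action, **details):
    """Write one JSON audit record and return it as a string."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "details": details,
    }
    line = json.dumps(entry, sort_keys=True)
    audit_logger.info(line)
    return line
```

Because each record is a self-contained JSON object, logs can later be filtered by user or action to track onboarding tasks and spot recurring issues.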
Additional features to think about later
Scheduled data onboarding
Customizable data transformation rules
LLM training for specific domains (to learn industry jargon)