Synthetic Dataset Generation

KerusCloud can be used to generate highly realistic synthetic datasets for use in a wide variety of analytics applications in the life sciences sector and beyond. Synthetic datasets provide a realistic alternative to real subject-level data: they describe its characteristics without revealing protected information. This allows them to be shared and used freely without raising privacy concerns.

What is synthetic data?

Synthetic data is data generated using purpose-built computer simulations, mathematical or statistical models, or algorithms. It is generated to meet specific needs or conditions that may not be present in the original, real data. It has many applications across multiple industries, including:

Market research and business intelligence
Testing and validating software products and systems
Building and testing algorithms
Predictive modelling, machine learning and AI

When is it useful for clinical trials?

Synthetic data is useful in clinical research, where it can be used:

In clinical trial design optimisation, to maximise the chance of success.
To create external control arms for clinical trials, saving time and resources.
In anonymisation, to enable the sharing of regulated or sensitive data.
To create large, automatically labelled datasets for predictive modelling, machine learning and AI, and to address issues of imbalanced data.

How do we create it?

KerusCloud includes a synthetic data generator that can handle diverse and complex data collected from disparate sources and produce synthetic datasets from it. KerusCloud's modelling capability allows it to build realistic characteristics such as missing data, truncation and censoring into the synthetic datasets it produces. It can also model the correlations within subject-level data, including subgroups and strata, risk factors/covariates, and multiple outcomes of different data types. The result is a highly realistic synthetic version of the original data.
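This page does not describe the models KerusCloud uses internally. As a purely illustrative aid, the following is a minimal, hypothetical Python sketch (standard library only) of how the characteristics listed above can be built into a synthetic subject-level dataset: a stratification factor, two correlated covariates, truncation of one covariate to an eligibility range, an event-time outcome that depends on the covariates, right-censoring at the end of follow-up, and values missing completely at random. The schema (`Subject`), the function `generate_subjects`, and every distribution, range and effect size are assumptions chosen for illustration, not KerusCloud's method or API.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Subject:
    """One synthetic subject-level record (hypothetical schema)."""
    subject_id: int
    stratum: str              # subgroup / stratification factor
    age: Optional[float]      # covariate; None marks a missing value
    biomarker: float          # covariate correlated with age
    time: float               # observed follow-up time in months
    event: bool               # True = event observed, False = right-censored


def generate_subjects(
    n: int,
    rho: float = 0.5,             # assumed age-biomarker correlation
    missing_rate: float = 0.1,    # assumed probability that age is missing
    max_follow_up: float = 24.0,  # assumed end of follow-up (months)
    seed: int = 1,
) -> List[Subject]:
    rng = random.Random(seed)
    subjects = []
    for i in range(n):
        stratum = "B" if rng.random() < 0.5 else "A"

        # Correlated covariates: combine two independent standard normals
        # using the Cholesky factor of [[1, rho], [rho, 1]]. Redraw until age
        # lies in an assumed 18-85 eligibility range (a truncated distribution).
        while True:
            z1 = rng.gauss(0.0, 1.0)
            z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
            age = 60.0 + 10.0 * z1
            if 18.0 <= age <= 85.0:
                break
        biomarker = 5.0 + 2.0 * z2

        # Outcome: exponential event time whose hazard depends on the
        # biomarker and the stratum (assumed effect sizes).
        hazard = 0.05 * math.exp(0.3 * z2 + (0.4 if stratum == "B" else 0.0))
        true_time = rng.expovariate(hazard)

        # Right-censoring: events after the end of follow-up are not observed.
        event = true_time <= max_follow_up
        time = min(true_time, max_follow_up)

        # Missing completely at random: drop age with probability missing_rate.
        observed_age = None if rng.random() < missing_rate else round(age, 1)

        subjects.append(
            Subject(i, stratum, observed_age, round(biomarker, 2), round(time, 2), event)
        )
    return subjects
```

For example, `generate_subjects(500)` returns 500 records in which roughly 10% of `age` values are `None`, and every censored subject has `time` equal to 24.0. Realistic generators fit such distributions and dependencies to the original data rather than fixing them by hand, as this sketch does.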

Common questions which can be answered using synthetic data include:

I would like to share sensitive patient-level data across the organisation. How can I do this in an anonymised way without losing vital information from the dataset?

I am considering using a synthetic control arm in my clinical trial. How could I make use of the sparse data I have from historical trials?

I need to train a machine learning algorithm on sensitive subject-level data. Can I use a synthetic dataset instead?

Interested in unlocking the full potential of your sensitive data? Book a meeting with our Data Science team to discover how we can support this.

KerusCloud SDG can be used to support many applications, including:

Market research and business intelligence
Testing and validating software products and systems
Building and testing algorithms
Predictive modelling, machine learning and AI

Discover more about Data Science at Exploristics