hddm.simulators package

Submodules

hddm.simulators.basic_simulator module

hddm.simulators.basic_simulator.simulator(**kwargs)

Basic data simulator for the models included in HDDM.

Arguments:
theta: list or numpy.array or pandas.DataFrame

Parameters of the simulator. If a 2d array is supplied, each row is treated as a ‘trial’ and the function runs n_samples * n_trials simulations.

model: str <default=’angle’>

Determines the model that will be simulated.

n_samples: int <default=1000>

Number of simulation runs (per trial, if theta supplies more than one trial)

delta_t: float

Size of timesteps in the simulator (conceptually measured in seconds)

max_t: float

Maximum reaction time the simulator can reach

no_noise: bool <default=False>

Turn noise off (mostly useful for plotting purposes)

Return:

tuple: one of (rts, responses, metadata), (rt-response histogram, metadata), or (rts binned pointwise, responses, metadata)
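The counting rule for the parameter argument can be made concrete with a small sketch. This is plain Python rather than HDDM code, and the helper name count_simulations is hypothetical; it only illustrates how a 2d input with one row per trial leads to n_samples * n_trials simulation runs.

```python
def count_simulations(theta, n_samples=1000):
    """Return the number of simulation runs implied by theta.

    A flat sequence of numbers is treated as a single trial; a sequence of
    sequences (2d) is treated as one trial per row.
    """
    is_2d = len(theta) > 0 and isinstance(theta[0], (list, tuple))
    n_trials = len(theta) if is_2d else 1
    return n_samples * n_trials


# One parameter vector: a single trial.
single = count_simulations([0.5, 1.5, 0.5, 0.3], n_samples=1000)

# Three rows: three trials, each simulated n_samples times.
batch = count_simulations(
    [[0.5, 1.5, 0.5, 0.3], [0.0, 1.0, 0.5, 0.3], [-0.5, 2.0, 0.5, 0.3]],
    n_samples=1000,
)
print(single, batch)
```

The parameter values above are arbitrary placeholders and do not correspond to any particular model's parameter order.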

hddm.simulators.hddm_dataset_generators module

hddm.simulators.hddm_dataset_generators.hddm_preprocess(simulator_data=None, subj_id='none', keep_negative_responses=False, add_model_parameters=False, keep_subj_idx=True)

Takes simulator data and turns it into HDDM ready format.

Arguments:
simulator_data: tuple

Output of e.g. the hddm.simulators.basic_simulator function.

subj_id: str <default=’none’>

Subject id to attach to returned dataset

keep_negative_responses: bool <default=False>

Whether to keep negative responses as they are. If False, negative responses are turned into 0.

add_model_parameters: bool <default=False>

Whether or not to add trial by trial model parameters to returned dataset

keep_subj_idx: bool <default=True>

Whether to keep subject id in the returned dataset
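As a rough illustration of the keep_negative_responses option, the sketch below recodes a list of responses. It assumes the simulator codes the two choices as -1 and 1, which this page does not state explicitly, and the helper recode_responses is hypothetical rather than part of HDDM.

```python
def recode_responses(responses, keep_negative_responses=False):
    """Map negative response codes to 0 unless they should be kept."""
    if keep_negative_responses:
        return list(responses)
    return [0 if r < 0 else r for r in responses]


raw = [1, -1, -1, 1]  # assumed coding: -1 and 1 for the two choices
recoded = recode_responses(raw)  # negative codes become 0
kept = recode_responses(raw, keep_negative_responses=True)
print(recoded, kept)
```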

hddm.simulators.hddm_dataset_generators.make_parameter_vectors_nn(model='angle', param_dict=None, n_parameter_vectors=10)

Generates a (number of) parameter vector(s) for a given model.

Arguments:
model: str <default=’angle’>

String that specifies the model to be simulated. Current options include, ‘angle’, ‘ornstein’, ‘levy’, ‘weibull’, ‘full_ddm’

param_dict: dict <default=None>

Dictionary of parameter values that you would like to pre-specify. The dictionary takes the form (for the simple example of the ddm) {‘v’: [0], ‘a’: [1.5]}. For a given key, supply either a list of length 1 or a list of length equal to the n_parameter_vectors argument.

n_parameter_vectors: int <default=10>

Number of parameter vectors you want to generate

Return: pandas.DataFrame

Columns are parameter names and rows fill the parameter values.
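The length rule for param_dict (a list of length 1, or one entry per parameter vector) can be sketched as follows. The helper expand_param_dict is hypothetical and only mirrors the rule described above; how HDDM handles the lists internally is not shown on this page.

```python
def expand_param_dict(param_dict, n_parameter_vectors):
    """Broadcast each list to length n_parameter_vectors or raise ValueError."""
    expanded = {}
    for name, values in param_dict.items():
        if len(values) == 1:
            # A single value is repeated for every parameter vector.
            expanded[name] = list(values) * n_parameter_vectors
        elif len(values) == n_parameter_vectors:
            expanded[name] = list(values)
        else:
            raise ValueError(
                f"'{name}' has {len(values)} values; expected 1 or "
                f"{n_parameter_vectors}"
            )
    return expanded


expanded = expand_param_dict({"v": [0], "a": [1.5, 1.0, 2.0]}, n_parameter_vectors=3)
print(expanded)
```

Parameters that are not pre-specified are left out of this sketch; in the documented function they would be generated for the chosen model.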

hddm.simulators.hddm_dataset_generators.simulator_h_c(data=None, n_subjects=10, n_trials_per_subject=100, model='ddm_hddm_base', conditions=None, depends_on=None, regression_models=None, regression_covariates=None, group_only_regressors=True, group_only=['z'], fixed_at_default=None, p_outlier=0.0, outlier_max_t=10.0, **kwargs)

Flexible simulator that allows specification of models very similar to the hddm model classes. Has two major modes. When data is supplied, the function generates synthetic versions of the provided data. If no data is provided, you can supply a variety of options to create complicated synthetic datasets from scratch.

Arguments:
data: pd.DataFrame <default=None>

Actual covariate dataset. If data is supplied its covariates are used instead of generated.

n_subjects: int <default=10>

Number of subjects in the datasets

n_trials_per_subject: int <default=100>

Number of trials for each subject

model: str <default = ‘ddm_hddm_base’>

Model to sample from. For traditional hddm supported models, append ‘_hddm_base’ to the model name. Omitting ‘_hddm_base’ imposes constraints on the parameter sets so that they do not violate the trained parameter space of our LANs.

conditions: dict <default=None>

Keys represent condition relevant columns, and values are lists of unique items for each condition relevant column. Example: {“c_one”: [“high”, “low”], “c_two”: [“high”, “low”], “c_three”: [“high”, “medium”, “low”]}

depends_on: dict <default=None>

Keys specify model parameters, and values are lists of the condition relevant columns on which each parameter depends. Follows the syntax in the HDDM model classes. Example: {“v”: [“c_one”, “c_two”]}

regression_models: list of strings <default=None>

Specify regression model formulas for one or more dependent parameters in a list. Follows syntax of HDDM model classes. Example: [“z ~ covariate_name”]

regression_covariates: dict <default={‘covariate_name’: {‘type’: ‘categorical’, ‘range’: (0, 4)}}>

Dictionary in dictionary. Specify the name of the covariate column as keys, and for each key supply the ‘type’ (categorical, continuous) and ‘range’ ((lower bound, upper bound)) of the covariate. Example: {“covariate_name”: {“type”: “categorical”, “range”: (0, 4)}}

group_only_regressors: bool <default=True>

Should regressors only be specified at the group level? If true then only intercepts are specified subject wise. Other covariates act globally.

group_only: list <default = [‘z’]>

List of parameters that are specified only at the group level.

fixed_at_default: list <default=None>

List of parameters for which defaults are to be used. These defaults are specified in the model_config dictionary, which you can access via: hddm.simulators.model_config. Example: [‘t’]

p_outlier: float <default = 0.0>

Specifies the proportion of outliers in the data.

outlier_max_t: float <default = 10.0>

Outlier RTs are generated from np.random.uniform(low = 0, high = outlier_max_t), with responses chosen at random.

Returns:

The DataFrame holds the generated dataset, ready for construction of an hddm model. The dictionary holds the ground-truth parameters, with parameter names as keys and parameter values as values. Keys match the names of traces when fitting the equivalent hddm model. The parameter dictionary is useful for some graphs, otherwise not necessary.

Return type:

(pandas.DataFrame, dict)
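To see how the conditions and depends_on dictionaries relate, the sketch below enumerates the combinations of condition levels using the example values from the argument descriptions. It assumes that all levels are fully crossed, which this page does not confirm, and the helper condition_cells is hypothetical rather than part of HDDM.

```python
from itertools import product

conditions = {
    "c_one": ["high", "low"],
    "c_two": ["high", "low"],
    "c_three": ["high", "medium", "low"],
}
depends_on = {"v": ["c_one", "c_two"]}


def condition_cells(conditions, columns=None):
    """List every combination of levels for the chosen condition columns."""
    columns = list(conditions) if columns is None else list(columns)
    levels = [conditions[c] for c in columns]
    return [dict(zip(columns, combo)) for combo in product(*levels)]


all_cells = condition_cells(conditions)                 # 2 * 2 * 3 cells
v_cells = condition_cells(conditions, depends_on["v"])  # 2 * 2 cells
print(len(all_cells), len(v_cells))
```

Under this assumption, ‘v’ would take one value per combination of c_one and c_two, independent of c_three.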

hddm.simulators.hddm_dataset_generators.simulator_single_subject(parameters=(0, 0, 0), p_outlier=0.0, max_rt_outlier=10.0, model='angle', n_samples=1000, delta_t=0.001, max_t=20, bin_dim=None, bin_pointwise=False, verbose=0)

Generate a hddm-ready dataset from a single set of parameters

Arguments:
parameters: dict, list or numpy array

Model parameters with which to simulate. A dict is preferable for informative error messages. If you know the order of parameters for your model of choice, you can also directly supply a list or numpy.array, which needs to have the parameters in the correct order.

p_outlier: float between 0 and 1 <default=0>

Probability of generating outlier datapoints. An outlier is defined as a random choice from a uniform RT distribution

max_rt_outlier: float > 0 <default=10.0>

Uses max_rt_outlier (which is commonly defined for hddm models) as an implicit maximum on the RT of outliers. Outlier RTs are sampled uniformly from [0, max_rt_outlier]

model: str <default=’angle’>

String that specifies the model to be simulated. Current options include, ‘angle’, ‘ornstein’, ‘levy’, ‘weibull’, ‘full_ddm’

n_samples: int <default=1000>

Number of samples to simulate.

delta_t: float <default=0.001>

Size of timesteps in the simulator (conceptually measured in seconds)

max_t: float <default=20>

Maximum reaction time the simulator can reach

bin_dim: int <default=None>

If simulator output should be binned, this specifies the number of bins to use

bin_pointwise: bool <default=False>

Determines whether to bin simulator output pointwise. Pointwise here is in contrast to producing binned output in the form of a histogram. Binning pointwise assigns each trial’s RT an index, which is the respective bin number. This is expected when you are using the ‘cnn’ network to fit the dataset later. If pointwise is not chosen, the output takes the form of a histogram with bin-wise frequencies.

Return: tuple of (pandas.DataFrame, dict, list)

The first part of the tuple holds a DataFrame with a ‘reaction time’ column and a ‘response’ column, ready to be fit with hddm. The second part of the tuple holds a dict with parameter names as keys and parameter values as values. The third part gives back the parameters supplied in array form. This return is consistent with the returned objects in other data generators under hddm.simulators
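The difference between pointwise binning and histogram output, as controlled by bin_dim and bin_pointwise, can be sketched in plain Python. The sketch assumes equal-width bins on [0, max_t] and responses coded as -1 and 1; both are assumptions for illustration, and the helper bin_index is hypothetical rather than HDDM's implementation.

```python
from collections import Counter


def bin_index(rt, max_t, bin_dim):
    """Index of the equal-width bin on [0, max_t] that contains rt."""
    width = max_t / bin_dim
    return min(int(rt / width), bin_dim - 1)  # rt == max_t falls in the last bin


rts = [0.4, 1.2, 1.3, 3.9]
responses = [1, -1, 1, 1]
max_t, bin_dim = 4.0, 4

# Pointwise: one (bin index, response) pair per trial.
pointwise = [(bin_index(rt, max_t, bin_dim), r) for rt, r in zip(rts, responses)]

# Histogram: frequency of each (bin index, response) pair.
histogram = Counter(pointwise)
print(pointwise)
print(dict(histogram))
```

The pointwise form keeps one entry per trial, while the histogram form keeps only the counts per bin and response.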

hddm.simulators.hddm_dataset_generators.simulator_stimcoding(model='angle', split_by='v', p_outlier=0.0, max_rt_outlier=10.0, drift_criterion=0.0, n_trials_per_condition=1000, delta_t=0.001, prespecified_params={}, bin_pointwise=False, bin_dim=None, max_t=20.0)

Generate a dataset as expected by HDDMStimCoding. Essentially, it is a specific way to parameterize two-condition data.

Arguments:
parameters: list or numpy array

Model parameters with which to simulate.

model: str <default=’angle’>

String that specifies the model to be simulated. Current options include, ‘angle’, ‘ornstein’, ‘levy’, ‘weibull’, ‘full_ddm’

split_by: str <default=’v’>

You can split by ‘v’ or ‘z’. If splitting by ‘v’ one condition’s v_0 = drift_criterion + ‘v’, the other condition’s v_1 = drift_criterion - ‘v’. Respectively for ‘z’, ‘z_0’ = ‘z’ and ‘z_1’ = 1 - ‘z’.

p_outlier: float between 0 and 1 <default=0>

Probability of generating outlier datapoints. An outlier is defined as a random choice from a uniform RT distribution

max_rt_outlier: float > 0 <default=10.0>

Uses max_rt_outlier (which is commonly defined for hddm models) as an implicit maximum on the RT of outliers. Outlier RTs are sampled uniformly from [0, max_rt_outlier]

drift_criterion: float <default=0.0>

Parameter that can be treated as the ‘bias part’ of the slope, in case we split_by ‘v’.

n_trials_per_condition: int <default=1000>

Number of samples to simulate per condition (here two conditions by design).

delta_t: float <default=0.001>

Size of timesteps in the simulator (conceptually measured in seconds)

prespecified_params: dict <default = {}>

A dictionary with parameter names as keys. Values are lists of either length 1 or length equal to the number of conditions (here 2).

max_t: float <default=20>

Maximum reaction time the simulator can reach

bin_dim: int <default=None>

If simulator output should be binned, this specifies the number of bins to use

bin_pointwise: bool <default=False>

Determines whether to bin simulator output pointwise. Pointwise here is in contrast to producing binned output in the form of a histogram. Binning pointwise assigns each trial’s RT an index, which is the respective bin number. This is expected when you are using the ‘cnn’ network to fit the dataset later. If pointwise is not chosen, the output takes the form of a histogram with bin-wise frequencies.

Return: pandas.DataFrame holding a ‘reaction time’ column and a ‘response’ column. Ready to be fit with hddm.
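The split_by rule can be worked through with a short sketch that applies the formulas from the argument description: v_0 = drift_criterion + v and v_1 = drift_criterion - v when splitting by ‘v’, and z_0 = z and z_1 = 1 - z when splitting by ‘z’. The helper stimcoding_conditions is hypothetical, and leaving v untouched when splitting by ‘z’ is an assumption based on the description of drift_criterion.

```python
def stimcoding_conditions(v, z, drift_criterion=0.0, split_by="v"):
    """Return the (v, z) pair for each of the two conditions."""
    if split_by == "v":
        # The drift is mirrored around the drift criterion.
        return [(drift_criterion + v, z), (drift_criterion - v, z)]
    if split_by == "z":
        # The starting point is mirrored around 0.5; v is left as is (assumption).
        return [(v, z), (v, 1 - z)]
    raise ValueError("split_by must be 'v' or 'z'")


by_v = stimcoding_conditions(v=1.0, z=0.5, drift_criterion=0.25, split_by="v")
by_z = stimcoding_conditions(v=1.0, z=0.25, split_by="z")
print(by_v, by_z)
```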

Module contents