scanometrics package

ScanOMetrics: python package for personalized brain image analysis

Submodules

scanometrics.core module

Core ScanOMetrics classes and methods

class scanometrics.core.ScanOMetrics_project(bids_database, proc_pipeline='dldirect', dataset_id=None, cov2float={'sex': {'F': 1, 'M': 0, 'f': 1, 'm': 0}}, acq_pattern='*T1w', ses_delimiter='_', acq_delimiter='_', n_threads=-1, atlas='DesikanKilliany')

Bases: object

Class defining a ScanOMetrics project. Subjects should be stored according to the BIDS data structure for multiple sessions (https://bids.neuroimaging.io/). Participant IDs and fixed covariates should be saved in <bids_database>/participants.tsv. Session-specific covariates (e.g. age, sequence, scanner) should be saved in the respective sessions.tsv files.
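As an illustration of the expected layout, the sketch below writes a toy participants.tsv and a per-subject sessions table with the standard library. The dataset name, subject ID and covariate values are hypothetical, the write_tsv helper is not part of ScanOMetrics, and the sessions file location follows the sub-<ID>/sub-<ID>_sessions.tsv convention described under load_subjects().

```python
import csv
import tempfile
from pathlib import Path


def write_tsv(path, fieldnames, rows):
    """Write a list of dicts as a tab-separated table (hypothetical helper)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical toy dataset: one participant scanned in two sessions.
bids_database = Path(tempfile.mkdtemp()) / "toy_bids"

# Fixed covariates: exactly one row per participant.
write_tsv(
    bids_database / "participants.tsv",
    ["participant_id", "sex"],
    [{"participant_id": "sub-01", "sex": "F"}],
)

# Session-specific covariates: one row per session, 'session_id' is mandatory.
sessions_path = bids_database / "sub-01" / "sub-01_sessions.tsv"
write_tsv(
    sessions_path,
    ["session_id", "age", "scanner"],
    [
        {"session_id": "ses-01", "age": 61.2, "scanner": "A"},
        {"session_id": "ses-02", "age": 63.0, "scanner": "A"},
    ],
)

# Read the sessions table back to check its content.
with open(sessions_path, newline="") as handle:
    sessions = list(csv.DictReader(handle, delimiter="\t"))
```

Values read back with csv.DictReader are strings, so numeric covariates such as age need an explicit conversion before use.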

add_ses_row(ID, row, ses_id, ses_row, acqs=['T1w'])

Add a subject’s session to the self.covariate_values array and the self.subject dictionary (appends to existing sessions or creates a new subject). Missing covariates are set to np.nan. Sessions are combined with acq labels to allow subjects to have multiple scan inputs.

Parameters

ID (string) – subject_id, usually taken from <bids_database>/participants.tsv table
row (dict) – dictionary with subject’s covariates taken from <bids_database>/participants.tsv table
ses_id (string) – current session_id as taken from <bids_database>/<subject_id>/sessions.tsv table
ses_row (dict) – dictionary with subject’s session specific covariate values, taken from sessions.tsv table
acqs (list of strings) – acquisition labels to use as input. A session is created for each session/acq label combination.
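The bookkeeping described above can be pictured with a small stand-alone sketch. The add_session helper, the "<ses_id>_<acq>" key format and the covariate names are assumptions made for illustration; the actual method also updates the covariate_values array, which is not reproduced here.

```python
import math


def add_session(subjects, subject_id, row, ses_id, ses_row, covariate_names, acqs=("T1w",)):
    """Store one entry per session/acq combination in a nested dict.

    Hypothetical helper, not the ScanOMetrics implementation.
    """
    merged = {**row, **ses_row}  # session values override participant-level ones
    # Covariates absent from both tables are set to NaN.
    covariates = {name: merged.get(name, math.nan) for name in covariate_names}
    sessions = subjects.setdefault(subject_id, {})  # create the subject if new
    for acq in acqs:
        sessions[f"{ses_id}_{acq}"] = dict(covariates)  # assumed key format
    return subjects


names = ["sex", "age", "scanner"]
subjects = {}
add_session(subjects, "sub-01", {"sex": 1}, "ses-01", {"age": 61.2}, names)
add_session(subjects, "sub-01", {"sex": 1}, "ses-02", {"age": 63.0, "scanner": 2},
            names, acqs=("acq-1_T1w", "acq-2_T1w"))
```

After the two calls, sub-01 holds three entries: one for ses-01 and two for ses-02, one per acquisition label.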

evaluate_singleSubject_allSes(subject_id, matching_covariates, min_num_ctrl=5, alpha_uncorr=0.01, q_fdr=0.05)

Evaluation method to assess whether a new scan significantly differs from normative ranges. Loops through all available sessions and scans available in self.subject[subject_id].

Parameters

subject_id (string) – ID of the participant to evaluate.
matching_covariates (list) – list of covariate names used to filter the set of normative controls to evaluate against.
min_num_ctrl (int) – minimum number of matching normative subjects required for the evaluation to be computed. With fewer matches, the evaluation is skipped.
alpha_uncorr (float) – statistical significance level for uncorrected p-values.
q_fdr (float) – statistical significance level for fdr-corrected p-values.
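To make the difference between alpha_uncorr and q_fdr concrete, the sketch below compares a plain threshold on uncorrected p-values with the Benjamini-Hochberg procedure. The choice of Benjamini-Hochberg is an assumption (the text above only says "fdr-corrected"), and the p-values are made up.

```python
def benjamini_hochberg(p_values, q_fdr=0.05):
    """Return booleans marking p-values that pass Benjamini-Hochberg at level q_fdr."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k such that p_(k) <= k / m * q_fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q_fdr:
            cutoff_rank = rank
    # Every p-value ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Hypothetical p-values, one per metric.
p_values = [0.001, 0.012, 0.02, 0.03, 0.6]
uncorrected = [p < 0.01 for p in p_values]          # alpha_uncorr = 0.01
fdr = benjamini_hochberg(p_values, q_fdr=0.05)      # q_fdr = 0.05
```

With these inputs only the first metric passes the uncorrected threshold, while the first four pass the FDR criterion, which shows that the two levels answer different questions and are not interchangeable.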

get_subjSesAcq_T1s(): Get list of paths to all T1 files in the loaded dataset.

get_subjSesAcq_array(): Get the array concatenating all <subj_ID>_<ses_ID>_<acq_label> strings in the loaded dataset.

get_subjSesAcq_id(subject_id, session_id, acq_label): Get row index for particular subject, session and acq_label (returns a single value).

get_subjSesAcq_row(subject_id, session_id, acq_label): Get row index for particular subject, session and acq_label (returns a single value).

get_subj_rows(subject_id): Get list of row indexes for a particular subject (returns a list for all subject sessions and acq labels).
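The row-lookup helpers above can be pictured with plain lists. This is a minimal sketch assuming row labels of the form <subj_ID>_<ses_ID>_<acq_label>; the labels and the find_row and find_subject_rows functions are hypothetical and do not reproduce the class internals.

```python
# Hypothetical row labels, one per loaded scan, in row order.
subj_ses_acq = [
    "sub-01_ses-01_T1w",
    "sub-01_ses-02_T1w",
    "sub-02_ses-01_T1w",
]


def find_row(labels, subject_id, session_id, acq_label):
    """Return the single row index for one subject/session/acq combination."""
    return labels.index(f"{subject_id}_{session_id}_{acq_label}")


def find_subject_rows(labels, subject_id):
    """Return every row index whose label starts with '<subject_id>_'."""
    return [i for i, label in enumerate(labels) if label.startswith(subject_id + "_")]


single_row = find_row(subj_ses_acq, "sub-02", "ses-01", "T1w")
all_rows = find_subject_rows(subj_ses_acq, "sub-01")
```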

load_normative_model(model_filename)

Loads normative model from pkl file into SOM.normativeModel structure, and overwrites SOM processing pipeline and cov2float to match those in the normative model.

Parameters: model_filename (string) – filename of model to load. Can be one of the outputs of list_normative_models(). Can also be the path to a pkl file (should contain the .pkl extension).

load_proc_metrics(subjects=None, stats2table_folder=None, ref_rows=None, ref_metric_values=None, ref_metric_names=None, ref_covariate_values=None, ref_covariate_names=None, metric_include=None, metric_exclude=None)

Load the metrics computed by the processing pipeline. The processing class implements the pipeline-specific loading of metrics. Values that don’t exist for a given subject are filled with NaN. The result can be saved with save_proc_metrics() (e.g. a 75x1258 matrix requires 760 kB). Covariates is a list of variable names in participants.tsv to keep and save in a covariate_values numpy array.

Parameters

subjects (dict) – subject dictionary following the SOM.subject structure, which takes participants IDs from the participants.tsv table and session IDs from sessions.tsv tables, according to the following structure: {‘<participant_id>’: {‘<session_id>’: {covariate_name[0]: covariate_value[0], …}}}
stats2table_folder (string) – path to folder containing stats2table files, to be loaded instead of subject specific files.
ref_rows (numpy array) – numpy array of indexes of rows in covariate_values to be passed to metric_proc_pipeline.proc2metric(). Indexes are used to specify which subject and session combinations should be used for reference during the normalization step.
ref_metric_values (numpy array) – numpy array with values to use as reference for normalization of metrics.
ref_metric_names (list) – list of metric names to use as reference.
ref_covariate_values (list) – list of covariate values to find matching scans to normalize with.
ref_covariate_names (list) – list of covariate names to base search for matching scans on
metric_include (list) – list of metrics to keep after loading
metric_exclude (list) – list of metrics to exclude after loading

load_subjects(subjects_include=None, subjects_exclude=None, sub_ses_include=None, sub_ses_exclude=None, sub_ses_acq_include=None, sub_ses_acq_exclude=None, covariates_include=None, covariates_exclude=None, subjects_table=None, subj_ses_acq_pattern='^(?P<subjID>sub-.+?)(?=(_ses-|$))_(?P<sesID>ses-.+?)(?:(?=(_acq-|$))_(?P<acqID>acq-.+))?')

Load subject data from <self.bids_database>/participants.tsv, with inclusion/exclusion lists based on subject, session and acq labels, and inclusion/exclusion of covariate_values based on covariate name. Also keeps track of session/acquisition labels, and adds relevant information to self.subject and self.covariate_values. Repeated measures can be ignored by including/excluding specific sessions at runtime.

Assumes each repeated measure is a separate scan in <self.bids_database>/sub-<ID>/ses-<label>/anat/sub-<ID>_ses-<label>_acq-<label>.nii.gz, from which all metrics are extracted. Session info should be in <self.bids_database>/sub-<ID>/sub-<ID>_sessions.tsv (one row per session, with a mandatory column named ‘session_id’ and the parameter(s) that change between sessions).

The function first loops through the participants.tsv and sessions.tsv files to gather subject IDs and session IDs, then filters out elements based on the exclusion/inclusion criteria. The current implementation assumes a single session if there are no sessions.tsv files, in which case a session_id is added automatically, following the BIDS recommendation that only a single row per subject must appear in the participants.tsv file.

NB: everything relies on the participants.tsv and sessions.tsv files. participants.tsv MUST have only one line per subject, and repeated scans MUST be encoded through a <participant_id>_sessions.tsv file in the subject’s BIDS folder.

Parameters

subjects_include – list of subjects to include. Defaults to None to load all subjects in participants.tsv
subjects_exclude – list of subjects to exclude. Defaults to None to load all subjects in subjects_include
sub_ses_include – list of sub-<subject_id>_ses-<session_id> to include. Defaults to None to load all ses.
sub_ses_exclude – list of sub-<subject_id>_ses-<session_id> to exclude. Defaults to None to load all ses.
sub_ses_acq_include – list of sub-<subject_id>_ses-<session_id>_<acq_label> to include. Defaults to None to load all acqs.
sub_ses_acq_exclude – list of sub-<subject_id>_ses-<session_id>_<acq_label> to exclude. Defaults to None to load all acqs.
covariates_include – list of covariate_values to include. Defaults to None to load all covariate_values in participants.tsv
covariates_exclude – list of covariate_values to exclude. Defaults to None to load all covariate_values in covariates_include
subjects_table – path to a single tsv table with all subject names and covariates; overrides BIDS scraping
subj_ses_acq_pattern – string used as a regular expression to split subject, session and acquisition IDs from scan_id when scraping participants from a single table instead of the BIDS structure.
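The default subj_ses_acq_pattern from the signature above can be tried on a few scan IDs with the standard re module. How ScanOMetrics applies the pattern internally is not shown in this reference, so the use of re.fullmatch, the split_scan_id helper and the example IDs are assumptions of this sketch.

```python
import re

# Default value of subj_ses_acq_pattern, copied from the load_subjects() signature.
SUBJ_SES_ACQ_PATTERN = (
    r"^(?P<subjID>sub-.+?)(?=(_ses-|$))_(?P<sesID>ses-.+?)"
    r"(?:(?=(_acq-|$))_(?P<acqID>acq-.+))?"
)


def split_scan_id(scan_id):
    """Split a scan ID into subject, session and acq parts, or return None.

    re.fullmatch is an assumption of this sketch: the pattern has no trailing
    '$', so full matching is what forces the lazy groups to cover the whole ID.
    """
    match = re.fullmatch(SUBJ_SES_ACQ_PATTERN, scan_id)
    return match.groupdict() if match else None


with_acq = split_scan_id("sub-01_ses-02_acq-mprage")
without_acq = split_scan_id("sub-01_ses-02")
no_session = split_scan_id("sub-01")  # the pattern requires a session part
```

The acquisition group is optional, so IDs without an acq label still match and acqID is None. With re.match instead of re.fullmatch the lazy sesID group would stop after a single character, which is worth keeping in mind when supplying a custom pattern.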

plot_single_metric(subject_id, selected_metric, selected_session, selected_acquisition, output, matching_covariates, pid, alpha_uncorr=0.01)

Plot the selected metric for the given subject, session, and acquisition, ensuring that both hemispheres are plotted together if available, and symmetry index in the middle.

Parameters

subject_id (str) – the ID of the subject.
selected_metric (str) – the metric to be plotted (e.g. aparc_lh_entorhinal_thickness).
selected_session (str) – the session of the subject.
selected_acquisition (str) – the acquisition label.
output (dict) – the output data containing metric names and deviation statistics.
subj_covariate_values (ndarray) – covariate values for the subject.
normModel_matching_cols (list) – columns for matching the normative model.
subj_measured_metrics (ndarray) – measured metrics for the subject.
pid (int) – FDR threshold.
alpha_uncorr (float, optional) – uncorrected alpha level. Default is 0.01.

Returns: Figure: A matplotlib figure containing the plot.

proc2table(n_threads=-1)

Method to convert pipeline-specific outputs to a common table format. It is needed to import data that was processed externally but follows the folder naming and organization implemented in the method self.metric_proc_pipeline.run_pipeline(). Externally processed data with a different folder organization should be gathered by the user into tables saved in <bids_database>/derivatives/<proc_pipeline>/<subject_id>_<session_id> folders. Tables should match the naming expected by self.metric_proc_pipeline.proc2metric().

In the case of Freesurfer, this assumes that the data to import is in a set of tables for each subject, located in <bids_directory>/derivatives/freesurfer_vX-X-X/<subject_id>_<session_id>, and that subject IDs in the generated tables will be <subject_id>_<session_id>. Users who processed Freesurfer externally without using this format should change participants.tsv and the folder structure to match it. Freesurfer’s version should be specified by replacing vX-X-X with the appropriate value (done automatically when running the processing from ScanOMetrics).

TODO: implement a warning instead of an error when stumbling upon an ID mismatch when loading tables.

Parameters: n_threads (int) – number of threads to use

run_proc_pipeline(subject_id=None, n_threads=None)

Runs metric_proc_pipeline(), which generates measured metric values using n_threads threads. Can be used to process normative data, or a set of subjects to evaluate against a trained dataset.

Proc pipelines work in a dedicated ‘derivatives’ folder. In the case of Freesurfer, the pipeline expects a single directory with a single folder for each combination of [subject_id, session_id, acq_label]. Each scan is copied into a ‘<subject_id>_<session_id>_<acq_label>’ folder. This implies that subj_id in Freesurfer has to be compared to <subject_id>_<session_id>_<acq_label> when checking whether the correct subject is being loaded. It also means that BIDS naming should respect <bids_directory>/<subject_id>/<session_id>/anat/<subject_id>_<session_id>_<acq_label>T1w.nii.gz. acq_label can be used to give a subject-specific T1 acq_label array.

Regarding BIDS guidelines, the file participants.tsv MUST have a single line per subject (each subject has to appear only once). This implies that repeated scans must be tracked with a sessions.tsv file encoding the variables that change across sessions (e.g. age, scanner or sequence), overwriting values from participants.tsv if needed. load_subjects() takes care of loading covariates from participants.tsv, and loops through sessions.tsv to add session-specific covariates. This is currently achieved by having self.subject as a dictionary with subject_id as keys, where each value is another dictionary with session_id as keys, itself holding a last dictionary with covariate names and values as key/value pairs.

save_normative_model(output_filename=None)

Saves normative model to pkl file. By default, will save a <model_name>_<proc_pipeline>_<bids_directory>.pkl file in the scanometrics/resources/normative_models folder (e.g. Polynomial_dldirect_OASIS3). Provide output_filename to save to another file.

Parameters: output_filename (string) – path to pkl file (including filename and pkl extension)
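The default naming scheme and the pkl round trip can be illustrated with the standard pickle module. The default_model_filename helper and the toy dictionary are hypothetical stand-ins; the object ScanOMetrics actually pickles is not described in this reference.

```python
import pickle
import tempfile
from pathlib import Path


def default_model_filename(model_name, proc_pipeline, dataset_id):
    """Build '<model_name>_<proc_pipeline>_<dataset>.pkl' (hypothetical helper)."""
    return f"{model_name}_{proc_pipeline}_{dataset_id}.pkl"


# Hypothetical stand-in for a fitted model; the real object holds much more.
toy_model = {
    "model_name": "Polynomial",
    "proc_pipeline": "dldirect",
    "cov2float": {"sex": {"F": 1, "M": 0}},
}

# Save to a temporary folder instead of scanometrics/resources/normative_models.
output_path = Path(tempfile.mkdtemp()) / default_model_filename("Polynomial", "dldirect", "OASIS3")
with open(output_path, "wb") as handle:
    pickle.dump(toy_model, handle)

# Loading restores the stored pipeline name and cov2float mapping.
with open(output_path, "rb") as handle:
    reloaded = pickle.load(handle)
```

Pickle files can execute arbitrary code on load, so models should only be loaded from trusted sources.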

set_normative_model(model_name='Polynomial')

Sets normative model based on model name (defaults to ‘Polynomial’) and a training set ID (defaults to dldirect_OASIS3). Intended to initialize normModel dictionary for further fitting or loading already trained model.

Parameters: model_name (string) – string identifying the model name. Should correspond to a <model_name>.py file in the folder scanometrics/processing

set_proc_pipeline(metric_proc_pipeline, atlas): Sets the pipeline used to process MRI scans and compute morphometric values for each participant. Should match scanometrics/processing/<metric_proc_pipeline>.py.

test_group_differences(matching_covariates, metric_names=None, normalizations=None, group_label=None, group_covariate=None)

Perform statistical test (Student t-test) between group residuals and residuals of matching normative scans. Consider different group options:

Loaded subjects against normative dataset

Groups inside loaded subjects, labeled by a variable, intended to be a single label to mask out subjects before testing. This function allows loading a complete dataset and testing a certain group against a normative model, one group at a time (for several groups, test_group_differences() should be called once per group, changing the group_label value).

Parameters

matching_covariates (list) – list of covariate names to use to filter matching scans in the normative dataset
metric_names (list of strings) – list of metric names to test for differences between groups. Defaults to None, which results in testing all metrics intersecting self.metric_names and self.normativeModel.metric_names.
normalizations (list of strings) – list of normalization types to analyse. Defaults to None, which results in analysing both ‘orig’ and ‘norm’ datasets in self.measured_metrics.
group_label (int) – label of the group to test against the normative dataset. Should correspond to a value in group_covariate.
group_covariate (array of ints) – individual scan group label. Used with the group_label parameter to filter out subjects to be tested against the normative dataset.
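The role of group_label and group_covariate can be sketched with a two-sample Student t statistic computed on toy residuals. The data are made up, scans with label 0 stand in for the matching normative scans, and only the t statistic and degrees of freedom are computed (converting them to a p-value needs a t-distribution CDF, which the standard library does not provide).

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Return (t, degrees_of_freedom) for a two-sample pooled-variance t-test."""
    n_a, n_b = len(group_a), len(group_b)
    dof = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / dof
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, dof


# Hypothetical residuals for one metric and one group label per scan.
residuals = [0.9, 1.1, 1.0, 1.2, -0.1, 0.1, 0.0, -0.2, 0.2]
group_covariate = [1, 1, 1, 1, 0, 0, 0, 0, 0]
group_label = 1

# Mask scans by label; label 0 stands in for the matching normative scans.
group = [r for r, g in zip(residuals, group_covariate) if g == group_label]
controls = [r for r, g in zip(residuals, group_covariate) if g != group_label]
t_stat, dof = student_t(group, controls)
```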

scanometrics.normative_models module

Module for implementation of developmental models

class scanometrics.normative_models.Polynomial(measured_metrics, metric_names, covariate_values, covariate_names, dataset_id, proc_pipeline_id, proc_pipeline_version, cov2float, subject)

Bases: normative_model_template

Normative model based on polynomial fit. ‘model_dataset_id’ mixes normative model name with name of training dataset to keep track of combination used for training.

fit(flag_opt, global_deg_max=None, frac=20, alpha=0.01, N_cycl=10, n_uniform_bins=10)

Computes ‘estimated_parameters’ and ‘residual’ matrices for the model, according to ‘measured_metrics’ array. measured_metrics is the training dataset, given as a numpy array (a copy is created in self.measured_metrics to be saved/loaded). Outliers in the measured_metrics matrix still have values for computation of residuals.

Parameters: flag_opt (Bool) – flag for optimizing maximum degree of polynomial. If set to False, the maximum degree is used. If set to True, the SSE is computed for increasingly higher degrees, and stopped when there’s no significant improvement of the model.

load_X()

load_model_parameters()

predict_values(age)

save_X()

save_model_parameters()

class scanometrics.normative_models.normative_model_template(model_id, measured_metrics, metric_names, covariate_values, covariate_names, dataset_id, proc_pipeline_id, proc_pipeline_version, cov2float, subject)

Bases: object

compute_uncertainty(min_repeat_subjs=2, frac=0.1): Compute uncertainty over metrics in self.measured_metrics. Uncertainty is computed from repeated measures when available, and is otherwise approximated by a given fraction of the mean across subjects (default frac=0.1). The uncertainty represents how much measured metrics are expected to deviate from their mean value, in the same units (e.g. mm for cortical thickness, mm^2 for surface area). Sets self.uncertainty to the computed value (a vector with one value per column of self.measured_metrics). Note: this method should be moved to normativeModel, as it is currently copied from self and is not used outside the context of normativeModel.
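The two branches described above can be sketched for a single metric as follows. The estimator used for repeated measures (mean within-subject standard deviation) and the reading of min_repeat_subjs as the minimum number of subjects with repeated scans are assumptions, as this reference does not spell them out; estimate_uncertainty is a hypothetical name.

```python
from statistics import mean, stdev


def estimate_uncertainty(values_by_subject, min_repeat_subjs=2, frac=0.1):
    """Estimate the uncertainty of one metric (hypothetical helper).

    values_by_subject maps a subject ID to its list of repeated measurements.
    """
    repeated = [v for v in values_by_subject.values() if len(v) >= 2]
    if len(repeated) >= min_repeat_subjs:
        # Assumed estimator: average within-subject standard deviation.
        return mean([stdev(v) for v in repeated])
    # Fallback from the description: a fraction of the mean across subjects.
    all_values = [x for v in values_by_subject.values() for x in v]
    return frac * mean(all_values)


# Hypothetical cortical thickness values in mm.
with_repeats = estimate_uncertainty({"sub-01": [2.0, 2.2], "sub-02": [3.0, 3.2]})
without_repeats = estimate_uncertainty({"sub-01": [2.5], "sub-02": [3.5]})
```

Both results are in the units of the metric, here mm.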

flag_outliers(k_IQR, variable='measured_metrics')

Label subjects as outliers if their morphological value $x \notin [q_{25} - k \cdot \mathrm{IQR},\ q_{75} + k \cdot \mathrm{IQR}]$, where $q_{25}$ and $q_{75}$ are the 25th and 75th percentiles, $\mathrm{IQR} = q_{75} - q_{25}$ is the interquartile range, and $k$ (the k_IQR parameter) sets the threshold of how many IQRs are considered for labeling outliers. Subjects are compared to their age-matching group [0.9*age, 1.1*age]. The ‘metrics’ matrix is an NxM matrix with N subjects and M metrics (e.g. self.measured_metrics or self.normativeModel.residuals). The quantiles are computed using numpy.quantile and the ‘hazen’ method, to obtain the same results as the Matlab ‘quantile’ function.

Parameters: k_IQR (float > 0.) – factor of IQRs to be used as threshold for outlier detection.
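The outlier rule can be sketched for a single metric without numpy. The hazen_quantile function below is a hand-written version of the Hazen rule (fractional position n*q - 0.5 on the sorted data, with linear interpolation), and including each subject in its own age-matched group is an assumption; the function names and data are hypothetical.

```python
import math


def hazen_quantile(values, q):
    """Quantile with Hazen plotting positions and linear interpolation."""
    data = sorted(values)
    n = len(data)
    # Zero-based fractional index, clamped to the valid range.
    pos = min(max(n * q - 0.5, 0.0), n - 1.0)
    lower = math.floor(pos)
    upper = min(lower + 1, n - 1)
    return data[lower] + (pos - lower) * (data[upper] - data[lower])


def flag_outliers(values, ages, k_iqr):
    """Flag each subject whose value lies outside the IQR fence of its age peers."""
    flags = []
    for value, age in zip(values, ages):
        # Age-matched group: subjects aged within [0.9 * age, 1.1 * age].
        peers = [v for v, a in zip(values, ages) if 0.9 * age <= a <= 1.1 * age]
        q25 = hazen_quantile(peers, 0.25)
        q75 = hazen_quantile(peers, 0.75)
        iqr = q75 - q25
        flags.append(value < q25 - k_iqr * iqr or value > q75 + k_iqr * iqr)
    return flags


# Hypothetical thickness values (mm) and ages (years).
values = [2.5, 2.6, 2.4, 2.5, 4.0]
ages = [60, 61, 62, 63, 64]
flags = flag_outliers(values, ages, k_iqr=1.5)
```

Only the last subject is flagged: all five subjects fall in each other's age window, and 4.0 lies above the upper fence.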

scanometrics.normative_models.uniform_subsample_scheme(samples, n_cyl, n_bins=10)

Resample the dataset to get an approximately uniform distribution (inspired by the following post on stackoverflow: https://stackoverflow.com/questions/66476638/downsampling-continuous-variable-to-uniform-distribution). Taking the lowest bin count as the sampling number can be too strict (i.e. it is likely that a bin has 1 or 2 subjects in it, which would lead to ~10 subjects selected for the analysis), but the method is quick and efficient.
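A minimal sketch of the binning idea follows: samples are split into equal-width bins, and the same number of samples (the lowest non-empty bin count) is drawn from every occupied bin, once per cycle. Skipping empty bins, returning lists of indexes, and the seed argument are assumptions of this sketch; uniform_subsample is a hypothetical name.

```python
import random


def uniform_subsample(samples, n_cyl, n_bins=10, seed=None):
    """Return n_cyl lists of indexes drawn to be roughly uniform across bins."""
    rng = random.Random(seed)
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width if all samples are equal
    bins = [[] for _ in range(n_bins)]
    for idx, value in enumerate(samples):
        # The maximum value would land in bin n_bins, so clamp it to the last bin.
        bins[min(int((value - lo) / width), n_bins - 1)].append(idx)
    occupied = [b for b in bins if b]  # assumption: empty bins are ignored
    per_bin = min(len(b) for b in occupied)  # lowest non-empty bin count
    return [
        sorted(i for b in occupied for i in rng.sample(b, per_bin))
        for _ in range(n_cyl)
    ]


# Hypothetical ages: many young subjects, few older ones.
ages = [20, 21, 22, 23, 24, 25, 60, 61, 80]
schemes = uniform_subsample(ages, n_cyl=4, n_bins=3, seed=0)
```

With three bins, the young bin holds six subjects and the oldest bin three, so every cycle keeps all three older subjects and a random three of the younger ones.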

Parameters

samples –
n_cyl –

Returns

scanometrics.normative_models.uniform_subsample_scheme_old(samples, n_cyl, width, seed=None)

Generates n_cyl subsampling schemes with approximate uniform distribution, by selecting subsamples with probability inversely proportional to the density. Original implementation from Octave’s code. Possible shorter and pythonic implementation to be tested from here: https://stackoverflow.com/questions/66476638/downsampling-continuous-variable-to-uniform-distribution

Parameters

samples – sample from which subsamples should be taken.
n_cyl – number of subsample sets to generate
width – width of broadening gaussian
seed – seed for np.random.seed() for testing

Return idx

np.array of size (len(sample),n_cyl), with 0 for excluded samples and 1 for included samples
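The density-based scheme can be sketched in plain Python as below. The Gaussian kernel density estimate, the scaling of probabilities so that the sparsest sample is always kept, and the use of random.Random instead of np.random.seed are assumptions of this sketch, which returns nested lists rather than a numpy array; density_weighted_schemes is a hypothetical name.

```python
import math
import random


def density_weighted_schemes(samples, n_cyl, width, seed=None):
    """Return a len(samples) x n_cyl table of 0/1 inclusion flags."""
    rng = random.Random(seed)
    # Unnormalised Gaussian kernel density estimate at each sample.
    density = [
        sum(math.exp(-0.5 * ((x - y) / width) ** 2) for y in samples)
        for x in samples
    ]
    # Inclusion probability inversely proportional to density, capped at 1.
    min_density = min(density)
    prob = [min_density / d for d in density]
    # One independent draw per sample and per subsampling scheme.
    return [[1 if rng.random() < p else 0 for _ in range(n_cyl)] for p in prob]


# Hypothetical ages: a dense cluster around 21 and one isolated subject at 70.
samples = [20, 20.5, 21, 21.5, 22, 70]
idx = density_weighted_schemes(samples, n_cyl=5, width=2.0, seed=0)
```

The isolated sample has the lowest density, so its inclusion probability is 1 and it appears in every scheme, while samples in the dense cluster are kept only part of the time.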