# Annotation Tables
The minnie65_public data release includes a number of annotation tables that help label the dataset.
This section describes the content of each of these tables; see here for instructions on how to query and filter tables.
Unless otherwise specified (e.g. via desired_resolution), all positions are in voxel units at a resolution of 4 × 4 × 40 nm/voxel (x, y, z).
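Because positions default to voxel coordinates, converting them to physical units is a per-axis multiplication by the voxel size. The NumPy sketch below illustrates this. The function names are our own and are not part of any client library. If positions were requested at another resolution via desired_resolution, they are already rescaled and this step does not apply.

```python
import numpy as np

# Default voxel size of annotation positions, in nm per voxel (x, y, z).
VOXEL_SIZE_NM = np.array([4.0, 4.0, 40.0])


def voxels_to_nm(positions):
    """Convert an (N, 3) or (3,) array of voxel coordinates to nanometers."""
    return np.asarray(positions, dtype=float) * VOXEL_SIZE_NM


def voxels_to_um(positions):
    """Convert voxel coordinates to microns (1 micron = 1000 nm)."""
    return voxels_to_nm(positions) / 1000.0


# A point at voxel (100000, 50000, 20000).
print(voxels_to_um([100_000, 50_000, 20_000]))  # [400. 200. 800.]
```

The z axis is scaled by 40 rather than 4. Treating all three axes as isotropic would therefore compress distances along z by a factor of ten.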
## Common Fields
Several fields (or column names) are common to many tables. These fall into two main classes: spatial point columns, which assign annotations to cells via points in 3D space, and book-keeping columns, which are used internally to track the state of the data.
### Spatial Point Columns
Most tables have one or more Bound Spatial Points. A bound spatial point is a location in 3D space that keeps the annotation associated with the root id at that location.
Each bound spatial point has a prefix, usually pt (i.e. "point"), and three associated columns with different suffixes: _position, _supervoxel_id, and _root_id.
For a given prefix {pt}, the three columns are as follows:

- {pt}_position indicates the location of the point in 3D space.
- {pt}_supervoxel_id indicates a unique identifier in the segmentation, and is mostly internal bookkeeping.
- {pt}_root_id indicates the root id of the segment at that location, i.e. the object the annotation is assigned to.
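This naming convention makes it straightforward to pull the three columns for any prefix out of a queried table. The following is a minimal pandas sketch. It assumes the table is already loaded as a DataFrame and that its {pt}_position column holds one (x, y, z) sequence per row. The helper names are hypothetical and the values are made up.

```python
import numpy as np
import pandas as pd

SUFFIXES = ("position", "supervoxel_id", "root_id")


def bound_point_columns(prefix="pt"):
    """Return the three bound-spatial-point column names for a prefix."""
    return [f"{prefix}_{suffix}" for suffix in SUFFIXES]


def point_positions(df, prefix="pt"):
    """Stack the {prefix}_position column into an (N, 3) NumPy array."""
    return np.vstack(df[f"{prefix}_position"].to_numpy())


# Toy table with a single bound spatial point per row (values are made up).
df = pd.DataFrame({
    "pt_position": [[10, 20, 30], [40, 50, 60]],
    "pt_supervoxel_id": [111, 222],
    "pt_root_id": [9001, 9002],
})
print(bound_point_columns())      # ['pt_position', 'pt_supervoxel_id', 'pt_root_id']
print(point_positions(df).shape)  # (2, 3)
```

Passing a different prefix selects the matching set of columns in tables that have more than one bound spatial point.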
### Book-keeping Columns
Several columns are common to many or all tables and are mostly used for internal book-keeping. Rather than describe them for every table, we mention them briefly here:

| Column | Description |
|---|---|
| | A unique ID specific to the annotation within that table. |
| | The date that the annotation was created. |
| | Internal bookkeeping column, should always be |
| | Some tables reference other tables, particularly the nucleus table. If present, this column will be the same as |
| | For reference tables, the data shows both the created/valid/id of the reference annotation and the target annotation. The values with the |
## Synapse Table
Table name: synapses_pni_v2
The only synapse table is synapses_pni_v2. This is by far the largest table in the dataset, with 337 million entries, one for each synapse.
It contains the following columns (in addition to the book-keeping columns):

| Column | Description |
|---|---|
| | The bound spatial point data for the presynaptic side of the synapse. |
| | The bound spatial point data for the postsynaptic side of the synapse. |
| | The size of the synapse in voxels. This correlates well, but not perfectly, with the surface area of the synapse. |
| | A position in the center of the detected synaptic junction. Of all points in the synapse table, this is usually the closest point to the surface (and thus mesh) of both neurons. Because it is at the edge of cells, it is not associated with a root id. |
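With one row per synapse, a connectivity summary is a group-by over the pre- and postsynaptic root ids. The pandas sketch below shows one way to do this. The column names are not listed above, so it assumes prefixes of pre_pt and post_pt and a size column named size; adjust these to the table you actually query. At 337 million rows, you would normally restrict the query to the root ids of interest before summarizing locally.

```python
import pandas as pd


def synapse_edge_list(syn, pre_col="pre_pt_root_id", post_col="post_pt_root_id",
                      size_col="size"):
    """Collapse one-row-per-synapse data into one row per (pre, post) pair.

    Column names are assumptions; pass the real ones if they differ.
    Returns the synapse count and summed synapse size for each pair.
    """
    return (
        syn.groupby([pre_col, post_col])[size_col]
        .agg(n_synapses="count", total_size="sum")
        .reset_index()
    )


# Toy synapse table (made-up ids and sizes).
syn = pd.DataFrame({
    "pre_pt_root_id": [1, 1, 2],
    "post_pt_root_id": [2, 2, 1],
    "size": [100, 250, 80],
})
print(synapse_edge_list(syn))
```

Summing size gives a rough proxy for total synaptic area, with the caveat above that size tracks surface area only imperfectly.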
## Nucleus Table
Table name: nucleus_ref_neuron_svm
Nucleus detection has been used to define unique cells in the dataset.
Separately from the neuronal segmentation, a convolutional neural network was trained to segment nuclei.
Each nucleus detection was given a unique ID, and the centroid and volume of each nucleus were recorded.
The table of centroids for all nuclei is nucleus_detection_v0, but it includes neuronal nuclei, non-neuronal nuclei, and some erroneous detections.
The table nucleus_ref_neuron_svm shows the results of a classifier that was trained to distinguish neuronal nuclei from non-neuronal nuclei and errors.
For analysis, we recommend using the nucleus_ref_neuron_svm table to get the broadest collection of neurons in the dataset.
The key columns of nucleus_ref_neuron_svm are:

| Column | Description |
|---|---|
| | Soma ID for the cell. |
| | Bound spatial point columns associated with the centroid of the nucleus. |
| | Describes how the classification was done. All values will be |
| | The output of the classifier. All values will be either |

Note that the id column is the same as the nucleus id.
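Because the id column is the nucleus id, a nucleus table can serve as a lookup from root ids to cells. The following is a minimal pandas sketch that assumes the bound spatial point prefix is pt, which is the usual prefix described above. It keeps only root ids associated with exactly one nucleus, so that each root id maps to a single cell. Dropping root ids with several nuclei is our own conservative choice and not a rule stated in this documentation.

```python
import pandas as pd


def root_to_nucleus(nuc, id_col="id", root_col="pt_root_id"):
    """Map root id -> nucleus id, keeping only root ids with exactly one nucleus."""
    counts = nuc[root_col].value_counts()
    single = counts[counts == 1].index
    kept = nuc[nuc[root_col].isin(single)]
    # .tolist() converts NumPy integers to plain Python ints.
    return dict(zip(kept[root_col].tolist(), kept[id_col].tolist()))


# Toy nucleus table: root id 100 contains two nuclei and is dropped.
nuc = pd.DataFrame({"id": [10, 11, 12, 13], "pt_root_id": [100, 100, 200, 300]})
print(root_to_nucleus(nuc))  # {200: 12, 300: 13}
```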
## Cell Type Tables
There are several tables that contain information about the cell type of neurons in the dataset, with each table representing a different classification method.
Because each method requires a different kind of information, not all cells are present in all tables.
All of the cell type tables have the same format, and in all cases the id column references the nucleus id of the cell in question.
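Not every cell appears in every table, but all of them key rows by nucleus id. This makes it simple to check which cells each method covers. The pandas sketch below builds a presence/absence matrix with one row per nucleus id and one column per table. The function name and the toy ids are hypothetical.

```python
import pandas as pd


def coverage_matrix(tables):
    """Return a boolean DataFrame: rows are nucleus ids, columns are table names.

    `tables` maps a table name to a DataFrame with an id column (the nucleus id).
    """
    all_ids = pd.Index(
        sorted(set().union(*(set(df["id"]) for df in tables.values()))), name="id"
    )
    return pd.DataFrame(
        {name: all_ids.isin(df["id"]) for name, df in tables.items()},
        index=all_ids,
    )


# Toy stand-ins for two of the cell type tables (made-up ids).
tables = {
    "aibs_soma_nuc_metamodel_preds_v117": pd.DataFrame({"id": [1, 2, 3]}),
    "baylor_log_reg_cell_type_coarse_v1": pd.DataFrame({"id": [2, 3, 4]}),
}
cov = coverage_matrix(tables)
print(cov.all(axis=1).sum())  # 2
```

Rows where every column is True are cells classified by all of the supplied methods.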
### Predictions from soma/nucleus features
Table name: aibs_soma_nuc_metamodel_preds_v117
This table contains the results of a hierarchical classifier trained on features of the cell body and nucleus of cells. It was applied to most cells in the dataset that had complete cell bodies (i.e. not cut off by the edge of the data). For more details, see Elabbady et al. 2022. In general, it performs well, but it sometimes misclassifies layer 5 inhibitory neurons as excitatory. The key columns are:

| Column | Description |
|---|---|
| | Soma ID for the cell. |
| | Bound spatial point columns associated with the centroid of the cell nucleus. |
| | Either |
| | One of several cell types. |
### Coarse prediction from spine detection
Table name: baylor_log_reg_cell_type_coarse_v1
This table contains the results of a logistic regression classifier trained on properties of neuronal dendrites. It was applied to many cells in the dataset, but because it requires more data than soma and nucleus features alone, fewer cells completed the pipeline. It distinguishes excitatory from inhibitory neurons very well because it focuses on dendritic spines, a characteristic property of excitatory neurons. It is a good table for double-checking E/I classifications when in doubt.
The key columns are:

| Column | Description |
|---|---|
| | Soma ID for the cell. |
| | Bound spatial point columns associated with the centroid of the cell nucleus. |
| | |
| | |
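To use this table as a check on E/I calls from another cell type table, you can join the two on id and list the cells where they disagree. The following is a minimal sketch. It assumes both tables have already been reduced to a shared label column that uses the same vocabulary. The column name ei and the labels "exc"/"inh" are hypothetical placeholders, not the values actually stored in these tables.

```python
import pandas as pd


def ei_disagreements(table_a, table_b, label_col="ei"):
    """Return cells (by nucleus id) whose E/I label differs between two tables.

    Both tables need an id column (the nucleus id) and a label column with the
    same vocabulary; label_col and the label values are hypothetical.
    Only cells present in both tables are compared (inner join on id).
    """
    merged = table_a[["id", label_col]].merge(
        table_b[["id", label_col]], on="id", suffixes=("_a", "_b")
    )
    return merged[merged[f"{label_col}_a"] != merged[f"{label_col}_b"]]


soma_nuc = pd.DataFrame({"id": [1, 2, 3], "ei": ["exc", "inh", "exc"]})
spines = pd.DataFrame({"id": [1, 2, 3], "ei": ["exc", "exc", "exc"]})
print(ei_disagreements(soma_nuc, spines)["id"].tolist())  # [2]
```

Cells returned here are candidates for closer inspection, such as the layer 5 inhibitory neurons that the soma/nucleus model sometimes labels as excitatory.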
### Fine prediction from dendritic features
Table name: allen_column_mtypes_v1
This table contains all neurons within a well-proofread 100 micron square column in VISp spanning all layers. Excitatory and inhibitory neurons were distinguished manually, and subclasses were assigned based on a data-driven clustering of the neuronal features. Inhibitory neurons were classified based on how they distributed their synaptic outputs onto target cells, while excitatory neurons were classified based on a collection of dendritic features. For more details, see the section on the minnie column or read the preprint Schneider-Mizell et al. [2023]. Note that all cell type labels in this table come from a clustering specific to this paper. While they are intended to align with the broader literature, they are neither a direct mapping nor a well-established convention. For a more conventional set of labels on the same set of cells, see the table allen_v1_column_types_slanted_ref. Cell types in that table align with those in aibs_soma_nuc_metamodel_preds_v117 above.
The key columns are:

| Column | Description |
|---|---|
| | Soma ID for the cell. |
| | Bound spatial point columns associated with the centroid of the cell nucleus. |
| | |
| | One of several cell types. |
## Proofreading Tables
Table name: proofreading_status_public_release
The table proofreading_status_public_release describes the status of cells selected for manual proofreading.
Different kinds of proofreading differ inherently in difficulty and in the time they require, so we describe the status of axons and dendrites separately.
Further, we distinguish three different categories of proofreading:

- non: No proofreading has been comprehensively performed.
- clean: Proofreading has comprehensively removed false merges, but not necessarily added missing parts.
- extended: Proofreading has comprehensively removed false merges and attempted to add all or most missing parts.
Note that many cells not in this table have been edited in some places, but not comprehensively worked on. For more information, please see Proofreading and Data Quality.
The key columns are:

| Column | Description |
|---|---|
| | ID within the proofreading table (not cell id). |
| | Bound spatial point columns associated with the centroid of the cell nucleus being proofread. |
| | The root id of the neuron when the proofreading assessment was made. |
| | The status of the dendrite proofreading. One of the three categories described above. |
| | The status of the axon proofreading. One of the three categories described above. |
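By their definitions, the three categories form an ordering (non < clean < extended). Selecting cells that meet a minimum proofreading level is therefore a comparison on an ordered categorical. The following is a minimal pandas sketch. The column names status_axon and status_dendrite are assumptions that stand in for the two status columns above, and the function name is hypothetical.

```python
import pandas as pd

# Ordered from least to most proofreading, following the definitions above.
STATUS_ORDER = pd.CategoricalDtype(["non", "clean", "extended"], ordered=True)


def at_least(df, column, level):
    """Rows whose proofreading status in `column` is at least `level`."""
    status = df[column].astype(STATUS_ORDER)
    return df[status >= level]


pr = pd.DataFrame({
    "status_axon": ["non", "clean", "extended"],           # hypothetical column name
    "status_dendrite": ["clean", "extended", "extended"],  # hypothetical column name
})
# Cells whose axons have at least had false merges removed.
print(len(at_least(pr, "status_axon", "clean")))  # 2
```

Requiring at least clean is the relevant check when false merges would bias an analysis. Requiring extended is the relevant check when missing parts matter as well.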
## Functional Coregistration Tables
To relate the structural data to functional data, cell bodies must be coregistered between the functional imaging and EM volumes. The results of this coregistration are stored in two tables with the same columns:

- coregistration_manual_v3: The results of manually verified coregistration. This table is well verified but contains fewer ROIs (N=12,052 root ids, 13,925 ROIs).
- apl_functional_coreg_forward_v5: The results of automated functional matching between the EM and two-photon functional data. This table is not manually verified but contains more ROIs (N=36,078 root ids, 68,873 ROIs).

Please see the Functional Data section for more information about using this data.
The column descriptions are:

| Column | Description |
|---|---|
| | Soma ID for the cell. |
| | Bound spatial point columns associated with the centroid of the cell nucleus. |
| | The session index from functional imaging. |
| | The scan index from functional imaging. |
| | The ROI index from functional imaging. Only unique within scan and session. |
| | The field index from functional imaging. |
| | The residual distance between the functional and the assigned structural points after transformation, in microns. Smaller values indicate a closer match. |
| | A separation score, measuring the difference between the residual distance to the assigned neuron and the distance to the nearest non-assigned neuron, in microns. This can be negative if the non-assigned neuron is closer than the assigned neuron. Larger values indicate fewer nearby neurons that could be confused with the assigned neuron. |
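The ROI index is only unique within a session and scan, so a globally unique key for a functional unit has to combine all three. The pandas sketch below builds that key and keeps matches that pass residual and separation thresholds. The column names (session, scan_idx, unit_id, residual, score) and the threshold values in the example are hypothetical placeholders, not recommended cutoffs.

```python
import pandas as pd


def confident_matches(coreg, max_residual, min_separation,
                      key_cols=("session", "scan_idx", "unit_id"),
                      residual_col="residual", separation_col="score"):
    """Filter coregistration rows and add a unique (session, scan, ROI) key.

    Smaller residuals mean a closer match; larger separation scores mean fewer
    nearby neurons that could be confused with the assigned one.
    All column names here are hypothetical.
    """
    keep = (coreg[residual_col] <= max_residual) & (
        coreg[separation_col] >= min_separation
    )
    out = coreg[keep].copy()
    out["roi_key"] = list(zip(*(out[c] for c in key_cols)))
    return out


coreg = pd.DataFrame({
    "session": [4, 4, 5],
    "scan_idx": [7, 7, 3],
    "unit_id": [12, 12, 12],       # same ROI index reused across scans
    "residual": [2.0, 9.0, 3.5],   # microns
    "score": [6.0, 1.0, 4.0],      # microns
})
good = confident_matches(coreg, max_residual=5.0, min_separation=2.0)
print(len(good))  # 2
```

In the toy data, the ROI index 12 appears in two different scans. The composite key keeps those two units distinct.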
## All Tables
| Table Name | Number of Annotations | Description |
|---|---|---|
| | 337,312,429 | The locations of synapses and the segment ids of the pre- and postsynaptic partners, from automated synapse detection. |
| | 144,120 | The locations of nuclei detected via a fully automated method. |
| | 8,388 | A reference annotation table marking alternative segment_id lookup locations for a subset of nuclei in nucleus_detection_v0 that are more accurate than the centroid locations listed there. |
| | 144,120 | A reference annotation indicating the output of a model detecting which nucleus detections are neurons and which are not. |
| | 13,658 | A table indicating the association between individual units in the functional imaging data and nuclei in the structural data, derived from human-powered matching. Includes residual and separation scores to help assess confidence. |
| | 68,436 | A table indicating the association between individual units in the functional imaging data and nuclei in the structural data, derived from the automated procedure. Includes residual and separation scores to help assess confidence. |
| | 1,272 | A table indicating which neurons have been proofread on their axons or dendrites. |
| | 1,039 | A reference table on proofreading_status_public_release indicating which axon proofreading strategy was executed on each neuron. |
| | 121,271 | A table containing the number of edits on every segment_id associated with a nucleus in the volume. |
| | 542 | Cell type reference annotations from a human expert of non-neuronal cells located within the Minnie Column. |
| | 1,357 | Neuron cell type reference annotations from human experts of neuronal cells located within the Minnie Column. |
| | 1,357 | Neuron cell type reference annotations from data-driven unsupervised clustering of neuronal cells. |
| | 58,624 | Reference annotations indicating the output of a model predicting cell types across the dataset based on the labels from allen_column_mtypes_v1. |
| | 86,916 | Reference annotations indicating the output of a model predicting cell classes based on the labels from allen_v1_column_types_slanted_ref and aibs_column_nonneuronal_ref. |
| | 55,063 | Reference annotations indicating the output of a logistic regression model predicting whether the nucleus is part of an excitatory or inhibitory cell. |
| | 49,051 | Reference annotations indicating the output of a graph neural network model predicting the cell type based on the human labels in allen_v1_column_types_slanted_ref. |