Programmatic Access#
Important
Before using any programmatic access to the data, you first need to set up your CAVEclient token.
CAVEclient#
Most programmatic access to the CAVE services occurs through CAVEclient, a Python client to access various types of data from the online services.
Full documentation is available on the CAVEclient documentation site.
To initialize a CAVEclient, we give it a datastack, which is a name that defines a particular combination of imagery, segmentation, and annotation database.
For the MICrONs public data, we use the datastack name minnie65_public.
import os
from caveclient import CAVEclient
datastack_name = 'minnie65_public'
client = CAVEclient(datastack_name)
# set version, for consistency across time
client.materialize.version = 1078 # Current as of Summer 2024
# Show the description of the datastack
client.info.get_datastack_info()['description']
'This is the publicly released version of the minnie65 volume and segmentation. '
CAVEclient Basics#
The most frequent use of the CAVEclient is to query the database for annotations like synapses.
All database functions are under the client.materialize property.
To see what tables are available, use the get_tables function:
client.materialize.get_tables()
['proofreading_status_and_strategy',
'synapse_target_structure',
'aibs_metamodel_celltypes_v661',
'nucleus_alternative_points',
'allen_column_mtypes_v2',
'bodor_pt_cells',
'aibs_metamodel_mtypes_v661_v2',
'allen_v1_column_types_slanted_ref',
'aibs_column_nonneuronal_ref',
'nucleus_ref_neuron_svm',
'apl_functional_coreg_vess_fwd',
'vortex_compartment_targets',
'baylor_log_reg_cell_type_coarse_v1',
'functional_properties_v3_bcm',
'l5et_column',
'pt_synapse_targets',
'proofreading_status_public_release',
'coregistration_auto_phase3_fwd_apl_vess_combined',
'coregistration_manual_v4',
'nucleus_neuron_svm',
'coregistration_manual_v3',
'vortex_manual_myelination_v0',
'synapses_pni_2',
'nucleus_detection_v0',
'vortex_manual_nodes_of_ranvier',
'bodor_pt_target_proofread',
'vortex_astrocyte_proofreading_status',
'nucleus_functional_area_assignment',
'coregistration_auto_phase3_fwd']
For each table, you can see the metadata describing that table.
For example, let’s look at the nucleus_detection_v0 table:
client.materialize.get_table_metadata('nucleus_detection_v0')
{'aligned_volume': 'minnie65_phase3',
'created': '2020-11-02T18:56:35.530100',
'table_name': 'nucleus_detection_v0',
'valid': True,
'id': 38256,
'schema': 'nucleus_detection',
'schema_type': 'nucleus_detection',
'user_id': '121',
'description': 'A table of nuclei detections from a nucleus detection model developed by Shang Mu, Leila Elabbady, Gayathri Mahalingam and Forrest Collman. Pt is the centroid of the nucleus detection. id corresponds to the flat_segmentation_source segmentID. Only included nucleus detections of volume>25 um^3, below which detections are false positives, though some false positives above that threshold remain. ',
'notice_text': None,
'reference_table': None,
'flat_segmentation_source': 'precomputed://https://bossdb-open-data.s3.amazonaws.com/iarpa_microns/minnie/minnie65/nuclei',
'write_permission': 'PRIVATE',
'read_permission': 'PUBLIC',
'last_modified': '2022-10-25T19:24:28.559914',
'segmentation_source': '',
'pcg_table_name': 'minnie3_v1',
'last_updated': '2024-08-19T10:10:01.191593',
'voxel_resolution': [4.0, 4.0, 40.0]}
You get a dictionary of values. Two fields are particularly important: description, which gives a text description of the contents of the table, and voxel_resolution, which gives the resolution, in nm/voxel, of the coordinates stored in the table.
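As a rough illustration of what voxel_resolution means, the sketch below converts a voxel coordinate to nanometers by multiplying each axis by its resolution. The helper name voxels_to_nm is made up for this example and is not part of CAVEclient; the resolution is the one reported in the metadata above, and the position is one pt_position value from the nucleus_detection_v0 table.

```python
def voxels_to_nm(position, voxel_resolution):
    """Scale a voxel coordinate (x, y, z) by the nm/voxel resolution of each axis."""
    return [coord * res for coord, res in zip(position, voxel_resolution)]


# voxel_resolution as reported in the nucleus_detection_v0 metadata
voxel_resolution = [4.0, 4.0, 40.0]
# one pt_position from the same table, in voxel units
pt_position = [381312, 273984, 19993]

pt_position_nm = voxels_to_nm(pt_position, voxel_resolution)
print(pt_position_nm)  # [1525248.0, 1095936.0, 799720.0]
```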
Querying Tables#
To get the contents of a table, use the query_table function.
This returns the whole contents of a table without any filtering, up to a maximum of 200,000 rows.
The table is returned as a Pandas DataFrame, so you can immediately use standard Pandas functions on it.
cell_type_df = client.materialize.query_table('nucleus_detection_v0')
cell_type_df.head()
 | id | created | superceded_id | valid | volume | pt_supervoxel_id | pt_root_id | pt_position | bb_start_position | bb_end_position |
---|---|---|---|---|---|---|---|---|---|---|
0 | 730537 | 2020-09-28 22:40:41.780734+00:00 | NaN | t | 32.307937 | 0 | 0 | [381312, 273984, 19993] | [nan, nan, nan] | [nan, nan, nan] |
1 | 373879 | 2020-09-28 22:40:41.781788+00:00 | NaN | t | 229.045043 | 96218056992431305 | 864691136090135607 | [228816, 239776, 19593] | [nan, nan, nan] | [nan, nan, nan] |
2 | 601340 | 2020-09-28 22:40:41.782714+00:00 | NaN | t | 426.138010 | 0 | 0 | [340000, 279152, 20946] | [nan, nan, nan] | [nan, nan, nan] |
3 | 201858 | 2020-09-28 22:40:41.783784+00:00 | NaN | t | 93.753836 | 84955554103121097 | 864691135373893678 | [146848, 213600, 26267] | [nan, nan, nan] | [nan, nan, nan] |
4 | 600774 | 2020-09-28 22:40:41.785273+00:00 | NaN | t | 135.189791 | 0 | 0 | [339120, 276112, 19442] | [nan, nan, nan] | [nan, nan, nan] |
Important
While most tables are small enough to be returned in full, the synapse table has hundreds of millions of rows and is too large to download this way.
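Because a query stops at the row limit, a result with exactly that many rows may be incomplete. The following is a minimal sketch of such a check; the function name may_be_truncated and the constant QUERY_ROW_LIMIT are hypothetical, and the limit value is simply the 200,000 rows stated above.

```python
QUERY_ROW_LIMIT = 200_000  # maximum rows per query, as stated in the text


def may_be_truncated(result, row_limit=QUERY_ROW_LIMIT):
    """Return True if `result` has at least `row_limit` rows, so rows may be missing."""
    return len(result) >= row_limit


# Works on anything with a length, including a DataFrame (len gives the row count).
print(may_be_truncated(range(10)))       # False
print(may_be_truncated(range(200_000)))  # True
```

With a real query result, this would be called as may_be_truncated(cell_type_df).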
Tables have a collection of columns, some of which specify a point in space (columns ending in _position), some a root id (ending in _root_id), and others that contain other information about the object at that point.
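The naming convention can be used to sort a table's columns into groups. This is only a sketch: classify_columns is a made-up helper, and the column names are taken from the nucleus_detection_v0 output above.

```python
def classify_columns(columns):
    """Group column names by the suffix conventions described in the text."""
    groups = {"position": [], "root_id": [], "other": []}
    for name in columns:
        if name.endswith("_position"):
            groups["position"].append(name)
        elif name.endswith("_root_id"):
            groups["root_id"].append(name)
        else:
            groups["other"].append(name)
    return groups


columns = [
    "id", "valid", "volume", "pt_supervoxel_id", "pt_root_id",
    "pt_position", "bb_start_position", "bb_end_position",
]
groups = classify_columns(columns)
print(groups["position"])  # ['pt_position', 'bb_start_position', 'bb_end_position']
print(groups["root_id"])   # ['pt_root_id']
```

For a DataFrame, the same helper could be called on its column names, e.g. classify_columns(cell_type_df.columns).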
Before describing some of the most important tables in the database, it’s useful to know about a few advanced options that apply when querying any table.
desired_resolution: This parameter allows you to convert the columns specifying spatial points to different resolutions. Many tables are stored at a resolution of 4x4x40 nm/voxel, for example, but you can convert to nanometers by setting desired_resolution=[1,1,1].
split_positions: This parameter allows you to split the columns specifying spatial points into separate columns for each dimension. The new column names will be the original column name with _x, _y, and _z appended.
select_columns: This parameter allows you to get only a subset of columns from the table. Once you know exactly what you want, this can save you some cleanup.
limit: This parameter allows you to limit the number of rows returned. If you are just testing out a query or trying to inspect the kind of data within a table, you can set this to a small number to make sure it works before downloading the whole table. Note that this will show a warning so that you don’t accidentally limit your query when you don’t mean to.
For example, using all of these together:
cell_type_df = client.materialize.query_table('nucleus_detection_v0', split_positions=True, desired_resolution=[1,1,1], select_columns=['pt_position', 'pt_root_id'], limit=10)
cell_type_df
201 - "Limited query to 10 rows
 | pt_position_x | pt_position_y | pt_position_z | pt_root_id |
---|---|---|---|---|
0 | 241856.0 | 374464.0 | 838720.0 | 0 |
1 | 227200.0 | 389120.0 | 797160.0 | 0 |
2 | 230144.0 | 422336.0 | 795320.0 | 0 |
3 | 239488.0 | 386432.0 | 794120.0 | 0 |
4 | 239744.0 | 423488.0 | 803120.0 | 864691136050815731 |
5 | 245888.0 | 384512.0 | 800120.0 | 0 |
6 | 249792.0 | 391680.0 | 807080.0 | 0 |
7 | 243328.0 | 403008.0 | 794280.0 | 0 |
8 | 247872.0 | 386816.0 | 805320.0 | 0 |
9 | 260352.0 | 416640.0 | 802360.0 | 864691135013273238 |
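To make the column naming produced by split_positions concrete, here is a small sketch that applies the same _x, _y, _z renaming to a single row held in a plain dictionary. The split_position helper is hypothetical and only mimics the naming; the row values are copied from the output above.

```python
def split_position(row, column):
    """Return a copy of `row` with `column` replaced by column_x, column_y, column_z."""
    out = {key: value for key, value in row.items() if key != column}
    for axis, value in zip("xyz", row[column]):
        out[f"{column}_{axis}"] = value
    return out


row = {"pt_root_id": 864691136050815731, "pt_position": [239744.0, 423488.0, 803120.0]}
flat = split_position(row, "pt_position")
print(sorted(flat))  # ['pt_position_x', 'pt_position_y', 'pt_position_z', 'pt_root_id']
```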
Filtering Queries#
Filtering tables so that you only get data about certain rows back is a very common operation.
While there are filtering options in the query_table function (see documentation for more details), a more unified filter interface is available through a “table manager” interface.
Rather than passing a table name to the query_table function, client.materialize.tables has a subproperty for each table in the database that can be used to filter that table.
The general pattern for usage is
client.materialize.tables.{table_name}({filter options}).query({format and timestamp options})
where {table_name} is the name of the table you want to filter, {filter options} is a collection of arguments for filtering the query, and {format and timestamp options} are the parameters controlling the format and timestamp of the query.
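When the table name is only known at runtime, the same pattern can be wrapped in a small function. This is a sketch under stated assumptions: query_filtered is a made-up helper, not part of CAVEclient, and it relies only on the attribute-per-table pattern described above.

```python
def query_filtered(client, table_name, filters=None, **query_kwargs):
    """Look up a table on client.materialize.tables by name, apply filters, then query.

    `filters` holds the {filter options}; `query_kwargs` holds the
    {format and timestamp options} passed on to .query().
    """
    table = getattr(client.materialize.tables, table_name)
    return table(**(filters or {})).query(**query_kwargs)
```

Assuming an initialized client, a call might look like query_filtered(client, 'aibs_metamodel_celltypes_v661', {'cell_type': 'BC'}, limit=10).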
For example, let’s look at the table aibs_metamodel_celltypes_v661, which has cell type predictions across the dataset.
We can get the whole table as a DataFrame:
cell_type_df = client.materialize.tables.aibs_metamodel_celltypes_v661().query()
cell_type_df.head()
 | id | created | valid | volume | pt_supervoxel_id | pt_root_id | id_ref | created_ref | valid_ref | target_id | classification_system | cell_type | pt_position | bb_start_position | bb_end_position |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 336365 | 2020-09-28 22:42:48.966292+00:00 | t | 272.488202 | 93606511657924288 | 864691136274724621 | 36916 | 2023-12-19 22:47:18.659864+00:00 | t | 336365 | excitatory_neuron | 5P-IT | [209760, 180832, 27076] | [nan, nan, nan] | [nan, nan, nan] |
1 | 110648 | 2020-09-28 22:45:09.650639+00:00 | t | 328.533443 | 79385153184885329 | 864691135489403194 | 1070 | 2023-12-19 22:38:00.472115+00:00 | t | 110648 | excitatory_neuron | 23P | [106448, 129632, 25410] | [nan, nan, nan] | [nan, nan, nan] |
2 | 112071 | 2020-09-28 22:43:34.088785+00:00 | t | 272.929423 | 79035988248401958 | 864691136147292311 | 1099 | 2023-12-19 22:38:00.898837+00:00 | t | 112071 | excitatory_neuron | 23P | [103696, 149472, 15583] | [nan, nan, nan] | [nan, nan, nan] |
3 | 197927 | 2020-09-28 22:43:10.652649+00:00 | t | 91.308851 | 84529699506051734 | 864691136050858227 | 13259 | 2023-12-19 22:41:14.417986+00:00 | t | 197927 | nonneuron | oligo | [143600, 186192, 26471] | [nan, nan, nan] | [nan, nan, nan] |
4 | 198087 | 2020-09-28 22:41:36.677186+00:00 | t | 161.744978 | 83756261929388963 | 864691135809440972 | 13271 | 2023-12-19 22:41:14.685474+00:00 | t | 198087 | nonneuron | astrocyte | [137952, 190944, 27361] | [nan, nan, nan] | [nan, nan, nan] |
We can also pass the same formatting options as in the last section to the query function:
cell_type_df = client.materialize.tables.aibs_metamodel_celltypes_v661().query(split_positions=True, desired_resolution=[1,1,1], select_columns=['pt_position', 'pt_root_id', 'cell_type'], limit=10)
cell_type_df
 | pt_position_x | pt_position_y | pt_position_z | pt_root_id | cell_type |
---|---|---|---|---|---|
0 | 257600.0 | 487936.0 | 802760.0 | 864691135724233643 | 23P |
1 | 260992.0 | 493568.0 | 801560.0 | 864691136436395166 | 23P |
2 | 256256.0 | 466432.0 | 831040.0 | 864691135462260637 | NGC |
3 | 255744.0 | 480640.0 | 833200.0 | 864691136723556861 | 23P |
4 | 262144.0 | 505856.0 | 824880.0 | 864691135776658528 | 23P |
5 | 257536.0 | 521728.0 | 804440.0 | 864691135941166708 | 23P |
6 | 251840.0 | 552896.0 | 832320.0 | 864691135545065768 | 23P |
7 | 251136.0 | 546048.0 | 821320.0 | 864691135479369926 | 23P |
8 | 256000.0 | 626368.0 | 814000.0 | 864691135697633557 | 23P |
9 | 324096.0 | 417920.0 | 658880.0 | 864691135937358133 | astrocyte |
However, now we can also filter the table to get only cells that are predicted to have cell type "BC" (for “basket cell”).
my_cell_type = "BC"
client.materialize.tables.aibs_metamodel_celltypes_v661(cell_type=my_cell_type).query()
 | id | created | valid | volume | pt_supervoxel_id | pt_root_id | id_ref | created_ref | valid_ref | target_id | classification_system | cell_type | pt_position | bb_start_position | bb_end_position |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 369908 | 2020-09-28 22:40:41.814964+00:00 | t | 332.862751 | 96002690286851358 | 864691136522768017 | 43009 | 2023-12-19 22:48:53.577191+00:00 | t | 369908 | inhibitory_neuron | BC | [227104, 207840, 20841] | [nan, nan, nan] | [nan, nan, nan] |
1 | 613047 | 2020-09-28 22:40:41.982376+00:00 | t | 242.159780 | 113234168401651200 | 864691136065413528 | 82324 | 2023-12-19 22:58:39.896999+00:00 | t | 613047 | inhibitory_neuron | BC | [352688, 141616, 25312] | [nan, nan, nan] | [nan, nan, nan] |
2 | 193846 | 2020-09-28 22:40:41.897904+00:00 | t | 306.148966 | 82838443188669165 | 864691135684976823 | 12051 | 2023-12-19 22:40:57.133228+00:00 | t | 193846 | inhibitory_neuron | BC | [131568, 168496, 16452] | [nan, nan, nan] | [nan, nan, nan] |
3 | 402885 | 2020-09-28 22:40:41.994716+00:00 | t | 279.232348 | 97621720621533350 | 864691135645529583 | 48951 | 2023-12-19 22:50:24.710643+00:00 | t | 402885 | inhibitory_neuron | BC | [238848, 211712, 16471] | [nan, nan, nan] | [nan, nan, nan] |
4 | 615735 | 2020-09-28 22:40:41.957345+00:00 | t | 314.539540 | 112181247505371364 | 864691136311774525 | 83044 | 2023-12-19 22:58:50.269173+00:00 | t | 615735 | inhibitory_neuron | BC | [344880, 161104, 17084] | [nan, nan, nan] | [nan, nan, nan] |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3360 | 170777 | 2020-09-28 22:45:25.310708+00:00 | t | 499.103662 | 81230957054577082 | 864691135065994564 | 8968 | 2023-12-19 22:40:09.246333+00:00 | t | 170777 | inhibitory_neuron | BC | [119600, 250560, 15373] | [nan, nan, nan] | [nan, nan, nan] |
3361 | 208056 | 2020-09-28 22:45:25.401800+00:00 | t | 521.621668 | 84540007091735344 | 864691135270170533 | 15548 | 2023-12-19 22:41:48.382554+00:00 | t | 208056 | inhibitory_neuron | BC | [143472, 262944, 23693] | [nan, nan, nan] | [nan, nan, nan] |
3362 | 438586 | 2020-09-28 22:45:25.430745+00:00 | t | 529.501389 | 99807894274485381 | 864691136897160046 | 55791 | 2023-12-19 22:52:02.582669+00:00 | t | 438586 | inhibitory_neuron | BC | [254912, 247440, 23680] | [nan, nan, nan] | [nan, nan, nan] |
3363 | 591219 | 2020-09-28 22:45:25.526753+00:00 | t | 567.517839 | 110216764830845707 | 864691135279126177 | 79472 | 2023-12-19 22:57:53.993099+00:00 | t | 591219 | inhibitory_neuron | BC | [330320, 204752, 25060] | [nan, nan, nan] | [nan, nan, nan] |
3364 | 419363 | 2020-09-28 22:45:25.436862+00:00 | t | 530.642698 | 99716496901116512 | 864691136691390838 | 50504 | 2023-12-19 22:50:48.576826+00:00 | t | 419363 | inhibitory_neuron | BC | [254416, 90336, 20469] | [nan, nan, nan] | [nan, nan, nan] |
3365 rows × 15 columns
Or maybe we just want the cell types for a particular collection of root ids:
my_root_ids = [864691135771677771, 864691135560505569, 864691136723556861]
client.materialize.tables.aibs_metamodel_celltypes_v661(pt_root_id=my_root_ids).query()
 | id | created | valid | volume | pt_supervoxel_id | pt_root_id | id_ref | created_ref | valid_ref | target_id | classification_system | cell_type | pt_position | bb_start_position | bb_end_position |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 19116 | 2020-09-28 22:41:51.767906+00:00 | t | 301.426115 | 74737997899501359 | 864691135771677771 | 11282 | 2023-12-19 22:40:43.249642+00:00 | t | 19116 | excitatory_neuron | 23P | [72576, 108656, 20291] | [nan, nan, nan] | [nan, nan, nan] |
1 | 21783 | 2020-09-28 22:41:59.966574+00:00 | t | 263.637074 | 75795590176519004 | 864691135560505569 | 15681 | 2023-12-19 22:41:50.365399+00:00 | t | 21783 | excitatory_neuron | 23P | [80128, 124000, 16563] | [nan, nan, nan] | [nan, nan, nan] |
2 | 4074 | 2020-09-28 22:42:41.341179+00:00 | t | 313.678234 | 73543309863605007 | 864691136723556861 | 50080 | 2023-12-19 22:50:42.474168+00:00 | t | 4074 | excitatory_neuron | 23P | [63936, 120160, 20830] | [nan, nan, nan] | [nan, nan, nan] |
You can get a list of all parameters that can be used for querying with the standard IPython/Jupyter docstring functionality, e.g. on client.materialize.tables.aibs_metamodel_celltypes_v661.
Note
Use of this functionality will show a brief warning that the interface is experimental. This is because the interface is still being developed and may change in the near future in response to user feedback.
Querying Synapses#
While synapses are stored like any other table in the database, in this case synapses_pni_2, this table is much larger than any other table, at more than 337 million rows, and it works best when queried in a different way.
The synapse_query function allows you to query the synapse table in a more convenient way than most other tables.
In particular, the pre_ids and post_ids parameters let you specify which root id (or collection of root ids) you want to query, with pre_ids indicating the collection of presynaptic neurons and post_ids the collection of postsynaptic neurons.
Using both pre_ids and post_ids in one call is effectively a logical AND, returning only those synapses from neurons in the list of pre_ids that target neurons in the list of post_ids.
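The AND behavior can be illustrated on a handful of made-up (pre, post) pairs. This sketch only mimics the selection logic in plain Python; it is not how synapse_query is implemented, and filter_synapses and the small integer ids are invented for the example.

```python
def filter_synapses(synapses, pre_ids=None, post_ids=None):
    """Keep (pre, post) pairs matching pre_ids AND post_ids; None means no constraint."""
    pre = set(pre_ids) if pre_ids is not None else None
    post = set(post_ids) if post_ids is not None else None
    return [
        (p, q)
        for p, q in synapses
        if (pre is None or p in pre) and (post is None or q in post)
    ]


# Made-up (pre_root_id, post_root_id) pairs; a repeated pair means multiple synapses.
synapses = [(1, 10), (1, 11), (2, 10), (3, 12), (1, 10)]

only_pre = filter_synapses(synapses, pre_ids=[1])
both = filter_synapses(synapses, pre_ids=[1, 2], post_ids=[10])
print(only_pre)  # [(1, 10), (1, 11), (1, 10)]
print(both)      # [(1, 10), (2, 10), (1, 10)]
```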
Let’s look at one particular example.
my_root_id = 864691135808473885
syn_df = client.materialize.synapse_query(pre_ids=my_root_id)
print(f"Total number of output synapses for {my_root_id}: {len(syn_df)}")
syn_df.head()
Total number of output synapses for 864691135808473885: 1498
 | id | created | superceded_id | valid | size | pre_pt_supervoxel_id | pre_pt_root_id | post_pt_supervoxel_id | post_pt_root_id | pre_pt_position | post_pt_position | ctr_pt_position |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 158405512 | 2020-11-04 06:48:59.403833+00:00 | NaN | t | 420 | 89385416926790697 | 864691135808473885 | 89385416926797494 | 864691135546540484 | [179076, 188248, 20233] | [179156, 188220, 20239] | [179140, 188230, 20239] |
1 | 185549462 | 2020-11-04 06:49:10.903020+00:00 | NaN | t | 4832 | 91356016507479890 | 864691135808473885 | 91356016507470163 | 864691135884799088 | [193168, 190452, 19262] | [193142, 190404, 19257] | [193180, 190432, 19254] |
2 | 138110803 | 2020-11-04 06:49:46.758528+00:00 | NaN | t | 3176 | 87263084540201919 | 864691135808473885 | 87263084540199587 | 864691135195078186 | [163440, 104292, 19808] | [163498, 104348, 19806] | [163460, 104356, 19804] |
3 | 157378264 | 2020-11-04 07:38:27.332669+00:00 | NaN | t | 412 | 89374490395905686 | 864691135808473885 | 89374490395921430 | 864691135446953106 | [179218, 107132, 19372] | [179204, 107010, 19383] | [179196, 107072, 19380] |
4 | 174798776 | 2020-11-04 10:10:59.416878+00:00 | NaN | t | 1796 | 90089104301487245 | 864691135808473885 | 90089104301487089 | 864691135489632314 | [184038, 188292, 19753] | [183920, 188202, 19754] | [183998, 188216, 19755] |
Note that synapse queries always return every synapse between the neurons in the query, even if there are multiple synapses between the same pair of neurons.
A common pattern to generate a list of connections between unique pairs of neurons is to group by the root ids of the presynaptic and postsynaptic neurons and then count the number of synapses between them. For example, to get the number of synapses from this neuron onto every other neuron, ordered by synapse count:
syn_df.groupby(
['pre_pt_root_id', 'post_pt_root_id']
).count()[['id']].rename(
columns={'id': 'syn_count'}
).sort_values(
by='syn_count',
ascending=False,
)
# Note that the 'id' part here is just a way to quickly extract one column.
# This could be any of the remaining column names, but `id` is often convenient because it is common to all tables.
pre_pt_root_id | post_pt_root_id | syn_count |
---|---|---|
864691135808473885 | 864691135339009510 | 20 |
864691135808473885 | 864691135214122296 | 16 |
864691135808473885 | 864691136578647572 | 15 |
864691135808473885 | 864691136066504856 | 13 |
864691135808473885 | 864691135841325283 | 11 |
... | ... | ... |
864691135808473885 | 864691135518210698 | 1 |
864691135808473885 | 864691135518407306 | 1 |
864691135808473885 | 864691135518426506 | 1 |
864691135808473885 | 864691135526398299 | 1 |
864691135808473885 | 864691137198458945 | 1 |
1037 rows × 1 columns
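To spell out what the group-and-count step computes, here is the same logic on plain Python tuples using collections.Counter. It is a sketch for illustration, not a replacement for the Pandas version; count_connections and the small integer ids are made up.

```python
from collections import Counter


def count_connections(pairs):
    """Count synapses per (pre, post) pair and sort by count, largest first."""
    return Counter(pairs).most_common()


# Made-up (pre_root_id, post_root_id) pairs, one entry per synapse.
pairs = [(1, 10), (1, 11), (1, 10), (1, 12), (1, 10), (1, 11)]
connections = count_connections(pairs)
print(connections)  # [((1, 10), 3), ((1, 11), 2), ((1, 12), 1)]
```

With a synapse DataFrame, the pairs could be built with zip(syn_df['pre_pt_root_id'], syn_df['post_pt_root_id']).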