Programmatic Access
Important
Before using any programmatic access to the data, you first need to set up your CAVEclient token.
CAVEclient
Most programmatic access to the CAVE services occurs through CAVEclient, a Python client to access various types of data from the online services.
Full documentation for CAVEclient is available here.
To initialize a CAVEclient, we give it a datastack: a name that defines a particular combination of imagery, segmentation, and annotation database.
For V1DD public data, use the datastack name v1dd_public:
from caveclient import CAVEclient
datastack_name = 'v1dd_public'
client = CAVEclient(datastack_name)
# set version, for consistency across time
client.materialize.version = 1196 # Current as of Summer 2025
# Show the description of the datastack
client.info.get_datastack_info()['description']
For MICrONS public data, use the datastack name minnie65_public:
from caveclient import CAVEclient
datastack_name = 'minnie65_public'
client = CAVEclient(datastack_name)
# set version, for consistency across time
client.materialize.version = 1507 # Current as of Summer 2025
# Show the description of the datastack
client.info.get_datastack_info()['description']
'This is the publicly released version of the minnie65 volume and segmentation. '
The rest of this tutorial uses the MICrONS dataset. The same functions apply to both datasets; however, the table names may differ. Refer to the Key Annotation Tables page for each dataset.
Materialization versions
Data in CAVE is timestamped and periodically versioned: each (materialization) version corresponds to a specific timestamp, and individual versions are made publicly available. The materialization client, available as client.materialize, lets you query the materialized annotation tables that were posted to the annotation service. For more, see the CAVEclient documentation.
Periodic updates are made to the public datastack, including updates to the available tables. As a result, some cells will have a different pt_root_id in different versions because they have undergone proofreading.
Important
For analysis consistency, it is worth checking which version of the data you are using and specifying it explicitly with client.version = your_version.
Read more about setting the version of your analysis in the MICrONS tutorials.
# see the available materialization versions
client.materialize.get_versions()
[1300, 1078, 117, 661, 343, 1181, 795, 943, 1412, 1507]
And these are their associated timestamps (all timestamps are in UTC):
for version in client.materialize.get_versions():
    print(f"Version {version}: {client.materialize.get_timestamp(version)}")
Version 1300: 2025-01-13 10:10:01.286229+00:00
Version 1078: 2024-06-05 10:10:01.203215+00:00
Version 117: 2021-06-11 08:10:00.215114+00:00
Version 661: 2023-04-06 20:17:09.199182+00:00
Version 343: 2022-02-24 08:10:00.184668+00:00
Version 1181: 2024-09-16 10:10:01.121167+00:00
Version 795: 2023-08-23 08:10:01.404268+00:00
Version 943: 2024-01-22 08:10:01.497934+00:00
Version 1412: 2025-04-29 10:10:01.200893+00:00
Version 1507: 2025-07-31 08:10:01.117494+00:00
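get_versions() does not list versions in chronological order. A minimal sketch for ordering them by timestamp and picking the most recent one; the version/timestamp pairs printed above are copied in as plain data so the snippet runs without a connection. With a live client you would build the same dictionary from client.materialize.get_versions() and client.materialize.get_timestamp(version).

```python
from datetime import datetime

# Version -> UTC timestamp, copied from the output above.
version_timestamps = {
    1300: "2025-01-13 10:10:01.286229+00:00",
    1078: "2024-06-05 10:10:01.203215+00:00",
    117: "2021-06-11 08:10:00.215114+00:00",
    661: "2023-04-06 20:17:09.199182+00:00",
    343: "2022-02-24 08:10:00.184668+00:00",
    1181: "2024-09-16 10:10:01.121167+00:00",
    795: "2023-08-23 08:10:01.404268+00:00",
    943: "2024-01-22 08:10:01.497934+00:00",
    1412: "2025-04-29 10:10:01.200893+00:00",
    1507: "2025-07-31 08:10:01.117494+00:00",
}

# Parse the strings into timezone-aware datetime objects.
parsed = {v: datetime.fromisoformat(ts) for v, ts in version_timestamps.items()}

# Sort versions by their timestamp; the last one is the most recent.
chronological = sorted(parsed, key=parsed.get)
latest_version = chronological[-1]

for v in chronological:
    print(f"Version {v}: {parsed[v].date()}")
print("Most recent version:", latest_version)
```

For these versions the chronological order happens to match the numeric order, so sorting by timestamp simply avoids relying on that.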
# set materialization version, for consistency
client.version = 1507 # current public as of 7/31/2025
CAVEclient Basics
The most frequent use of the CAVEclient is to query the database for annotations like synapses.
All database functions are under the client.materialize property.
To see what tables are available, use the get_tables function:
client.materialize.get_tables()
['baylor_gnn_cell_type_fine_model_v2',
'nucleus_alternative_points',
'allen_column_mtypes_v2',
'bodor_pt_cells',
'aibs_metamodel_mtypes_v661_v2',
'allen_v1_column_types_slanted_ref',
'aibs_column_nonneuronal_ref',
'nucleus_ref_neuron_svm',
'apl_functional_coreg_vess_fwd',
'vortex_compartment_targets',
'baylor_log_reg_cell_type_coarse_v1',
'gamlin_2023_mcs',
'l5et_column',
'pt_synapse_targets',
'coregistration_manual_v4',
'cg_cell_type_calls',
'synapses_pni_2',
'nucleus_detection_v0',
'vortex_manual_nodes_of_ranvier',
'bodor_pt_target_proofread',
'nucleus_functional_area_assignment',
'coregistration_auto_phase3_fwd_apl_vess_combined_v2',
'vortex_thalamic_proofreading_status',
'multi_input_spine_predictions_ssa',
'synapse_target_structure',
'proofreading_status_and_strategy',
'coregistration_auto_phase3_fwd_v2',
'vortex_peptidergic_proofreading_status',
'digital_twin_properties_bcm_coreg_v4',
'vortex_astrocyte_proofreading_status',
'digital_twin_properties_bcm_coreg_auto_phase3_fwd_v2',
'digital_twin_properties_bcm_coreg_apl_vess_fwd',
'gamlin_2023_mcs_met_types',
'vortex_manual_myelination_v0',
'synapse_target_predictions_ssa',
'aibs_metamodel_celltypes_v661']
For each table, you can see the metadata describing that table.
For example, let’s look at the nucleus_detection_v0 table:
client.materialize.get_table_metadata('nucleus_detection_v0')
{'schema': 'nucleus_detection',
'table_name': 'nucleus_detection_v0',
'created': '2020-11-02T18:56:35.530100',
'id': 71748,
'aligned_volume': 'minnie65_phase3',
'valid': True,
'schema_type': 'nucleus_detection',
'user_id': '121',
'description': 'A table of nuclei detections from a nucleus detection model developed by Shang Mu, Leila Elabbady, Gayathri Mahalingam and Forrest Collman. Pt is the centroid of the nucleus detection. id corresponds to the flat_segmentation_source segmentID. Only included nucleus detections of volume>25 um^3, below which detections are false positives, though some false positives above that threshold remain. ',
'notice_text': None,
'reference_table': None,
'flat_segmentation_source': 'precomputed://https://bossdb-open-data.s3.amazonaws.com/iarpa_microns/minnie/minnie65/nuclei',
'write_permission': 'PRIVATE',
'read_permission': 'PUBLIC',
'last_modified': '2022-10-25T19:24:28.559914',
'segmentation_source': '',
'pcg_table_name': 'minnie3_v1',
'last_updated': '2025-08-22T22:00:00.077663',
'voxel_resolution': [4.0, 4.0, 40.0]}
You get a dictionary of values. Two fields are particularly important: description, which gives a text description of the contents of the table, and voxel_resolution, which defines the units of the coordinates in the table, in nm/voxel.
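Because coordinates are stored in voxels, multiplying each component by voxel_resolution gives nanometers. A minimal sketch of that conversion, using the voxel_resolution from the metadata above ([4, 4, 40] nm/voxel) and the first pt_position in the nucleus_detection_v0 output shown in the next section; rescale_point is a hypothetical helper, not part of CAVEclient. The desired_resolution query option described below does this kind of rescaling for you.

```python
def rescale_point(point, from_resolution, to_resolution=(1, 1, 1)):
    """Convert a point between voxel resolutions (all in nm/voxel).

    Hypothetical helper for illustration: each component is multiplied by
    its source resolution (giving nm) and divided by the target resolution.
    """
    return [p * fr / tr for p, fr, tr in zip(point, from_resolution, to_resolution)]

voxel_resolution = [4.0, 4.0, 40.0]    # nm/voxel, from get_table_metadata above
pt_position = [381312, 273984, 19993]  # voxel coordinates of one nucleus

pos_nm = rescale_point(pt_position, voxel_resolution)                      # nanometers
pos_um = rescale_point(pt_position, voxel_resolution, (1000, 1000, 1000))  # micrometers
print(pos_nm)  # [1525248.0, 1095936.0, 799720.0]
print(pos_um)
```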
Querying Tables
To get the contents of a table, use the query_table function.
This returns the whole contents of a table without any filtering, up to a maximum of 200,000 rows.
The table is returned as a Pandas DataFrame, so you can immediately use standard Pandas functions on it.
nuc_df = client.materialize.query_table('nucleus_detection_v0')
nuc_df.head()
| | id | created | superceded_id | valid | volume | pt_supervoxel_id | pt_root_id | pt_position | bb_start_position | bb_end_position |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 730537 | 2020-09-28 22:40:41.780734+00:00 | NaN | t | 32.307937 | 0 | 0 | [381312, 273984, 19993] | [nan, nan, nan] | [nan, nan, nan] |
| 1 | 373879 | 2020-09-28 22:40:41.781788+00:00 | NaN | t | 229.045043 | 96218056992431305 | 864691136090135607 | [228816, 239776, 19593] | [nan, nan, nan] | [nan, nan, nan] |
| 2 | 601340 | 2020-09-28 22:40:41.782714+00:00 | NaN | t | 426.138010 | 0 | 0 | [340000, 279152, 20946] | [nan, nan, nan] | [nan, nan, nan] |
| 3 | 201858 | 2020-09-28 22:40:41.783784+00:00 | NaN | t | 93.753836 | 84955554103121097 | 864691135373893678 | [146848, 213600, 26267] | [nan, nan, nan] | [nan, nan, nan] |
| 4 | 600774 | 2020-09-28 22:40:41.785273+00:00 | NaN | t | 135.189791 | 0 | 0 | [339120, 276112, 19442] | [nan, nan, nan] | [nan, nan, nan] |
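Since the result is an ordinary DataFrame, follow-up filtering is plain pandas. For example, several rows above have a pt_root_id (and pt_supervoxel_id) of 0, which typically indicates that the nucleus centroid does not fall on a segmented object at this version. A minimal sketch that drops those rows, run on the five rows shown above copied into a local DataFrame; with real data you would apply the same mask to the DataFrame returned by query_table.

```python
import pandas as pd

# The five rows displayed above (subset of columns), copied as local data.
nuc_df = pd.DataFrame({
    "id": [730537, 373879, 601340, 201858, 600774],
    "volume": [32.307937, 229.045043, 426.138010, 93.753836, 135.189791],
    "pt_root_id": [0, 864691136090135607, 0, 864691135373893678, 0],
})

# Keep only nuclei whose centroid maps to a nonzero root id.
segmented = nuc_df[nuc_df["pt_root_id"] != 0]

# Standard pandas summaries work directly on the filtered result.
print(len(segmented), "of", len(nuc_df), "nuclei have a nonzero root id")
print(segmented["volume"].describe())
```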
Important
While most tables are small enough to be returned in full, the synapse table has hundreds of millions of rows and is too large to download this way.
Tables have a collection of columns, some of which specify a point in space (columns ending in _position), some a root id (ending in _root_id), and others that contain other information about the object at that point.
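Because a column's role is encoded in its suffix, the spatial and root-id columns can be picked out programmatically, for example to build a select_columns list. A small sketch using the column names of the synapse query result shown later on this page:

```python
# Column names as returned for the synapse table (see "Querying Synapses").
columns = [
    "id", "created", "superceded_id", "valid", "size",
    "pre_pt_supervoxel_id", "pre_pt_root_id",
    "post_pt_supervoxel_id", "post_pt_root_id",
    "pre_pt_position", "post_pt_position", "ctr_pt_position",
]

# Group columns by the naming convention described above.
position_columns = [c for c in columns if c.endswith("_position")]
root_id_columns = [c for c in columns if c.endswith("_root_id")]
other_columns = [c for c in columns if c not in position_columns + root_id_columns]

print(position_columns)  # ['pre_pt_position', 'post_pt_position', 'ctr_pt_position']
print(root_id_columns)   # ['pre_pt_root_id', 'post_pt_root_id']
```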
Before describing some of the most important tables in the database, it’s useful to know about a few advanced options that apply when querying any table.
- desired_resolution: This parameter allows you to convert the columns specifying spatial points to different resolutions. Many tables are stored at a resolution of 4x4x40 nm/voxel, for example, but you can convert to nanometers by setting desired_resolution=[1,1,1].
- split_positions: This parameter allows you to split the columns specifying spatial points into separate columns for each dimension. The new column names will be the original column name with _x, _y, and _z appended.
- select_columns: This parameter allows you to get only a subset of columns from the table. Once you know exactly what you want, this can save you some cleanup.
- limit: This parameter allows you to limit the number of rows returned. If you are just testing out a query or trying to inspect the kind of data within a table, you can set this to a small number to make sure it works before downloading the whole table. Note that this will show a warning so that you don't accidentally limit your query when you don't mean to.
For example, using all of these together:
nuc_df = client.materialize.query_table('nucleus_detection_v0',
                                        split_positions=True,
                                        desired_resolution=[1,1,1],
                                        select_columns=['pt_position', 'pt_root_id'],
                                        limit=10)
nuc_df
201 - "Limited query to 10 rows
| | pt_position_x | pt_position_y | pt_position_z | pt_root_id |
|---|---|---|---|---|
| 0 | 241856.0 | 374464.0 | 838720.0 | 0 |
| 1 | 227200.0 | 389120.0 | 797160.0 | 0 |
| 2 | 230144.0 | 422336.0 | 795320.0 | 0 |
| 3 | 239488.0 | 386432.0 | 794120.0 | 0 |
| 4 | 239744.0 | 423488.0 | 803120.0 | 864691136050815731 |
| 5 | 245888.0 | 384512.0 | 800120.0 | 0 |
| 6 | 249792.0 | 391680.0 | 807080.0 | 0 |
| 7 | 243328.0 | 403008.0 | 794280.0 | 0 |
| 8 | 247872.0 | 386816.0 | 805320.0 | 0 |
| 9 | 260352.0 | 416640.0 | 802360.0 | 864691135013273238 |
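To make the effect of split_positions and desired_resolution concrete, here is a local pandas sketch that mimics them on an unsplit pt_position column stored in 4x4x40 nm voxels. It illustrates the transformation only and is not CAVEclient's implementation; the input rows are the first two rows of the unfiltered nucleus_detection_v0 output above.

```python
import pandas as pd

voxel_resolution = [4.0, 4.0, 40.0]  # nm/voxel for nucleus_detection_v0
df = pd.DataFrame({
    "pt_position": [[381312, 273984, 19993], [228816, 239776, 19593]],
    "pt_root_id": [0, 864691136090135607],
})

# Mimic split_positions=True: one column per axis, named <column>_x/_y/_z.
xyz = pd.DataFrame(
    df["pt_position"].tolist(),
    index=df.index,
    columns=["pt_position_x", "pt_position_y", "pt_position_z"],
)

# Mimic desired_resolution=[1, 1, 1]: scale each axis from voxels to nm.
xyz = xyz * voxel_resolution

result = pd.concat([xyz, df[["pt_root_id"]]], axis=1)
print(result)
```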
Filtering Queries
Filtering tables so that you only get data about certain rows back is a very common operation.
While there are filtering options in the query_table function (see the documentation for more details), a more unified filter interface is available through the “table manager” interface.
Rather than passing a table name to the query_table function, client.materialize.tables has a subproperty for each table in the database that can be used to filter that table.
The general pattern for usage is
client.materialize.tables.{table_name}({filter options}).query({format and timestamp options})
where {table_name} is the name of the table you want to filter, {filter options} is a collection of arguments for filtering the query, and {format and timestamp options} are those parameters controlling the format and timestamp of the query.
For example, let’s look at the table aibs_metamodel_celltypes_v661, which has cell type predictions across the dataset.
We can get the whole table as a DataFrame:
cell_type_df = client.materialize.tables.aibs_metamodel_celltypes_v661().query()
cell_type_df.head()
The `client.materialize.tables` interface is experimental and might experience breaking changes before the feature is stabilized.
| | id_ref | created_ref | valid_ref | target_id | classification_system | cell_type | id | created | valid | volume | pt_supervoxel_id | pt_root_id | pt_position | bb_start_position | bb_end_position |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 36916 | 2023-12-19 22:47:18.659864+00:00 | t | 336365 | excitatory_neuron | 5P-IT | 336365 | 2020-09-28 22:42:48.966292+00:00 | t | 272.488202 | 93606511657924288 | 864691136274724621 | [209760, 180832, 27076] | [nan, nan, nan] | [nan, nan, nan] |
| 1 | 1070 | 2023-12-19 22:38:00.472115+00:00 | t | 110648 | excitatory_neuron | 23P | 110648 | 2020-09-28 22:45:09.650639+00:00 | t | 328.533443 | 79385153184885329 | 864691135489403194 | [106448, 129632, 25410] | [nan, nan, nan] | [nan, nan, nan] |
| 2 | 1099 | 2023-12-19 22:38:00.898837+00:00 | t | 112071 | excitatory_neuron | 23P | 112071 | 2020-09-28 22:43:34.088785+00:00 | t | 272.929423 | 79035988248401958 | 864691136147292311 | [103696, 149472, 15583] | [nan, nan, nan] | [nan, nan, nan] |
| 3 | 13259 | 2023-12-19 22:41:14.417986+00:00 | t | 197927 | nonneuron | oligo | 197927 | 2020-09-28 22:43:10.652649+00:00 | t | 91.308851 | 84529699506051734 | 864691135655940290 | [143600, 186192, 26471] | [nan, nan, nan] | [nan, nan, nan] |
| 4 | 13271 | 2023-12-19 22:41:14.685474+00:00 | t | 198087 | nonneuron | astrocyte | 198087 | 2020-09-28 22:41:36.677186+00:00 | t | 161.744978 | 83756261929388963 | 864691135809440972 | [137952, 190944, 27361] | [nan, nan, nan] | [nan, nan, nan] |
and we can add similar formatting options as in the last section to the query function:
cell_type_df = client.materialize.tables.aibs_metamodel_celltypes_v661().query(split_positions=True,
desired_resolution=[1,1,1],
select_columns={
'nucleus_detection_v0': ['pt_position', 'pt_root_id'],
'aibs_metamodel_celltypes_v661': ['cell_type'],
},
limit=10)
cell_type_df
201 - "Limited query to 10 rows
| | pt_position_x | pt_position_y | pt_position_z | pt_root_id | cell_type |
|---|---|---|---|---|---|
| 0 | 257600.0 | 487936.0 | 802760.0 | 864691135724233643 | 23P |
| 1 | 260992.0 | 493568.0 | 801560.0 | 864691136436395166 | 23P |
| 2 | 256256.0 | 466432.0 | 831040.0 | 864691135462260637 | NGC |
| 3 | 255744.0 | 480640.0 | 833200.0 | 864691136723556861 | 23P |
| 4 | 262144.0 | 505856.0 | 824880.0 | 864691135776658528 | 23P |
| 5 | 257536.0 | 521728.0 | 804440.0 | 864691135941166708 | 23P |
| 6 | 251840.0 | 552896.0 | 832320.0 | 864691135545065768 | 23P |
| 7 | 251136.0 | 546048.0 | 821320.0 | 864691135479369926 | 23P |
| 8 | 256000.0 | 626368.0 | 814000.0 | 864691135697633557 | 23P |
| 9 | 324096.0 | 417920.0 | 658880.0 | 864691135937358133 | astrocyte |
Now we can also filter the table to get only cells that are predicted to have cell type "BC" (for “basket cell”).
my_cell_type = "BC"
client.materialize.tables.aibs_metamodel_celltypes_v661(cell_type=my_cell_type).query()
| | id | created | valid | volume | pt_supervoxel_id | pt_root_id | id_ref | created_ref | valid_ref | target_id | classification_system | cell_type | pt_position | bb_start_position | bb_end_position |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 369908 | 2020-09-28 22:40:41.814964+00:00 | t | 332.862751 | 96002690286851358 | 864691136276011533 | 43009 | 2023-12-19 22:48:53.577191+00:00 | t | 369908 | inhibitory_neuron | BC | [227104, 207840, 20841] | [nan, nan, nan] | [nan, nan, nan] |
| 1 | 193846 | 2020-09-28 22:40:41.897904+00:00 | t | 306.148966 | 82838443188669165 | 864691135578780933 | 12051 | 2023-12-19 22:40:57.133228+00:00 | t | 193846 | inhibitory_neuron | BC | [131568, 168496, 16452] | [nan, nan, nan] | [nan, nan, nan] |
| 2 | 615735 | 2020-09-28 22:40:41.957345+00:00 | t | 314.539540 | 112181247505371364 | 864691135183493378 | 83044 | 2023-12-19 22:58:50.269173+00:00 | t | 615735 | inhibitory_neuron | BC | [344880, 161104, 17084] | [nan, nan, nan] | [nan, nan, nan] |
| 3 | 613047 | 2020-09-28 22:40:41.982376+00:00 | t | 242.159780 | 113234168401651200 | 864691136065413528 | 82324 | 2023-12-19 22:58:39.896999+00:00 | t | 613047 | inhibitory_neuron | BC | [352688, 141616, 25312] | [nan, nan, nan] | [nan, nan, nan] |
| 4 | 402885 | 2020-09-28 22:40:41.994716+00:00 | t | 279.232348 | 97621720621533350 | 864691135645529583 | 48951 | 2023-12-19 22:50:24.710643+00:00 | t | 402885 | inhibitory_neuron | BC | [238848, 211712, 16471] | [nan, nan, nan] | [nan, nan, nan] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3360 | 170777 | 2020-09-28 22:45:25.310708+00:00 | t | 499.103662 | 81230957054577082 | 864691135065994564 | 8968 | 2023-12-19 22:40:09.246333+00:00 | t | 170777 | inhibitory_neuron | BC | [119600, 250560, 15373] | [nan, nan, nan] | [nan, nan, nan] |
| 3361 | 591219 | 2020-09-28 22:45:25.526753+00:00 | t | 567.517839 | 110216764830845707 | 864691135279126177 | 79472 | 2023-12-19 22:57:53.993099+00:00 | t | 591219 | inhibitory_neuron | BC | [330320, 204752, 25060] | [nan, nan, nan] | [nan, nan, nan] |
| 3362 | 208056 | 2020-09-28 22:45:25.401800+00:00 | t | 521.621668 | 84540007091735344 | 864691135801456226 | 15548 | 2023-12-19 22:41:48.382554+00:00 | t | 208056 | inhibitory_neuron | BC | [143472, 262944, 23693] | [nan, nan, nan] | [nan, nan, nan] |
| 3363 | 438586 | 2020-09-28 22:45:25.430745+00:00 | t | 529.501389 | 99807894274485381 | 864691135395662581 | 55791 | 2023-12-19 22:52:02.582669+00:00 | t | 438586 | inhibitory_neuron | BC | [254912, 247440, 23680] | [nan, nan, nan] | [nan, nan, nan] |
| 3364 | 419363 | 2020-09-28 22:45:25.436862+00:00 | t | 530.642698 | 99716496901116512 | 864691135954384419 | 50504 | 2023-12-19 22:50:48.576826+00:00 | t | 419363 | inhibitory_neuron | BC | [254416, 90336, 20469] | [nan, nan, nan] | [nan, nan, nan] |
3365 rows × 15 columns
or maybe we just want the cell types for a particular collection of root ids:
my_root_ids = [864691135771677771, 864691135560505569, 864691136723556861]
client.materialize.tables.aibs_metamodel_celltypes_v661(pt_root_id=my_root_ids).query()
| | id_ref | created_ref | valid_ref | target_id | classification_system | cell_type | id | created | valid | volume | pt_supervoxel_id | pt_root_id | pt_position | bb_start_position | bb_end_position |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11282 | 2023-12-19 22:40:43.249642+00:00 | t | 19116 | excitatory_neuron | 23P | 19116 | 2020-09-28 22:41:51.767906+00:00 | t | 301.426115 | 74737997899501359 | 864691135771677771 | [72576, 108656, 20291] | [nan, nan, nan] | [nan, nan, nan] |
| 1 | 15681 | 2023-12-19 22:41:50.365399+00:00 | t | 21783 | excitatory_neuron | 23P | 21783 | 2020-09-28 22:41:59.966574+00:00 | t | 263.637074 | 75795590176519004 | 864691135560505569 | [80128, 124000, 16563] | [nan, nan, nan] | [nan, nan, nan] |
| 2 | 50080 | 2023-12-19 22:50:42.474168+00:00 | t | 4074 | excitatory_neuron | 23P | 4074 | 2020-09-28 22:42:41.341179+00:00 | t | 313.678234 | 73543309863605007 | 864691136723556861 | [63936, 120160, 20830] | [nan, nan, nan] | [nan, nan, nan] |
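The table-manager filters are applied by the server, so only matching rows are downloaded. If you already have a table locally, roughly the same selections are ordinary pandas masks: an equality test for a single value and isin for a list. A sketch on a few rows copied from the outputs above (root id and predicted cell type only):

```python
import pandas as pd

# Rows copied from the outputs above (root id and predicted cell type only).
cell_type_df = pd.DataFrame({
    "pt_root_id": [864691136274724621, 864691135489403194, 864691136147292311,
                   864691135655940290, 864691135809440972, 864691136276011533],
    "cell_type": ["5P-IT", "23P", "23P", "oligo", "astrocyte", "BC"],
})

# Local analogue of tables.aibs_metamodel_celltypes_v661(cell_type="BC").
bc_rows = cell_type_df[cell_type_df["cell_type"] == "BC"]

# Local analogue of passing a list of root ids: keep rows whose id is in the list.
my_root_ids = [864691135489403194, 864691135809440972]
id_rows = cell_type_df[cell_type_df["pt_root_id"].isin(my_root_ids)]

print(bc_rows)
print(id_rows)
```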
You can get a list of all parameters that can be used for querying with the standard IPython/Jupyter docstring functionality, e.g. ?client.materialize.tables.aibs_metamodel_celltypes_v661.
?client.materialize.tables.aibs_metamodel_celltypes_v661
Querying Synapses
Synapses are stored in the database like any other table (here, synapses_pni_2), but at more than 337 million rows this table is much larger than any other, and it works best when queried in a different way.
The synapse_query function provides a more convenient way to query the synapse table than query_table.
In particular, the pre_ids and post_ids arguments let you specify which root id (or collection of root ids) you want to query, with pre_ids indicating the collection of presynaptic neurons and post_ids the collection of postsynaptic neurons.
Using both pre_ids and post_ids in one call is effectively a logical AND, returning only those synapses from neurons in pre_ids that target neurons in post_ids. This can be especially useful if you want to find the connectivity between only the proofread cells, for example.
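The AND behaviour can be pictured with a toy synapse table; the root ids below are small hypothetical integers, not real segment ids. Filtering on pre_ids alone keeps every output synapse of those cells, while filtering on both keeps only rows that satisfy both conditions:

```python
import pandas as pd

# Hypothetical synapse table: each row is one synapse.
syn = pd.DataFrame({
    "pre_pt_root_id":  [1, 1, 1, 2, 3, 3],
    "post_pt_root_id": [2, 3, 4, 1, 1, 4],
})

pre_ids = [1, 3]
post_ids = [4]

# pre_ids only: every output synapse of the listed cells.
outputs = syn[syn["pre_pt_root_id"].isin(pre_ids)]

# pre_ids and post_ids together: logical AND of the two conditions.
between = syn[syn["pre_pt_root_id"].isin(pre_ids) & syn["post_pt_root_id"].isin(post_ids)]

print(len(outputs), len(between))
```

With a live client, the analogous call is client.materialize.synapse_query(pre_ids=..., post_ids=...).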
Let’s look at one particular example.
my_root_id = 864691136968109774
syn_df = client.materialize.synapse_query(pre_ids=my_root_id)
print(f"Total number of output synapses for {my_root_id}: {len(syn_df)}")
syn_df.head()
Total number of output synapses for 864691136968109774: 1499
| | id | created | superceded_id | valid | size | pre_pt_supervoxel_id | pre_pt_root_id | post_pt_supervoxel_id | post_pt_root_id | pre_pt_position | post_pt_position | ctr_pt_position |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 158405512 | 2020-11-04 06:48:59.403833+00:00 | NaN | t | 420 | 89385416926790697 | 864691136968109774 | 89385416926797494 | 864691135546540484 | [179076, 188248, 20233] | [179156, 188220, 20239] | [179140, 188230, 20239] |
| 1 | 185549462 | 2020-11-04 06:49:10.903020+00:00 | NaN | t | 4832 | 91356016507479890 | 864691136968109774 | 91356016507470163 | 864691135884799088 | [193168, 190452, 19262] | [193142, 190404, 19257] | [193180, 190432, 19254] |
| 2 | 138110803 | 2020-11-04 06:49:46.758528+00:00 | NaN | t | 3176 | 87263084540201919 | 864691136968109774 | 87263084540199587 | 864691135448518996 | [163440, 104292, 19808] | [163498, 104348, 19806] | [163460, 104356, 19804] |
| 3 | 157378264 | 2020-11-04 07:38:27.332669+00:00 | NaN | t | 412 | 89374490395905686 | 864691136968109774 | 89374490395921430 | 864691135446953106 | [179218, 107132, 19372] | [179204, 107010, 19383] | [179196, 107072, 19380] |
| 4 | 148262628 | 2020-11-04 06:53:27.294021+00:00 | NaN | t | 3536 | 88189766885093187 | 864691136968109774 | 88189835604584343 | 864691135250533976 | [170154, 193170, 21123] | [170046, 193240, 21123] | [170118, 193220, 21128] |
Note that synapse queries always return the list of every synapse between the neurons in the query, even if there are multiple synapses between the same pair of neurons.
A common pattern to generate a list of connections between unique pairs of neurons is to group by the root ids of the presynaptic and postsynaptic neurons and then count the number of synapses between them. For example, to get the number of synapses from this neuron onto every other neuron, ordered from most to fewest synapses:
syn_df.groupby(
['pre_pt_root_id', 'post_pt_root_id']
).count()[['id']].rename(
columns={'id': 'syn_count'}
).sort_values(
by='syn_count',
ascending=False,
)
# Note that the 'id' part here is just a way to quickly extract one column.
# This could be any of the remaining column names, but `id` is often convenient because it is common to all tables.
| pre_pt_root_id | post_pt_root_id | syn_count |
|---|---|---|
| 864691136968109774 | 864691135280056225 | 20 |
| | 864691135456207722 | 16 |
| | 864691134949547516 | 15 |
| | 864691135784316467 | 13 |
| | 864691135884930672 | 11 |
| ... | ... | ... |
| | 864691135503112029 | 1 |
| | 864691135511524816 | 1 |
| | 864691135516460996 | 1 |
| | 864691135516672708 | 1 |
| | 864691135684221938 | 1 |
1035 rows × 1 columns
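An equivalent and slightly more direct form uses groupby(...).size(), which counts rows without picking a column, and agg with named aggregations can summarize other columns per connection at the same time, such as the summed synapse size. A sketch on a hypothetical toy table with the same column names as the synapse query result (root ids are toy values):

```python
import pandas as pd

# Hypothetical output synapses of one neuron (root ids are toy values).
syn_df = pd.DataFrame({
    "pre_pt_root_id":  [10, 10, 10, 10, 10],
    "post_pt_root_id": [20, 20, 30, 20, 40],
    "size":            [420, 4832, 3176, 412, 3536],
})

# Synapse count per (pre, post) pair, largest first.
syn_count = (
    syn_df.groupby(["pre_pt_root_id", "post_pt_root_id"])
    .size()
    .rename("syn_count")
    .sort_values(ascending=False)
)

# Count and total synapse size per connection in one pass.
conn = syn_df.groupby(["pre_pt_root_id", "post_pt_root_id"]).agg(
    syn_count=("size", "count"),
    total_size=("size", "sum"),
)
print(syn_count)
print(conn)
```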
Querying Proofread neurons
Proofread neurons
Proofreading is necessary to obtain accurate reconstructions of a cell. Read more about proofreading and data quality here.
The proofreading information for both MICrONS and V1DD is stored in a table called proofreading_status_and_strategy.
proof_all_df = client.materialize.query_table("proofreading_status_and_strategy",
desired_resolution=[1, 1, 1],
split_positions=True)
proof_all_df["strategy_axon"].value_counts()
axon_partially_extended 1750
axon_fully_extended 267
axon_interareal 124
none 41
Name: strategy_axon, dtype: int64
Filtering Queries by proofreading status
We can filter our query to only return rows that match a condition by adding a filter to our query:
proof_axon_df = client.materialize.query_table("proofreading_status_and_strategy",
filter_in_dict={"strategy_axon": ["axon_partially_extended", "axon_fully_extended", "axon_interareal"]},
desired_resolution=[1, 1, 1],
split_positions=True)
proof_axon_df.tail()
| | id | created | superceded_id | valid | pt_position_x | pt_position_y | pt_position_z | valid_id | status_dendrite | status_axon | strategy_dendrite | strategy_axon | pt_supervoxel_id | pt_root_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2136 | 4002 | 2025-07-22 18:26:32.791280+00:00 | NaN | t | 703680.0 | 416064.0 | 907280.0 | 864691135783113040 | t | t | dendrite_extended | axon_fully_extended | 88951866083486311 | 864691135783113040 |
| 2137 | 3969 | 2025-07-22 18:26:32.196312+00:00 | NaN | t | 685568.0 | 697920.0 | 931040.0 | 864691136313908797 | t | t | dendrite_extended | axon_fully_extended | 88328030740880370 | 864691135928941780 |
| 2138 | 4003 | 2025-07-22 18:26:32.804932+00:00 | NaN | t | 638976.0 | 477440.0 | 867480.0 | 864691135493371743 | t | t | dendrite_extended | axon_fully_extended | 86702127719919273 | 864691135442616392 |
| 2139 | 3951 | 2025-07-22 18:26:31.880460+00:00 | NaN | t | 729600.0 | 551424.0 | 922920.0 | 864691135398557473 | t | t | dendrite_extended | axon_fully_extended | 89871195310210159 | 864691135360909656 |
| 2140 | 3999 | 2025-07-22 18:26:32.748278+00:00 | NaN | t | 801024.0 | 834304.0 | 879280.0 | 864691135271848613 | t | t | dendrite_extended | axon_fully_extended | 92343584443356753 | 864691135389004929 |
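The size of the filtered table can be checked against the value_counts output above: the three axon strategies in the filter should account for 1750 + 267 + 124 rows. A small sketch of that arithmetic, with the counts copied from the output; with a live client you could compare the result to len(proof_axon_df).

```python
# Counts copied from proof_all_df["strategy_axon"].value_counts() above.
strategy_counts = {
    "axon_partially_extended": 1750,
    "axon_fully_extended": 267,
    "axon_interareal": 124,
    "none": 41,
}
axon_strategies = ["axon_partially_extended", "axon_fully_extended", "axon_interareal"]

expected_rows = sum(strategy_counts[s] for s in axon_strategies)
print(expected_rows)  # 2141
```

With a default index starting at 0, 2141 rows end at index 2140, which matches the last row of the tail shown above.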
The same filter can be written more compactly with the table manager interface introduced under Filtering Queries, client.materialize.tables.{table_name}({filter options}).query({format and timestamp options}).
With this, we can easily query all proofread cells with proofread axons:
proof_axon_df = client.materialize.tables.proofreading_status_and_strategy(
strategy_axon=["axon_partially_extended", "axon_fully_extended", "axon_interareal"]
).query(
select_columns=['pt_root_id','status_axon','status_dendrite','strategy_axon','strategy_dendrite'],
)
proof_axon_df.tail()
| | pt_root_id | status_axon | status_dendrite | strategy_axon | strategy_dendrite |
|---|---|---|---|---|---|
| 2136 | 864691135783113040 | t | t | axon_fully_extended | dendrite_extended |
| 2137 | 864691135928941780 | t | t | axon_fully_extended | dendrite_extended |
| 2138 | 864691135442616392 | t | t | axon_fully_extended | dendrite_extended |
| 2139 | 864691135360909656 | t | t | axon_fully_extended | dendrite_extended |
| 2140 | 864691135389004929 | t | t | axon_fully_extended | dendrite_extended |
From here, you can combine the proofreading information (indexed on pt_root_id) with a cell type table (matched on pt_root_id) or with the synapse table (matched on pre_pt_root_id for outputs of the cell and on post_pt_root_id for inputs to the cell).
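A minimal sketch of both joins on hypothetical toy tables (root ids are small integers for readability): merge attaches cell types to proofread cells on pt_root_id, and isin restricts a synapse table to the outputs of proofread-axon cells. With real data, proof_axon_df, a cell type query, and a synapse_query result would replace the toy frames.

```python
import pandas as pd

# Hypothetical proofread cells (stand-in for proof_axon_df).
proof_axon_df = pd.DataFrame({
    "pt_root_id": [1, 2],
    "strategy_axon": ["axon_fully_extended", "axon_partially_extended"],
})
# Hypothetical cell type predictions (stand-in for a cell type table).
cell_type_df = pd.DataFrame({
    "pt_root_id": [1, 2, 3],
    "cell_type": ["23P", "BC", "4P"],
})
# Hypothetical synapses (stand-in for a synapse_query result).
syn_df = pd.DataFrame({
    "pre_pt_root_id":  [1, 1, 2, 3],
    "post_pt_root_id": [2, 3, 1, 1],
})

# Cell types of proofread cells: inner join on the shared root id column.
proof_types = proof_axon_df.merge(cell_type_df, on="pt_root_id", how="inner")

# Outputs of proofread-axon cells: match on pre_pt_root_id.
proof_outputs = syn_df[syn_df["pre_pt_root_id"].isin(proof_axon_df["pt_root_id"])]

# Label each output synapse with its postsynaptic cell type (left join keeps all rows).
post_types = cell_type_df.rename(
    columns={"pt_root_id": "post_pt_root_id", "cell_type": "post_cell_type"}
)
proof_outputs = proof_outputs.merge(post_types, on="post_pt_root_id", how="left")

print(proof_types)
print(proof_outputs)
```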