Using Python to Access Neurodata Without Borders-type Files

Using Python to Access Neurodata Without Borders-type Files#

This tutorial will demonstrate the structure of the data file in a Neurodata Without Borders type (NWB) file and showcase how to access various parts of the data. We will use a specific example using an ecephys file, but the process is general.

We begin with package imports which we’ll need. hdmf_zarr handles the ZArray objects that appear in NWB files, while nwbwidgets will provide us with a useful tool down the line.

from hdmf_zarr import NWBZarrIO
from nwbwidgets import nwb2widget

%matplotlib inline

In order to do any analysis, of course, we’ll also need the usual libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Accessing an NWB file#

To manipulate the data in the file, we will first need to find the file we’re interested in and then load it into Python. We assume here that we have already received the datafile somehow and we know what directory we’ve stored the file in, so we will simply path directly to the file (which is of file type .nwb.zarr).

mouse_id = '655565'

nwb_file = '/data/SWDB 2024 CTLUT data/ecephys_655565_2023-03-31_14-47-36_nwb/ecephys_655565_2023-03-31_14-47-36_experiment1_recording1.nwb.zarr'

So far, all we’ve done is find the path and store it as a string in Python. Now that we have the NWB file’s path, we still need to load it into Python, which we can do using NWBZarrIO.

with NWBZarrIO(nwb_file, "r") as io:
    nwbfile_read = io.read()

This stores all the information contained in the NWB file into a NWBFile object from the pynwb package, and now we’re ready to start working with the data. The first thing we might be interested in is just seeing the contents of the NWB file, which we can do by using the nwb2widget tool. Below, a screenshot of the output is included to show what information it contains:

nwb2widget(nwbfile_read);
../../_images/output_nwb_read.PNG

The various types of data contained in the file when loaded into Python are simply attributes of the nwbfile_read object, which we access in the usual way nwbfile_read.*. Depending on the table of interest, they might be stored as different types of objects, however; for example, several are stored as objects specific to the pynwb package.

Now we will access the data directly.

print(nwbfile_read)
root pynwb.file.NWBFile at 0x140465601625488
Fields:
  devices: {
    ProbeA <class 'pynwb.device.Device'>,
    ProbeB <class 'pynwb.device.Device'>
  }
  electrode_groups: {
    ProbeA <class 'pynwb.ecephys.ElectrodeGroup'>,
    ProbeB <class 'pynwb.ecephys.ElectrodeGroup'>
  }
  electrodes: electrodes <class 'hdmf.common.table.DynamicTable'>
  epochs: epochs <class 'pynwb.epoch.TimeIntervals'>
  file_create_date: [datetime.datetime(2023, 5, 15, 21, 49, 33, 28522, tzinfo=tzutc())]
  identifier: ecephys_655565_2023-03-31_14-47-36
  intervals: {
    epochs <class 'pynwb.epoch.TimeIntervals'>,
    trials <class 'pynwb.epoch.TimeIntervals'>
  }
  processing: {
    behavior <class 'pynwb.base.ProcessingModule'>,
    ecephys <class 'pynwb.base.ProcessingModule'>
  }
  session_description: Neuropixels-OPTO recording of head-fixed mice during optogenetics stimulation for opto-tagging.
  session_start_time: 2023-03-31 14:47:36-07:00
  stimulus_template: {
    internal_blue-train-0.02mW <class 'pynwb.base.TimeSeries'>,
    internal_blue-train-0.04mW <class 'pynwb.base.TimeSeries'>,
    internal_blue-train-0.06mW <class 'pynwb.base.TimeSeries'>,
    internal_blue-train-0.08mW <class 'pynwb.base.TimeSeries'>,
    internal_blue-train-0.1mW <class 'pynwb.base.TimeSeries'>,
    internal_red-train-0.02mW <class 'pynwb.base.TimeSeries'>,
    internal_red-train-0.04mW <class 'pynwb.base.TimeSeries'>,
    internal_red-train-0.06mW <class 'pynwb.base.TimeSeries'>,
    internal_red-train-0.08mW <class 'pynwb.base.TimeSeries'>,
    internal_red-train-0.1mW <class 'pynwb.base.TimeSeries'>
  }
  subject: subject pynwb.file.Subject at 0x140465602056336
Fields:
  age: P158D
  age__reference: birth
  date_of_birth: 2022-10-24 00:00:00-07:00
  genotype: Chat-IRES-Cre-neo/Chat-IRES-Cre-neo
  sex: M
  species: Mus musculus
  subject_id: 655565

  timestamps_reference_time: 2023-03-31 14:47:36-07:00
  trials: trials <class 'pynwb.epoch.TimeIntervals'>
  units: units <class 'pynwb.misc.Units'>

Accessing Units Data#

Since this is an ecephys file, let’s first access the units. This is a DataFrame object that contains information about all of the sorted units in this experiment.

units = nwbfile_read.units[:]
print(type(units))
<class 'pandas.core.frame.DataFrame'>

Let’s examine what information is stored in the DataFrame:

print(units.columns.values)
['spike_times' 'electrodes' 'waveform_mean' 'waveform_sd' 'unit_name'
 'drift_ptp' 'snr' 'isi_violations_ratio' 'default_qc' 'ks_unit_id'
 'l_ratio' 'half_width' 'repolarization_slope' 'sliding_rp_violation'
 'drift_mad' 'peak_trough_ratio' 'presence_ratio' 'rp_contamination'
 'recovery_slope' 'num_spikes' 'isolation_distance' 'device_name'
 'rp_violations' 'amplitude' 'amplitude_cutoff' 'peak_to_valley'
 'firing_rate' 'drift_std' 'd_prime' 'noise_label' 'peak_channel'
 'internal_blue_train_best_site' 'internal_blue_train_mean_latency'
 'internal_blue_train_mean_jitter' 'internal_blue_train_mean_reliability'
 'internal_blue_train_num_sig_pulses_paired'
 'internal_blue_train_channel_diff' 'internal_red_train_best_site'
 'internal_red_train_mean_latency' 'internal_red_train_mean_jitter'
 'internal_red_train_mean_reliability'
 'internal_red_train_num_sig_pulses_paired'
 'internal_red_train_channel_diff' 'predicted_cell_type']

Here, we can see the full list of columns stored in our DataFrame:

  • spike_times - spike times (in seconds) for the entire session

  • electrodes - electrode indices used for mean waveform

  • waveform_mean - mean spike waveform across all electrodes

  • waveform_sd - standard deviation of the spike waveform across all electrodes

  • unit_name - universally unique identifier for this unit

  • default_qc - True if this unit passes default quality control criteria, False otherwise

  • presence_ratio - fraction of the session for which this unit was present (default threshold = 0.8)

  • num_spikes - total number of spikes detected

  • snr - signal-to-noise ratio of this unit’s waveform (relative to the peak channel)

  • isi_violations_ratio - metric that quantifies spike contamination (default threshold = 0.5)

  • firing_rate - average firing rate of the unit throughout the session

  • isolation_distance - distance to nearest cluster in Mahalanobis space (higher is better)

  • l_ratio - related to isolation distance; quantifies the probability of cluster membership for each spike

  • device_name - name of the probe that detected this unit

  • amplitude_cutoff - estimated fraction of missed spikes (default threshold = 0.1)

  • peak_trough_ratio - ratio of the max (peak) to the min (trough) amplitudes (based on 1D waveform)

  • repolarization_slope - slope of return to baseline after trough (based on 1D waveform)

  • amplitude - peak-to-trough distance (based on 1D waveform)

  • d_prime - classification accuracy based on LDA (higher is better)

  • recovery_slope - slope of return to baseline after peak (based on 1D waveform)

  • half_width - width of the waveform at half the trough amplitude (based on 1D waveform)

  • ks_unit_id - unit ID assigned by Kilosort

Note that the columns may vary from experiment to experiment, so it’s a good idea to check what columns are stored in any NWB file that you open. For instance, this file contains the above standard columns that appear in most NWB files from the Allen Institute, but also many more columns detailing the unit’s response to laser stimulation. These columns are further explained in the cell type lookup table tutorials.

To access data from a specific unit, we look at the row corresponding to that unit:

print(units.loc[418])
spike_times                                  [810.0001255327663, 869.9232313798699, 915.734...
electrodes                                   [768, 769, 770, 771, 772, 773, 774, 775, 776, ...
waveform_mean                                [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,...
waveform_sd                                  [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,...
unit_name                                                 f6e38bca-33ba-48cc-af02-75a27f42b7ed
drift_ptp                                                                                  NaN
snr                                                                                   6.655138
isi_violations_ratio                                                                       0.0
default_qc                                                                               False
ks_unit_id                                                                                31.0
l_ratio                                                                               0.149023
half_width                                                                            0.000243
repolarization_slope                                                             313290.923614
sliding_rp_violation                                                                       NaN
drift_mad                                                                                  NaN
peak_trough_ratio                                                                    -0.330413
presence_ratio                                                                        0.644231
rp_contamination                                                                           0.0
recovery_slope                                                                   -38637.486587
num_spikes                                                                               258.0
isolation_distance                                                                  141.030751
device_name                                                                             ProbeB
rp_violations                                                                              0.0
amplitude                                                                            69.265793
amplitude_cutoff                                                                           NaN
peak_to_valley                                                                        0.000607
firing_rate                                                                           0.041214
drift_std                                                                                  NaN
d_prime                                                                               4.067442
noise_label                                                                               good
peak_channel                                                                              38.0
internal_blue_train_best_site                                                              1.0
internal_blue_train_mean_latency                                                           NaN
internal_blue_train_mean_jitter                                                            NaN
internal_blue_train_mean_reliability                                                       0.0
internal_blue_train_num_sig_pulses_paired                                                  0.0
internal_blue_train_channel_diff                                                          28.0
internal_red_train_best_site                                                               1.0
internal_red_train_mean_latency                                                            NaN
internal_red_train_mean_jitter                                                             NaN
internal_red_train_mean_reliability                                                        0.0
internal_red_train_num_sig_pulses_paired                                                   0.0
internal_red_train_channel_diff                                                           28.0
predicted_cell_type                                                                   untagged
Name: 418, dtype: object

To look at the spike times for a specific unit, we look at the corresponding attribute of that unit.

print(units.loc[418]['spike_times'])
[ 810.00012553  869.92323138  915.73428283  993.46186793 1083.97800103
 1084.37176786 1089.61893294 1092.82656648 1096.59626237 1104.08294017
 1122.70730282 1125.34041942 1126.51636438 1129.27679993 1149.56591895
 1154.20855116 1166.86327015 1173.91335363 1176.37448709 1180.12824315
 1204.58823002 1218.85226241 1235.82063975 1235.83637457 1236.31450614
 1242.08294837 1244.65159236 1249.26466788 1249.44521834 1250.3503856
 1258.15654467 1258.88936281 1260.35690467 1260.36043833 1260.48044971
 1261.91264978 1261.91925041 1261.97372225 1262.13907127 1262.75761444
 1263.24219374 1263.75474236 1277.79954127 1282.3000847  1285.1322865
 1285.13595351 1292.74120776 1293.51769891 1297.33662557 1297.97231912
 1298.25296356 1298.98861539 1300.32024153 1301.9289663  1302.22299133
 1302.30006531 1307.71993123 1309.20901687 1309.72016916 1313.03775052
 1313.70558054 1320.65798737 1320.83285661 1324.20961025 1327.55279404
 1328.05610845 1342.03298021 1359.71812328 1364.14748767 1364.89158035
 1365.69926807 1366.36942043 1371.50374046 1375.13657435 1376.17882736
 1383.53526109 1389.51088072 1392.3238807  1394.06714595 1398.07691706
 1403.59520718 1405.08142334 1423.03552524 1423.7037959  1429.79206571
 1430.82366349 1431.01634842 1432.52099105 1433.25349382 1439.23120009
 1450.29651494 1451.11632593 1452.02093948 1455.97412463 1458.76341698
 1459.3572349  1461.10636737 1461.22491681 1461.800438   1467.17078004
 1468.11733638 1475.5309053  1477.01933862 1483.13582225 1484.99670197
 1488.51950543 1495.57367055 1505.52824486 1512.58018    1513.52840468
 1526.67538277 1527.18216414 1536.1034098  1544.84723864 1546.43738802
 1547.5140566  1551.91157495 1570.22131056 1570.83133279 1572.09928858
 1574.95555679 1579.42628029 1598.73067968 1600.77873627 1601.49757529
 1608.83166576 1612.36129514 1613.34382651 1613.56541417 1618.98289402
 1619.33949446 1627.02318305 1627.26987307 1629.81415336 1629.81755368
 1632.21151379 1637.66412367 1639.55353586 1650.90761066 1651.40935815
 1652.60833832 1654.73333947 1663.87067942 1686.98349232 1687.56674753
 1706.13920563 1712.22841537 1712.59144974 1721.0698736  1729.05084114
 1746.64462621 1748.49174811 1770.8974024  1776.49069853 1780.78503838
 1791.22689348 1794.01994108 1807.25139273 1824.05411554 1860.79957934
 1864.15286343 1865.37295748 1870.81732763 1871.24219086 1874.10090512
 1877.87989618 1890.09756301 1893.42628795 1937.31827045 1940.88666747
 1958.08006057 1966.77187749 1984.55029789 1989.71185282 1994.12430356
 2035.57325862 2079.8236104  2152.69883064 2166.64304108 2176.93501333
 2179.53132526 2204.94333605 2216.45245735 2246.74960844 2312.93004417
 2323.66347676 2332.03116466 2333.87475743 2342.75704456 2469.84298321
 2496.84616332 2573.61715125 2575.35928427 2582.88982898 2597.59004854
 2627.90130553 2629.5536326  2681.47160829 2690.96032946 2728.1732424
 2774.93925716 2800.31415257 2897.06914162 2905.85971618 2993.41622006
 3047.91182762 3138.28118603 3245.08149447 3250.41785293 3363.26069072
 3387.77985606 3405.24801557 3407.52443017 3441.21304865 3512.37734341
 3545.65351588 3602.1433092  3617.53372792 3703.08445952 3760.20331083
 3762.32128877 3813.95522977 3894.53797467 3958.05176217 4154.07333751
 4173.0556874  4180.58816869 4554.33138094 4572.91709499 4593.90613711
 4611.37661788 4681.01370214 4712.23169268 4847.82922662 4864.65831202
 5136.32570464 5374.39409871 5411.34026029 5411.58945041 5507.19160087
 5637.8718524  5649.97855918 5691.0316201  5722.23669494 5833.59971206
 6002.35099841 6015.55426634 6027.48486453 6059.7740764  6132.44734576
 6150.05200268 6233.31800264 6237.41204837 6319.09460545 6400.86718164
 6529.5848741  6636.06779153 6678.74083936]

We can also see how many spike times there are for a specific unit, which will tell us the number of spikes and allow us to access the time of a specific spike (in this case, accessing the time of the 189th spike):

print(len(units.loc[418]['spike_times']))
print(units.loc[418]['spike_times'][188])
258
2342.7570445621095

A useful sanity check here might also to be to ensure that the number of spike times matches the actual number of spikes recorded in the units table:

print(units.loc[418]['num_spikes'])
print(units.loc[418]['num_spikes'] == len(units.loc[418]['spike_times']))
258.0
True

So we can see that every recorded spike also has its time recorded, as we would expect. As another example, it is also possible to access and plot the waveform for a specific unit. Accessing the waveform is done in the same manner as accessing the spike times; in this case, we are interested in looking at ‘waveform_mean’ (note that here, we access it in two different ways: as an attribute of the units table, and as an element of the DataFrame). This is stored as a basic Python array. We can take a look at its dimensions to make sense of what data is stored here:

print(np.array(units.loc[418]['waveform_mean']).shape)
(210, 384)

Equipped with our knowledge of Neuropixels, we can see then that the dimensions are (time, channel). We can plot this to visualize each channel’s recordings over time. Note that due to the way that plt.imshow functions, we will need to take the transpose of the array to plot time on the x-axis. Also note that waveform_mean centers the times around the time of the spike.

plt.imshow(units.loc[418]['waveform_mean'].T, origin='lower')
plt.ylabel("Electrode")
plt.xlabel("Time (us)")
plt.show()
../../_images/2314d6b6fa782d4ba874954a1479b0d90198a6bfa9675cdb667f91f2f13bf488.png

From above, we can see that most of the channels recorded the same values over time; the channel’s recording at any given time is indicated by the color of the plot at that time. Since most horizontal lines we can draw on this plot are monochromatic, we can conclude that the channel in question did not record any interesting variation in electric potential.

However, there is something interesting happening on channels 20 to 50!

We can plot the shape of the potential over time on those channels. Now, instead of a three-dimensional plot, we will have a two-dimensional plot of voltage over time, with each distinct line representing a different channel:

plt.plot(units.waveform_mean.loc[418][:,20:50])
plt.ylabel("uV")
plt.xlabel("Time (us)")
plt.show()
../../_images/66f0073f2a36d8e119eefe078574ee34424052e162150e2d8e776fff6d9b0cd2.png

This has somthing interesting going on; we can clearly see the classic shape of an action potential, but it appears to be overlaid on some baseline sinusoidal oscillation in the voltage.

Moving on from the units table, let’s examine more of the data contained in the NWB file.

Accessing Trials Data#

The trials table includes information on the timing and experimental parameters for each trial conducted in the experiment. It can be accessed in the same way as the units table; again, this is a DataFrame object.

trials = nwbfile_read.trials[:]
print(trials.columns.values)
['start_time' 'stop_time' 'site' 'power' 'param_group' 'emission_location'
 'duration' 'rise_time' 'num_pulses' 'wavelength' 'type'
 'inter_pulse_interval' 'stimulus_template_name']

Notice that this is an example of an optotagging experiment from the Cell Type Look-Up Table (CTLUT) dataset, so the columns stored here are specific to that stimulus. Other NWB files for other experiments will have different column names pertaining to the stimulus or behavior used for that experiment. In this experiment, for example, we can see the length of the trials and the details on the lasers used as a stimulus.

Like with the units table, we can access a particular trial by asking for the row number (remember that these are indexed starting at 0):

print(trials.loc[1582])
start_time                             1696.596233
stop_time                              1696.646233
site                                            13
power                                          0.1
param_group                                  train
emission_location                           ProbeA
duration                                      0.01
rise_time                                    0.001
num_pulses                                       5
wavelength                                     638
type                                  internal_red
inter_pulse_interval                          0.04
stimulus_template_name    internal_red-train-0.1mW
Name: 1582, dtype: object

Stimulus Template Information#

A stimulus template provides an exemplar for each stimulus condition, which we can access again as an attribute of the NWB file. This is also something that is tailored to the experimental design, so the form can vary. In this example, the template is a Dictionary object. The keys are:

stimulus_template = nwbfile_read.stimulus_template
print(stimulus_template.keys())
dict_keys(['internal_blue-train-0.02mW', 'internal_blue-train-0.04mW', 'internal_blue-train-0.06mW', 'internal_blue-train-0.08mW', 'internal_blue-train-0.1mW', 'internal_red-train-0.02mW', 'internal_red-train-0.04mW', 'internal_red-train-0.06mW', 'internal_red-train-0.08mW', 'internal_red-train-0.1mW'])

Notice that these keys match what we find in the stimulus_table under stimulus_template_name. For this experiment, the template provides the shape of the optotagging pulses delivered during the experiment.

Since the trial we pulled above uses the ‘internal_red-train-0.1mW’ stimulus template, let’s just pull up the information for that particular stimulus template. Note that we could print the whole dictionary (this will be a very long output), but the information stored in each stimulus_template is the same.

print(stimulus_template['internal_red-train-0.1mW'])
internal_red-train-0.1mW pynwb.base.TimeSeries at 0x140465602055568
Fields:
  comments: no comments
  conversion: 1.0
  data: <zarr.core.Array '/stimulus/templates/internal_red-train-0.1mW/data' (7800,) float32 read-only>
  description: Analog signal to drive laser stimulation
  interval: 1
  offset: 0.0
  resolution: -1.0
  timestamps: <zarr.core.Array '/stimulus/templates/internal_red-train-0.1mW/timestamps' (7800,) float64 read-only>
  timestamps_unit: seconds
  unit: V

Here, we can see all the information stored about the stimulus used. In particular, we can see that we have 7800 data points, stored in two different arrays: one in “data”, and one in “timestamps”. We can also see that the units of our dataset are in seconds and Volts (important so we don’t make embarrassing orders of magnitude errors!).

Now let’s plot the stimulus template.

plt.plot(stimulus_template['internal_red-train-0.1mW'].timestamps[:], stimulus_template['internal_red-train-0.1mW'].data[:])
plt.xlabel("Time (s)", fontsize=16)
plt.ylabel("Voltage (V)", fontsize=16)
plt.show()
../../_images/45bf0bff24cb14298ae41b94a9dc12c40a69854ba0d8e80d52b51f31036cbcbb.png

Here, we can see that our stimulus is a square wave with a frequency of 20 Hz and an amplitude of just under 0.5 Volts.

Processing#

There are some relevant pieces of data which are not quite as straightforward to access; some are nested under other attributes. For example, the running speed of the mouse over time is very relevant, but it is nested behind the processing attribute. To see the data structure, we can just use the nwb2widget tool that we showed above in the fifth cell and browse, but here we’ll also show how to read the data manually since the nwb2widget tool can sometimes be slow if there’s a large amount of information in a section.

The processing attribute holds a variety of important pieces of data. processing is a LabelledDict type object like the stimulus templates were. In order to see what the keys are for this LabelledDict, we can just print it and see what attributes we can access:

print(nwbfile_read.processing)
{'behavior': behavior pynwb.base.ProcessingModule at 0x140465601919488
Fields:
  data_interfaces: {
    BehavioralTimeSeries <class 'pynwb.behavior.BehavioralTimeSeries'>
  }
  description: Processed behavioral data
, 'ecephys': ecephys pynwb.base.ProcessingModule at 0x140465601625776
Fields:
  data_interfaces: {
    LFP <class 'pynwb.ecephys.LFP'>
  }
  description: Intermediate data from extracellular electrophysiology recordings, e.g., LFP.
}

As we can see for this example, the keys contained in processing for this mouse are ‘behavior’ and ‘ecephys’. As with the previous sections, it is important to remember that the keys we have here are specific to this optotagging experiment! In general, what information is contained in processing will be experiment-dependent.

Now, let’s access the ‘behavior’ key first to see what it contains:

print(nwbfile_read.processing['behavior'])
behavior pynwb.base.ProcessingModule at 0x140465601919488
Fields:
  data_interfaces: {
    BehavioralTimeSeries <class 'pynwb.behavior.BehavioralTimeSeries'>
  }
  description: Processed behavioral data

Notice here that in order to access a specific key, we simply ask for the name of the key.

Now, when we read this, we can see the ‘behavior’ part of the processing module for this particular experiment contains only one object, data_interfaces. Let’s keep going and see what that contains:

print(nwbfile_read.processing['behavior'].data_interfaces)
{'BehavioralTimeSeries': BehavioralTimeSeries pynwb.behavior.BehavioralTimeSeries at 0x140465601919536
Fields:
  time_series: {
    linear velocity <class 'pynwb.base.TimeSeries'>
  }
}

Notice that to access the subsection of the dictionary key, we call it as an attribute of the key.

In this case the behavior object has one item: the running speed of the mouse during the experiment. This is called the “linear velocity”. In other experiments, we might find the pupil area and/or position here as well.

We can that the linear velocity object by reading through the nested sets printed in our last key. The data_interfaces object possesses a key named BehavioralTimeSeries, which has an attribute time_series inside it. In turn, the attribute time_series possesses the key named linear velocity.

So, the running speed is nested in several layers of LabelledDict type objects. In order to access it, we’ll need to write down the whole “path” to it. Let’s see what information the ‘linear velocity’ key contains.

(For the sake of simplicity, we’ll store the linear velocity data in a different Python object, so that we don’t have to type the full path to the linear velocity every time we wish to examine it.)

running_speed = nwbfile_read.processing['behavior'].data_interfaces['BehavioralTimeSeries'].time_series['linear velocity']
print(running_speed)
linear velocity pynwb.base.TimeSeries at 0x140465601919680
Fields:
  comments: no comments
  conversion: 1.0
  data: <zarr.core.Array '/processing/behavior/BehavioralTimeSeries/linear velocity/data' (624796,) float64 read-only>
  description: The speed of the subject measured over time.
  interval: 1
  offset: 0.0
  resolution: -1.0
  timestamps: <zarr.core.Array '/processing/behavior/BehavioralTimeSeries/linear velocity/timestamps' (624796,) float64 read-only>
  timestamps_unit: seconds
  unit: m/s

This tells us that the ‘linear_velocity’ (which here we’ve stored in running_speed for ease of access) contains several attributes: comments, conversion, data, description, interval, offset, resolution, timestamps, timestamps_unit, and unit. Of these, most are irrelevant for our purposes; the two that contain all the data are:

  • data contains the actual velocity at each timestamp.

  • timestamps contains the timestamp for each measurement. These are naturally paired.

We can see that there are 624,796 data points, and that our data is in units of velocity over time. We can access them directly if we want:

print(running_speed.timestamps[:])
print(running_speed.data[:])
[ 454.87926667  454.88925067  454.89926667 ... 6702.80925067 6702.81926667
 6702.82925067]
[ 0.09047513  0.09920467  0.08142761 ... -0.02705582 -0.03619005
 -0.05411164]

And to access a specific data point, we would just use the index in place of the : that tells Python to read all elements of the array.

Now let’s plot the data and time to get an idea of the mouse’s velocity over the course of the experiment:

plt.plot(running_speed.timestamps[:], running_speed.data[:])
plt.xlabel("Time (s)", fontsize=16)
plt.ylabel("Velocity (m/s)", fontsize=16)
plt.show()
../../_images/f844fc50a8af84da2798ac225b4664f906728717a6e47a21260a3dfc1d802fec.png

This isn’t the most legible plot, but it’s clear why it isn’t; we have on the order of \(10^5\) measurements, but our time scale is only \(10^3\) s, which means we’re taking around a hundred measurements per second. We can zoom in on a specific time interval to get a better idea of what’s going on in that time interval. We could also apply standard data processing techniques to clean up the signal to improve legibility, but those techniques are beyond the scope of this particular tutorial.

plt.plot(running_speed.timestamps[3000:3500], running_speed.data[3000:3500])
plt.xlabel("Time (s)", fontsize=16)
plt.ylabel("Velocity (m/s)", fontsize=16)
plt.show()
../../_images/33adbad3050eab91be25dcdf260622b78f224b5503a4470891c18552005163d3.png

Now let’s back up. There was a second key named ecephys which was contained in the processing attribute of nwbfile_read. Let’s access that and see what data is contained there:

print(nwbfile_read.processing['ecephys'])
ecephys pynwb.base.ProcessingModule at 0x140465601625776
Fields:
  data_interfaces: {
    LFP <class 'pynwb.ecephys.LFP'>
  }
  description: Intermediate data from extracellular electrophysiology recordings, e.g., LFP.

The ecephys object contains one item: the local field potential (LFP) data. LFPs are the signals in the low frequency range (below 500 Hz) that are generated by synchronized synaptic inputs of neuronal populations. This features are known to be correlated with brain and behavioral states. The LFP is computed for each probe independently, so when there are multiple probes, there should be an LFP object for each probe.

In this particular experiment, the path to the LFP data goes through the attribute data_interfaces. Again, for ease of access, we’ll store it in an object so that we don’t have to type out the path every time:

lfp_interface = nwbfile_read.processing['ecephys'].data_interfaces['LFP']
print(lfp_interface)
LFP pynwb.ecephys.LFP at 0x140465601919200
Fields:
  electrical_series: {
    ElectricalSeriesProbeA-LFP <class 'pynwb.ecephys.ElectricalSeries'>,
    ElectricalSeriesProbeB-LFP <class 'pynwb.ecephys.ElectricalSeries'>
  }

Looking at the contents of the LFP data, we have two probes listed. As expected, each one of them contains LFP data. Let’s take a look at one of them to start:

lfp_es = lfp_interface.electrical_series['ElectricalSeriesProbeA-LFP']
print(lfp_es)
ElectricalSeriesProbeA-LFP pynwb.ecephys.ElectricalSeries at 0x140465601624864
Fields:
  comments: no comments
  conversion: 4.679999828338622e-06
  data: <zarr.core.Array '/processing/ecephys/LFP/ElectricalSeriesProbeA-LFP/data' (15650092, 384) int16 read-only>
  description: LFP voltage traces from ProbeA
  electrodes: electrodes <class 'hdmf.common.table.DynamicTableRegion'>
  offset: 0.0
  rate: 2500.0
  resolution: -1.0
  starting_time: 447.9456
  starting_time_unit: seconds
  unit: volts

Reading off the information about the data recorded by Probe A, we can see that we only have one set of data here. There, we have 15,661,294 data points for each of the 384 channels on the Neuropixels probe. None of this data is directly associated with a time coordinate. Instead, we’re told the sampling rate of 2500 Hz and the starting time. Thus, if we want to plot this, we’ll have to construct the timestamps manually by taking the starting time and then appending times at 1/2500 s intervals until we have 15,661,294 timestamps.

Fortunately, this is a task computers excel at:

lfp_num_samples = lfp_es.data.shape[0]
lfp_timestamps = lfp_es.starting_time + np.arange(lfp_num_samples) / lfp_es.rate

Next we can examine what’s contained in the data. Here, we’ll pick a specific interval instead of plotting for the entire experiment to enhance legibility. (As with the previous section, we would still need to do a significant amount of signal processing to analyze this data).

plt.imshow(lfp_es.data[50000:60000,0:350].T, origin='lower', aspect='auto')
plt.ylabel('Channel')
plt.xlabel('Sample')
Text(0.5, 0, 'Sample')
../../_images/135ffd220414b1a5389bc0bbb20ff15a54aad7c8ea64e42e7867abdb277ff8c1.png

Here, we are not plotting channels above 350, as they were outside of the brain and thus did not have meaningful data. In this time period, we can see some clear oscillatory activity in channels 200-350, which were likely in the cortex.

Instead of looking at all 384 channels, we can also examine just a single channel, and we can plot it against time to see what the LFP looked like for that channel.

channel = 99
raw_data = lfp_es.data[:, channel]

print(raw_data.shape)

plt.plot(lfp_timestamps[50000:60000], raw_data[50000:60000])
plt.xlabel("Time (s)", fontsize=16)
plt.ylabel("LFP (uV)", fontsize=16)
plt.show()
(15650092,)
../../_images/fa2c7265dcc64d83120e743355811664e22541fe483b57c122289e99baa447af.png

Finally, we can also examine the information contained in the electrodes attribute of the probe we’re investigating:

print(lfp_es.electrodes.shape)
(384, 10)

Here, we can see that we have ten entries for every channel.

The NWB file contains other data as well which can be explored using the methods we’ve detailed in this tutorial; we will not detail every piece of accessible information here, but hopefully it should be clear from these examples how one can navigate and extract data from an NWB file. Once finished, we should then close the NWB file.

io.close()