PapcoDoc/documents/DataSets
From PapcoWiki
PaPCo’s Dataset Model
A dataset abstraction is introduced in papco 12 with two goals in mind. First, many of the modules read, store and plot datasets that are very similar in form, so a common dataset abstraction greatly simplifies such modules, making maintenance easier and module development quicker. Second, the dataset abstraction includes metadata (e.g. units) that allows a dataset coming from an unknown source to be used for science analysis. This idea of “data discovery” allows a papco module to use a new dataset without any new code.
Papco datasets are IDL structures with a number of required tags, inspired by CDF file conventions. The simplest dataset contains only the tag “data”, which must be one-dimensional and contain monotonically increasing values:
xtags= { data:findgen(200)/10 }
A dataset of this type serves only as the independent-variable tags for another dataset. It is stored as a tag of that other dataset, and is identified by the property “depend_N”, where N is the dimension number:
sineWave= { data:sin( xtags.data ), depend_0:'xtags', xtags:xtags }
or:
energyTags= { data:findgen(30) }
z= { data:randomu( s, 200, 30 ), depend_0:'xtags', depend_1:'energy', $
xtags:xtags, energy:energyTags }
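The depend_N convention can be followed with built-in IDL routines alone. The following is a minimal sketch, not papco code: it rebuilds the rank-2 example and, for each dimension, looks up the depend_N property and the tags dataset it names. It is written as a main-level program (saved to a file and started with .run) because of the multi-line loop; the variable names are chosen here for illustration.

```idl
; Rebuild the rank-2 example dataset using only built-in IDL routines.
xtags= { data:findgen(200)/10 }
energyTags= { data:findgen(30) }
z= { data:randomu( seed, 200, 30 ), depend_0:'xtags', depend_1:'energy', $
     xtags:xtags, energy:energyTags }

names= tag_names( z )                ; tag names are returned in upper case
dims= size( z.data, /dimensions )    ; length of each dimension of the data

; For each dimension, find depend_N, then the tags dataset that it names.
for i=0, n_elements(dims)-1 do begin
   idep= (where( names eq 'DEPEND_' + strtrim(i,2) ))[0]
   if idep eq -1 then continue       ; no tags declared for this dimension
   itag= (where( names eq strupcase( z.(idep) ) ))[0]
   if itag eq -1 then continue       ; depend_N names a tag that is missing
   ntags= n_elements( z.(itag).data )
   print, 'dimension ', i, ': ', z.(idep), ', ', ntags, ' tags for ', dims[i], ' elements'
endfor
end
```

For a well-formed dataset, the number of tags found for each dimension matches the length of that dimension of the data.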
Any other tags are considered properties of the dataset. In addition, there are a number of properties used to label axes and identify fill data:
- units: a string that identifies the units of the data. Units are discussed in the next section.
- label: a string that identifies the dataset and is suitable for labeling an axis.
- bin_width: a double or double array that identifies the length of the interval to which the measurement is relevant, or the resolution of the measurement.
- format: an IDL format specifier, such as "(f10.3)".
- scalemin, scalemax: the limits of the axis scale.
- scaletyp: "linear" or "log".
units property
Units are strings that tag the doubles of a dataset to make them meaningful quantities. They are displayed on plot axes, and they can also be used for automatic unit conversion in overplots. In addition, a set of “time location units” is defined, and these are used to locate data precisely on a time axis. For example, “mjd2000” indicates that each value is the number of decimal days elapsed since midnight, January 1, 2000. The conventional papco unit strings are returned in a structure by the routine papco_units:
units= papco_units()
timeTags= { data:14+dindgen(86400)/86400., units:units.mjd2000 }
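To make the mjd2000 definition concrete, here is a small sketch that converts such a value to a calendar date with the built-in JULDAY and CALDAT routines. It assumes only the definition given above (decimal days since midnight, January 1, 2000) and does not use any papco conversion routine.

```idl
; Assumption: mjd2000 counts decimal days from 2000-01-01 00:00, as described above.
epoch= julday( 1, 1, 2000, 0, 0, 0 )   ; Julian date of the mjd2000 origin
t= 14.5D                               ; 14.5 days after the origin: noon on 15 January 2000
caldat, epoch + t, month, day, year, hour, minute, second
print, year, month, day, hour, format='(i4,"-",i2.2,"-",i2.2," ",i2.2,"h")'
```

The timeTags example above starts at 14 and spans one day, so it covers 15 January 2000 at one-second resolution.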
dataset plotters
papco_spectrogram, ds
papco_lineplot, ds
papco_radial_spectrogram, ds, radius="energy", angle="pitch"
dataset operators
sliceds= papco_ds_slice( ds, time=34 )
Slices the N-dimensional dataset at time index 34 to produce an (N-1)-dimensional dataset. The range of the slice is documented as a string in the property "time_range". This is analogous to slice= reform( z[34,*,*] )
energySpec= papco_ds_collapse( ds, 'pitch' )
Collapses the N-dimensional dataset by averaging over the pitch dimension, to return an (N-1)-dimensional dataset. The range of the pitch angles is returned in "pitch_range". For data ordered as (time,energy,pitch), this is analogous to z= total( z, 3 ) / n_pitch, where n_pitch is the number of pitch bins.
ds= papco_ds_trim( ds, time=[1000,2000], energy=[0,10] )
Trims each named dimension to the given index range. This is similar to z = z[1000:2000,0:10,*]
ds= papco_ds_transpose( ds )
Transposes the dataset dimensions. This is analogous to z= transpose(z)
idx= lindgen(n_elements(ds.time.data)/5) * 5
subds= papco_ds_array_index( ds, 0, idx )
Extracts a dataset using a subset of the indices.
Analogs to IDL Array Operators
The table below equates familiar IDL array operations with their papco dataset equivalents. In these examples, ds is a rank-3 dataset of FLUX(time,energy,pitch), and z is the corresponding 3-d IDL array, with time as the first index, energy as the second index and pitch as the third index. (Note that we could say z=ds.data.)
| IDL array operation | papco dataset equivalent |
|---|---|
| reform(z[*,*,3]) | papco_ds_slice( ds, pitch=3 ) |
| total(z,3) | papco_ds_collapse( ds, 'pitch', /total ) |
| z[2:4,*,*] | papco_ds_trim( ds, time=[2,4] ) |
| z[[1,2,3],*,*] | papco_ds_array_index( ds, 0, [1,2,3] ) |
| transpose(z,[1,2,0]) | papco_ds_transpose( ds, [ 1,2,0] ) |
| interpolate( z, findgen(23),findgen(3), findgen(5), /grid ) | papco_ds_grid( ds, findgen(23),findgen(3), findgen(5), /interpolate ) |
| rebin( z, 4, 1 ) | papco_ds_rebin( ds, [ 4, 1 ] ) |
| [ z, z ] | papco_ds_append( ds, ds ) |
| size(z) | papco_ds_size( ds ) |
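The array column of the table can be tried on a small stand-in array to see what shape each operation returns. This sketch uses only built-in IDL routines; the 6 x 4 x 3 array is an arbitrary placeholder for ds.data, and the shape comments follow from the IDL array rules rather than from any papco routine.

```idl
; z stands in for ds.data of a rank-3 FLUX(time,energy,pitch) dataset.
z= findgen( 6, 4, 3 )

help, reform( z[*,*,2] )        ; slice at pitch index 2:           6 x 4
help, total( z, 3 )             ; sum over pitch:                   6 x 4
help, total( z, 3 ) / 3         ; average over the 3 pitch bins:    6 x 4
help, z[2:4,*,*]                ; trim time to indices 2..4:        3 x 4 x 3
help, z[[1,2,3],*,*]            ; pick time indices 1, 2 and 3:     3 x 4 x 3
help, transpose( z, [1,2,0] )   ; reorder to (energy,pitch,time):   4 x 3 x 6
help, [ z, z ]                  ; append along time:                12 x 4 x 3
```

Note that TOTAL numbers dimensions from 1, while the permutation passed to TRANSPOSE is 0-based, so the pitch dimension is 3 in one call and 2 in the other.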
other functions
if ( not papco_ds_valid( ds ) ) then stop
Returns 1 if the dataset is well-formed, 0 otherwise.
properties= papco_ds_properties( ds )
Returns a structure containing just the properties of the dataset. If the dataset contains no properties then { name:'' } is returned.
idx= papco_ds_tagindex( ds, 'bin_width' )
Returns the index of the tag, or -1 if the tag is not present.
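For comparison, the same kind of lookup can be written with the built-in TAG_NAMES and WHERE routines. This is a plain-IDL analog for illustration, not the papco implementation; the small energy structure is a made-up example.

```idl
; Plain-IDL analog of a tag lookup on a hypothetical tags dataset.
energy= { data:findgen(30), units:'eV', bin_width:0.8 }
names= tag_names( energy )              ; upper case: DATA, UNITS, BIN_WIDTH
idx= (where( names eq strupcase( 'bin_width' ) ))[0]
print, idx                              ; index of the tag, or -1 if absent
if idx ne -1 then print, energy.(idx)   ; value of the tag, accessed by index
```

Because TAG_NAMES returns upper-case names, the requested name is upper-cased before the comparison.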
dataset builder
The routine papco_ds_builder is provided to make creating valid datasets easier.
bin_width= 0.8
log_energy_center= findgen(30)*bin_width+1.2
units= papco_ds_units()
energy= papco_ds_builder( 10^log_energy_center, units=units.eV, $
    label='energy', log=1, bin_width=bin_width )
bin_width=0.42D/86400
time= papco_ds_builder( 230+findgen(2000)*bin_width, units=units.mjd2000,$
    label='time', bin_width=bin_width )
pitch= papco_ds_builder( 10*(findgen(18)+0.5), units=units.degrees, $
    label='pitch angle', bin_width=10 )
input_data= papco_ds_builder( fltarr( 2000,30,18 ), label=data_label,$
    depend_0='time', depend_1='energy', depend_2='pitch', $
    time=time, energy=energy, pitch=pitch )

