
5.6 Read-routine philosophy

Traditionally, data files have been organized by a variety of time units: per orbit, per day, per month, and so on. PaPCo was originally written with CRRES in mind, where everything went by orbit. In fact, that is still the only way to access CRRES data, and for CRRES it makes little sense to plot the data any other way.

For ISTP, the decision was made to store data in daily files. Consequently, a generic read routine is needed that takes as input either a time range or an orbit number, internally and automatically selects and concatenates the required files, and hands back a continuous data array covering exactly the requested interval.
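The file-selection and concatenation steps can be sketched as follows. This is a minimal illustration in Python (PaPCo read routines themselves are written in IDL). It assumes one file per day named by a hypothetical pattern lanl_YYYYMMDD.dat and records stored as (time, value) pairs. Converting an orbit number into a time range requires an orbit table and is not shown.

```python
from datetime import date, datetime, timedelta
from typing import List, Tuple

# Hypothetical naming convention: one file per day, e.g. "lanl_19910715.dat".
FILE_PATTERN = "lanl_{:%Y%m%d}.dat"

Record = Tuple[datetime, float]  # assumed record layout: (time, value)


def days_covered(start: datetime, end: datetime) -> List[date]:
    """Return every calendar day touched by the interval [start, end]."""
    if end < start:
        raise ValueError("end must not precede start")
    days = []
    day = start.date()
    while day <= end.date():
        days.append(day)
        day += timedelta(days=1)
    return days


def daily_filenames(start: datetime, end: datetime,
                    pattern: str = FILE_PATTERN) -> List[str]:
    """Build the names of the daily files needed to cover [start, end]."""
    return [pattern.format(day) for day in days_covered(start, end)]


def concatenate_and_trim(daily_records: List[List[Record]],
                         start: datetime, end: datetime) -> List[Record]:
    """Join per-day record lists, then keep only records inside [start, end]."""
    joined = [rec for day in daily_records for rec in day]
    return [rec for rec in joined if start <= rec[0] <= end]
```

For example, daily_filenames(datetime(1991, 7, 15, 18), datetime(1991, 7, 17, 6)) yields one name for each of 15, 16 and 17 July 1991. The trimming step turns whole-day files into an array of exactly the requested length.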

PaPCo supplies the required time range through a common block that holds the start and end times in MJDT format. MJDT is a structure containing the Modified Julian Date plus the number of seconds elapsed since the start of that day:

common mjdt, mjdt_start, mjdt_end

The file papco_XX\papco_lib\convert.pro contains routines for converting MJDT to other time formats.
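The following minimal Python sketch shows an MJDT-like structure and how it converts to and from a calendar time. It is not the code in convert.pro, and the field names mjd and t are assumptions made for illustration. MJD 0 begins at 1858-11-17 00:00 UT, so MJD 51544 is 2000-01-01. Leap seconds are ignored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Modified Julian Day 0 begins at 1858-11-17 00:00 UT.
MJD_EPOCH = datetime(1858, 11, 17)


@dataclass
class MJDT:
    mjd: int   # whole Modified Julian Day number (field name assumed)
    t: float   # seconds elapsed since the start of that day (field name assumed)


def mjdt_to_datetime(m: MJDT) -> datetime:
    """Convert an MJDT value to a naive UT datetime (leap seconds ignored)."""
    return MJD_EPOCH + timedelta(days=m.mjd, seconds=m.t)


def datetime_to_mjdt(dt: datetime) -> MJDT:
    """Split a naive UT datetime into day number and seconds of day."""
    delta = dt - MJD_EPOCH
    return MJDT(mjd=delta.days, t=delta.seconds + delta.microseconds / 1e6)
```

For example, mjdt_to_datetime(MJDT(51544, 43200.0)) gives noon on 1 January 2000. A read routine could convert the start and end values from the common block this way before choosing its files.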

PaPCo provides only this common block and the orbit number, which is passed as a parameter. It is up to the user to provide a read routine that can handle this input.

5.6.1 Making things configurable

Your read routine needs to know where the data are: in which directory, or even at which site. PaPCo provides some support for this. Each module contains a defaults.config file listing environment variables and their default settings. These variables hold the data paths needed by the read routine and can be modified interactively through the module's panel editor (see Section 6.4.1). It is therefore better not to hard-code site-dependent paths and to use environment variables instead.
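The principle of looking up a path rather than hard-coding it can be sketched in Python as follows. The variable name LANL_GEO_DATA and the default directory are hypothetical. In PaPCo, the defaults would come from the module's defaults.config file, whose format is not reproduced here.

```python
import os


def data_path(var_name: str, default: str) -> str:
    """Return the directory named by an environment variable, or a default."""
    path = os.environ.get(var_name, default)
    return os.path.expanduser(path)  # allow "~/..." in settings


def data_file(var_name: str, default: str, filename: str) -> str:
    """Full path of a data file inside the configured directory."""
    return os.path.join(data_path(var_name, default), filename)


# Hypothetical module setting: default directory of the Los Alamos daily files.
LANL_DEFAULT = "/data/lanl_geo"
```

At a site where the data live elsewhere, setting LANL_GEO_DATA redirects every read without any change to the code.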

5.6.2 Speed of reading data

It is also desirable to keep data in a format that can be read as quickly as possible (this problem is discussed in Section 1.6.1).

5.6.3 Example: Reading Los Alamos geostationary data

A good example of how this problem is solved in practice is the handling of Los Alamos geostationary data (courtesy of Dick Belian and Geoff Reeves), which was used as part of a joint study. In that study, CRRES data were plotted by orbit, and Los Alamos data were needed for the corresponding periods. Los Alamos data are supplied as daily ASCII files, which are further compressed with gzip to save disk space.

The following procedure was adopted and implemented. The zipped raw data were kept in mass storage, and IDL-binary copies for fast reading were kept in a local directory. When the routine was called to return data for a given time period, it performed the following steps:

1. Use the time information to construct the names of the files covering the interval.
2. Check whether those files are available as IDL binaries. If not, check whether they are available as ASCII files; if they are still zipped, unzip them. Read these files with a slow ASCII read routine that at the same time writes a copy as an IDL binary (which in this case is much smaller than even the zipped ASCII original).
3. Once all required files are available as IDL binaries, read them in succession and concatenate them into one data array covering the requested time period.
4. Return this array to PaPCo through a common block.
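The four steps above can be sketched in Python to show the read-once, cache-afterwards idea. This is not the original IDL code, and it makes several assumptions:

- Python's pickle stands in for an IDL binary.
- Raw files are named <stem>.dat or <stem>.dat.gz and hold one whitespace-separated "time value" pair per line.
- The file stems have already been constructed (step 1).

Zipped files are decompressed while being read rather than unzipped to disk, so the raw archive is left untouched.

```python
import gzip
import os
import pickle
from typing import Iterable, List, Tuple

Record = Tuple[float, float]  # assumed layout: (time, value)


def parse_ascii(lines: Iterable[str]) -> List[Record]:
    """Slow ASCII reader: one 'time value' pair per line, other lines skipped."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            records.append((float(fields[0]), float(fields[1])))
    return records


def read_day(stem: str, raw_dir: str, cache_dir: str) -> List[Record]:
    """Return one day's records, writing a fast-read binary copy on first use."""
    cache = os.path.join(cache_dir, stem + ".pkl")
    if os.path.exists(cache):                      # binary copy already exists
        with open(cache, "rb") as f:
            return pickle.load(f)

    plain = os.path.join(raw_dir, stem + ".dat")
    if os.path.exists(plain):                      # unzipped ASCII file
        with open(plain, "rt") as f:
            records = parse_ascii(f)
    elif os.path.exists(plain + ".gz"):            # still zipped: decompress on the fly
        with gzip.open(plain + ".gz", "rt") as f:
            records = parse_ascii(f)
    else:
        raise FileNotFoundError("no data for " + stem)

    with open(cache, "wb") as f:                   # carbon copy for the next request
        pickle.dump(records, f)
    return records


def read_period(stems: List[str], raw_dir: str, cache_dir: str) -> List[Record]:
    """Read each day in order and concatenate the results into one list."""
    data: List[Record] = []
    for stem in stems:
        data.extend(read_day(stem, raw_dir, cache_dir))
    return data
```

Only the first request for a given day pays the cost of parsing ASCII; later requests load the binary copy. Here the result is a return value rather than a common block.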

With this procedure, only wrapper routines around the existing ASCII read routines had to be written to integrate the Los Alamos data into PaPCo. The whole data compatibility problem is solved by the simple method of reading slowly once and quickly thereafter, using IDL binaries. This does cost some overhead and extra disk space. In practice, we batch-process a given data set to produce the required IDL binaries and then archive the original data.

The procedure described here is, however, not prescribed; users may organize things however they want. This scenario was adopted to maximize the speed of reading large data sets, but any read routine that presents the corresponding plotting routine with data for the requested time interval will do. Since both routines are user-written, it is not PaPCo's place to prescribe how things are done. All we can do is show how things have been done in PaPCo's history and suggest why this might have been advantageous.

5.6.4 Data portability

As PaPCo use becomes more widespread, modules need to be written with portability to other platforms and architectures in mind.

One way of achieving this is to use a portable data format, such as CDF or IDL savesets. At the very least, data made public through a PaPCo module should be portable, or the module's read routine should be able to read the data regardless of the platform or architecture on which they were written.

This is particularly important if the module makes use of the remote get-data facility, whose use should be encouraged.
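Byte order is one concrete portability trap for home-grown binary files. A file written in the native order of a big-endian workstation reads as garbage on a little-endian PC. CDF and IDL savesets take care of this for you. If you do write your own binary files, fixing the byte order in the file avoids the problem.

Here is a minimal Python sketch, assuming a layout of a 4-byte count followed by 8-byte floats, all little-endian:

```python
import struct
from typing import List

# "<" forces little-endian byte order with no padding, whatever the host machine.
HEADER = struct.Struct("<i")  # 4-byte signed record count


def pack_values(values: List[float]) -> bytes:
    """Serialize a count followed by float64 values in little-endian order."""
    return HEADER.pack(len(values)) + struct.pack("<%dd" % len(values), *values)


def unpack_values(blob: bytes) -> List[float]:
    """Inverse of pack_values; gives the same result on any architecture."""
    (count,) = HEADER.unpack_from(blob, 0)
    return list(struct.unpack_from("<%dd" % count, blob, HEADER.size))
```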

5.6.5 Getting data remotely

As a further extension, PaPCo now provides a routine to "fetch" data from a remote site using the GNU free software program wget. You can write your read routine so that, if the data are not found locally, they are copied via ftp from a remote site (see Section C.2).

This feature is under development and has so far been implemented only under UNIX. GNU Wget was chosen as the "vehicle" because it is free and available for most platforms. It has not yet been tested under VMS or Windows 95.

PaPCo provides a set of routines in papco_get_remote_data.pro under papco_XX/papco that interface with the Wget program and report the status of downloads in progress to PaPCo. This interface works by parsing the log files that Wget produces. The version of Wget used is GNU Wget/1.4.5 by Hrvoje Niksic <hniksic@srce.hr>.
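The fetch-if-missing pattern can be sketched in Python as follows. This is not the interface of papco_get_remote_data.pro, and the function names are hypothetical. PaPCo parses the Wget log, whereas this sketch only checks the exit status, which older Wget releases may not set reliably.

The options used are standard Wget ones: -O names the output file and -o the log file. The run parameter lets the download step be replaced, for example in testing.

```python
import os
import subprocess
from typing import Callable, List


def wget_command(url: str, dest: str, log: str) -> List[str]:
    """Command line: save the download to dest and Wget's messages to log."""
    return ["wget", "-O", dest, "-o", log, url]


def ensure_local(path: str, url: str, log: str,
                 run: Callable = subprocess.run) -> str:
    """Return path, first fetching it from url with Wget if it is missing."""
    if os.path.exists(path):
        return path                                # found locally: no download
    result = run(wget_command(url, path, log))
    if result.returncode != 0:
        if os.path.exists(path):
            os.remove(path)                        # -O may leave an empty or partial file
        raise OSError("could not fetch %s (see %s)" % (url, log))
    return path
```

A read routine would call ensure_local for each daily file before opening it, so local and remote data are handled by the same code path.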

As this is considered an extremely powerful feature of PaPCo, further development in this area will aim at full portability.

Rather than obtaining data through an ftp utility such as Wget, the preferred approach is to provide remote mount points for data on your system, so that a remote site "looks" like a normal directory path that you can configure your module to use.


Reiner Friedel
1999-02-03