Updates on 2021/12/16
- I used R to generate the file format messages below. If you are using ncdump or Python, you should reverse the dimension order. For example, Data would be [num_flts, num_times, num_stations, num_parameters].
- For character variables, like ParameterNames and StationNames, there are two storage options. They can be stored as a character matrix, as shown below, or as a string vector. In the latter case, the format would be string StationNames(num_stations).
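The two notes above can be sketched in plain Python. This is a minimal illustration, not part of the C++ programs: the helper names r_to_c_order and decode_char_matrix are hypothetical, and the padding characters (null bytes or spaces) are an assumption about how the character matrix is filled.

```python
def r_to_c_order(r_dims):
    """Reverse a dimension list printed by R into the order that
    ncdump and Python report for the same variable."""
    return list(reversed(r_dims))


def decode_char_matrix(rows):
    """Turn a character matrix (one row of single-byte characters per name)
    into a list of strings.

    ASSUMPTION: unused trailing cells are padded with null bytes or spaces.
    """
    return [b"".join(row).rstrip(b"\x00 ").decode("utf-8") for row in rows]


# Dimension order of Data in a Forecasts file, as printed by R
r_dims = ["num_parameters", "num_stations", "num_times", "num_flts"]
c_dims = r_to_c_order(r_dims)
print(c_dims)

# A made-up 2 x 4 character matrix, as Python would see StationNames
rows = [[b"K", b"S", b"E", b"A"], [b"L", b"A", b"X", b"\x00"]]
names = decode_char_matrix(rows)
print(names)
```

When StationNames is stored as a string vector instead, no decoding step of this kind is needed.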
Introduction
Under the apps directory, there are several C++ programs that implement different phases of generating analog ensembles, including calculating standard deviations, calculating similarity metrics, and selecting analog forecasts, as well as programs for data pre-processing. Currently, all input and output files are in NetCDF format. This article documents the variables and dimensions expected in each file type, for example, Forecasts, Observations, Similarity, and so on.
File Types
The defined file types include:
- Forecasts
- Observations
- Analogs
- Similarity
- StandardDeviation
- Matrix
Each file type is associated with a list of expected dimensions and a list of expected variables. These variables and dimensions are required to ensure the correctness and performance of the C++ programs. Some variables can also be very helpful during visualization.
Forecasts
An example Forecasts file includes the following content:
9 variables (excluding dimension variables):
char ParameterNames[num_chars,num_parameters] (Contiguous storage)
double ParameterWeights[num_parameters] (Contiguous storage)
char ParameterCirculars[num_chars,num_parameters] (Contiguous storage)
char StationNames[num_chars,num_stations] (Contiguous storage)
double Xs[num_stations] (Contiguous storage)
double Ys[num_stations] (Contiguous storage)
double Times[num_times] (Contiguous storage)
double FLTs[num_flts] (Contiguous storage)
double Data[num_parameters,num_stations,num_times,num_flts] (Contiguous storage)
5 dimensions:
num_parameters Size:17
num_chars Size:50
num_stations Size:262792
num_times Size:31
num_flts Size:53
- ParameterNames are the names of the parameters in the forecasts.
- ParameterWeights are the corresponding weights for each parameter in the forecasts, to be used when computing forecast similarity.
- ParameterCirculars are the names of the circular parameters.
- StationNames are the names of the forecast stations or grid points.
- Xs are the x coordinates of the forecast stations or grid points.
- Ys are the y coordinates of the forecast stations or grid points.
- Times are the time representation of forecasts. It is the number of seconds since the origin, 1970-01-01 00:00:00 UTC by default.
- FLTs are the time representation of forecast lead times. It is the number of seconds since the initialization of the forecast model.
- Data is a 4-dimensional array that stores the actual forecast values.
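As a small illustration of the two time representations, the sketch below converts a Times value and an FLTs value into calendar times using only the Python standard library. It assumes the default origin of 1970-01-01 00:00:00 UTC; the function name valid_time and the sample values are made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Default origin stated above
ORIGIN = datetime(1970, 1, 1, tzinfo=timezone.utc)


def valid_time(time_sec, flt_sec):
    """Return the time a forecast is valid for: the forecast time plus
    the lead time, both given in seconds."""
    return ORIGIN + timedelta(seconds=time_sec) + timedelta(seconds=flt_sec)


time_sec = 1546300800   # a sample entry of Times (2019-01-01 00:00:00 UTC)
flt_sec = 6 * 3600      # a sample entry of FLTs (6 hours)

forecast_time = ORIGIN + timedelta(seconds=time_sec)
valid = valid_time(time_sec, flt_sec)
print(forecast_time.isoformat(), valid.isoformat())
```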
Observations
An example Observations file looks very similar to a Forecasts file, except that the variable Data is a 3-dimensional array without forecast lead times.
8 variables (excluding dimension variables):
char ParameterNames[num_chars,num_parameters] (Contiguous storage)
double ParameterWeights[num_parameters] (Contiguous storage)
char ParameterCirculars[num_chars,num_parameters] (Contiguous storage)
char StationNames[num_chars,num_stations] (Contiguous storage)
double Xs[num_stations] (Contiguous storage)
double Ys[num_stations] (Contiguous storage)
double Times[num_times] (Contiguous storage)
double Data[num_parameters,num_stations,num_times] (Contiguous storage)
4 dimensions:
num_parameters Size:15
num_chars Size:50
num_stations Size:262792
num_times Size:496
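To make the storage order concrete, the following sketch computes the position of one value inside the contiguous Data array of an Observations file. It relies on the usual NetCDF layout, in which the first dimension in the R listing (the last one in ncdump or Python) varies fastest. The function name flat_offset is hypothetical, and indices are assumed to be zero-based.

```python
def flat_offset(param, station, time, n_params, n_stations):
    """Zero-based offset of Data[param, station, time] (R order) in the
    contiguous array; the parameter index varies fastest."""
    return param + n_params * (station + n_stations * time)


# Sizes taken from the example listing above
N_PARAMS, N_STATIONS = 15, 262792

next_param = flat_offset(1, 0, 0, N_PARAMS, N_STATIONS)
next_station = flat_offset(0, 1, 0, N_PARAMS, N_STATIONS)
next_time = flat_offset(0, 0, 1, N_PARAMS, N_STATIONS)
print(next_param, next_station, next_time)
```

The same offset is reached from Python with the indices reversed, that is Data[time, station, param], which is why only the order of the dimension names changes between the tools and not the file itself.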
Analogs
An example Analogs file includes the following content:
10 variables (excluding dimension variables):
double Analogs[num_stations,num_times,num_flts,num_members,num_cols] (Contiguous storage)
char StationNames[num_chars,num_stations] (Contiguous storage)
double Xs[num_stations] (Contiguous storage)
double Ys[num_stations] (Contiguous storage)
double Times[num_times] (Contiguous storage)
double FLTs[num_flts] (Contiguous storage)
char MemberStationNames[num_chars,member_num_stations] (Contiguous storage)
double MemberXs[member_num_stations] (Contiguous storage)
double MemberYs[member_num_stations] (Contiguous storage)
double MemberTimes[member_num_times] (Contiguous storage)
8 dimensions:
num_stations Size:10
num_times Size:100
num_flts Size:10
num_members Size:5
num_cols Size:3
num_chars Size:50
member_num_stations Size:10
member_num_times Size:1000
- Analogs is a 5-dimensional array that stores analog forecasts. More information about analogs can be found here.
- FLTs are the forecast lead times of the analog forecasts. Each value is the number of seconds since the initialization of the forecast model.
- StationNames are the names of stations for analog forecasts.
- Xs are the x coordinates of stations for analog forecasts.
- Ys are the y coordinates of stations for analog forecasts.
- Times is the time representation of the analog forecasts. It is the number of seconds since the origin, 1970-01-01 00:00:00 UTC by default.
- MemberStationNames are the names of stations for analog members. This can be used together with the search station index in the fifth dimension to get the exact details of the search station used.
- MemberXs are the x coordinates of stations for analog members. This can be used together with the search station index in the fifth dimension to get the exact details of the search station used.
- MemberYs are the y coordinates of stations for analog members. This can be used together with the search station index in the fifth dimension to get the exact details of the search station used.
- MemberTimes is the time representation of the search times. This can be used together with the search time index in the fifth dimension to know what historical time this member belongs to.
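The lookup described in the last four items can be sketched as follows. The column layout is an assumption, not something stated in this article: the three entries of the fifth dimension (num_cols) are taken to be the analog value, the search station index, and the search time index, in that order and zero-based. Check the actual layout and index base of your files before relying on it. The function name describe_member and the sample data are made up.

```python
# ASSUMPTION: column order and zero-based indices; verify against your files
VALUE_COL, STATION_COL, TIME_COL = 0, 1, 2


def describe_member(member, member_station_names, member_times):
    """Resolve the station and time indices stored with one analog member."""
    station_index = int(member[STATION_COL])
    time_index = int(member[TIME_COL])
    return {
        "value": member[VALUE_COL],
        "station": member_station_names[station_index],
        "time": member_times[time_index],
    }


# Made-up data standing in for MemberStationNames and MemberTimes
member_station_names = ["A", "B", "C"]
member_times = [0, 86400, 172800]

# One analog member: the num_cols entries for a fixed station/time/flt/member
member = [281.5, 2.0, 1.0]
info = describe_member(member, member_station_names, member_times)
print(info)
```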
Similarity
An example Similarity file includes the following content:
13 variables (excluding dimension variables):
double SimilarityMatrices[num_cols,num_entries,num_flts,num_times,num_stations] (Contiguous storage)
char ParameterNames[num_chars,num_parameters] (Contiguous storage)
double ParameterWeights[num_parameters] (Contiguous storage)
char ParameterCirculars[num_chars,num_parameters] (Contiguous storage)
char StationNames[num_chars,num_stations] (Contiguous storage)
double Xs[num_stations] (Contiguous storage)
double Ys[num_stations] (Contiguous storage)
double Times[num_times] (Contiguous storage)
double FLTs[num_flts] (Contiguous storage)
char SearchStationNames[num_chars,search_num_stations] (Contiguous storage)
double SearchXs[search_num_stations] (Contiguous storage)
double SearchYs[search_num_stations] (Contiguous storage)
double SearchTimes[search_num_times] (Contiguous storage)
9 dimensions:
num_stations Size:10
num_times Size:100
num_flts Size:10
num_entries Size:100
num_cols Size:3
num_parameters Size:10
num_chars Size:50
search_num_stations Size:10
search_num_times Size:100
- SimilarityMatrices is a 5-dimensional array that stores similarity metric values.
- ParameterNames are names of parameters used to calculate the similarity.
- ParameterWeights are weights of parameters used to calculate the similarity.
- ParameterCirculars are names of circular parameters.
- StationNames are names of stations or grid points for which similarity is generated.
- Xs are x coordinates of stations or grid points for which similarity is generated.
- Ys are y coordinates of stations or grid points for which similarity is generated.
- Times is the time representation of the similarity. It is the number of seconds since the origin, 1970-01-01 00:00:00 UTC by default.
- FLTs are the forecast lead times of the similarity. Each value is the number of seconds since the initialization of the forecast model.
- SearchTimes are times for the complete search period. This can be used together with the search time index in the fifth dimension to know what historical forecast this similarity is generated from.
- SearchStationNames are station names for the complete search data. This can be used together with the search station index in the fifth dimension to know what station/grid point is used to generate similarity.
- SearchXs are x coordinates for the complete search stations. This can be used together with the search station index in the fifth dimension to know what station/grid point is used to generate similarity.
- SearchYs are y coordinates for the complete search stations. This can be used together with the search station index in the fifth dimension to know what station/grid point is used to generate similarity.
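As a rough illustration of how similarity entries might feed the analog selection phase, the sketch below picks the most similar entries for one station, time, and lead time. Several things here are assumptions rather than documented behavior: the metric is taken to be in the first column, a smaller metric is taken to mean a more similar forecast, and NaN is taken to mark missing entries. The function name select_best_entries is hypothetical.

```python
import heapq
import math

# ASSUMPTION: the similarity metric is stored in the first column
METRIC_COL = 0


def select_best_entries(entries, num_members):
    """Return the num_members entries with the smallest metric,
    skipping entries whose metric is NaN."""
    valid = [e for e in entries if not math.isnan(e[METRIC_COL])]
    return heapq.nsmallest(num_members, valid, key=lambda e: e[METRIC_COL])


# Made-up entries: (metric, search station index, search time index)
entries = [(0.9, 0, 12), (0.2, 0, 40), (float("nan"), 0, 7), (0.5, 0, 3)]
best = select_best_entries(entries, 2)
print(best)
```

The station and time indices kept with each selected entry can then be resolved against SearchStationNames and SearchTimes.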
StandardDeviation
An example StandardDeviation file includes the following content:
8 variables (excluding dimension variables):
double StandardDeviation[num_parameters,num_stations,num_flts] (Contiguous storage)
char ParameterNames[num_chars,num_parameters] (Contiguous storage)
double ParameterWeights[num_parameters] (Contiguous storage)
char ParameterCirculars[num_chars,num_parameters] (Contiguous storage)
char StationNames[num_chars,num_stations] (Contiguous storage)
double Xs[num_stations] (Contiguous storage)
double Ys[num_stations] (Contiguous storage)
double FLTs[num_flts] (Contiguous storage)
4 dimensions:
num_parameters Size:10
num_stations Size:10
num_flts Size:10
num_chars Size:50
- StandardDeviation is a 3-dimensional array that stores standard deviation values.
- ParameterNames are the names of parameters.
- ParameterWeights are the weights of parameters.
- ParameterCirculars are the names of circular parameters.
- StationNames are the names of stations or grid points.
- Xs are the x coordinates of stations or grid points.
- Ys are the y coordinates of stations or grid points.
- FLTs are the forecast lead times.
Matrix
File type Matrix is designed for the time mapping matrix between forecast times/forecast lead times and observation times. It is usually stored as a text file.
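One way to picture such a mapping is sketched below: for every forecast time and lead time, look up the index of the observation time that equals their sum. The layout (one row per forecast time, one column per lead time, None where no observation matches) is an assumption made for illustration and may differ from what the C++ programs write. The function name build_time_mapping is hypothetical.

```python
def build_time_mapping(fcst_times, flts, obs_times):
    """For each forecast time (row) and lead time (column), return the
    zero-based index of the matching observation time, or None."""
    obs_index = {t: i for i, t in enumerate(obs_times)}
    return [[obs_index.get(t + f) for f in flts] for t in fcst_times]


# Made-up values, all in seconds
fcst_times = [0, 86400]
flts = [0, 3600]
obs_times = [0, 3600, 86400]

mapping = build_time_mapping(fcst_times, flts, obs_times)
for row in mapping:
    print(row)
```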
References
All the example output is generated using the R package ncdf4.