*Updates on 2021/12/16*

- I used `R` to generate the file format messages below. If you are using `ncdump` or `python`, you should reverse the dimension order. For example, `Data` would be `[num_flts, num_times, num_stations, num_parameters]`.
- For character-related variables, like `ParameterNames` and `StationNames`, there are two storage options. They can be stored as a character matrix, as shown in the listings in this article, or as a string vector. In the latter case, the format would be `string StationNames(num_stations)`.
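The dimension reversal described in the first note can be illustrated with a small, self-contained sketch. It does not read any NetCDF file; the helper names, the shape, and the index are hypothetical and only demonstrate that an index written in R order and the reversed index in `ncdump`/`python` order address the same element.

```python
def reverse_dims(r_dims):
    """Return a dimension list printed by R in ncdump/Python order."""
    return list(reversed(r_dims))


def offset_first_fastest(index, shape):
    """Flat offset when the first dimension varies fastest (R convention)."""
    offset, stride = 0, 1
    for i, n in zip(index, shape):
        offset += i * stride
        stride *= n
    return offset


def offset_last_fastest(index, shape):
    """Flat offset when the last dimension varies fastest (C convention)."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


# Dimension order of Data as printed by R.
r_dims = ["num_parameters", "num_stations", "num_times", "num_flts"]
py_dims = reverse_dims(r_dims)
print(py_dims)

# Hypothetical shape and zero-based index, both written in R order.
r_shape = (17, 4, 31, 53)
r_index = (2, 1, 5, 7)
same_element = (
    offset_first_fastest(r_index, r_shape)
    == offset_last_fastest(r_index[::-1], r_shape[::-1])
)
print(same_element)
```

Note that R itself uses one-based indices; the zero-based index here is only used to compute offsets.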

## Introduction

Under the apps directory, there are several C++ programs that implement different phases of generating analog ensembles, including calculating standard deviations, calculating similarity metrics, and selecting analog forecasts, as well as some programs for data pre-processing. Currently, all input and output files are in NetCDF format. This article documents the variables and dimensions expected in each file type, for example, Forecasts, Observations, and Similarity.

## File Types

The defined file types include:

- Forecasts
- Observations
- Analogs
- Similarity
- StandardDeviation
- Matrix

Each file type is associated with a list of expected dimensions and a list of expected variables. Those variables and dimensions are required to ensure the correctness and performance of the C++ programs. Some variables can also be very helpful during visualization.
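As a rough illustration of what "expected variables" means in practice, the sketch below checks a list of variable names against the names that appear in the example listings in this article. The dictionary and the function `missing_variables` are hypothetical helpers, not part of the C++ programs, and the sets only mirror the example files shown here.

```python
# Variable names copied from the example listings in this article.
COMMON = {
    "ParameterNames", "ParameterWeights", "ParameterCirculars",
    "StationNames", "Xs", "Ys",
}
EXPECTED_VARIABLES = {
    "Forecasts": COMMON | {"Times", "FLTs", "Data"},
    "Observations": COMMON | {"Times", "Data"},
    "StandardDeviation": COMMON | {"FLTs", "StandardDeviation"},
}


def missing_variables(file_type, present):
    """Return the expected variable names that are absent from `present`."""
    return sorted(EXPECTED_VARIABLES[file_type] - set(present))


# Hypothetical list of variables found in an Observations file.
found = ["ParameterNames", "ParameterWeights", "ParameterCirculars",
         "StationNames", "Xs", "Ys", "Times"]
print(missing_variables("Observations", found))
```

With a NetCDF reader, the `present` argument would be the variable names reported by the file.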

### Forecasts

An example `Forecasts` file includes the following content:

```
9 variables (excluding dimension variables):
char ParameterNames[num_chars,num_parameters] (Contiguous storage)
double ParameterWeights[num_parameters] (Contiguous storage)
char ParameterCirculars[num_chars,num_parameters] (Contiguous storage)
char StationNames[num_chars,num_stations] (Contiguous storage)
double Xs[num_stations] (Contiguous storage)
double Ys[num_stations] (Contiguous storage)
double Times[num_times] (Contiguous storage)
double FLTs[num_flts] (Contiguous storage)
double Data[num_parameters,num_stations,num_times,num_flts] (Contiguous storage)
5 dimensions:
num_parameters Size:17
num_chars Size:50
num_stations Size:262792
num_times Size:31
num_flts Size:53
```

- **ParameterNames** are the names of the parameters in the forecasts.
- **ParameterWeights** are the corresponding weights for each parameter in the forecasts, used when computing forecast similarity.
- **ParameterCirculars** are the names of the circular parameters.
- **StationNames** are the names of the forecast stations or grid points.
- **Xs** are the x coordinates of the forecast stations or grid points.
- **Ys** are the y coordinates of the forecast stations or grid points.
- **Times** are the time representation of forecasts. Each value is the number of seconds since the origin, 1970-01-01 00:00:00 UTC by default.
- **FLTs** are the time representation of forecast lead times. Each value is the number of seconds since the initialization of the forecast model.
- **Data** is a 4-dimensional array that stores the actual forecast values.
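The two time variables can be turned into calendar times with the standard library alone. The sketch below assumes the default origin and assumes that each entry of `Times` is the initialization time of a forecast, so that adding a lead time from `FLTs` gives the time the forecast is valid for. The sample values and function names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Default origin stated for the Times variable.
ORIGIN = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_datetime(seconds):
    """Convert seconds since the origin to a timezone-aware datetime."""
    return ORIGIN + timedelta(seconds=seconds)


def valid_times(init_seconds, flts):
    """Valid time of every lead time for one forecast initialization."""
    return [to_datetime(init_seconds + flt) for flt in flts]


# Hypothetical values: one forecast time and three hourly lead times.
times = [1546300800.0]
flts = [0.0, 3600.0, 7200.0]

for t in times:
    print(to_datetime(t).isoformat())
    print([v.isoformat() for v in valid_times(t, flts)])
```

If a file uses a different origin, only `ORIGIN` needs to change.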

### Observations

An example `Observations` file looks very similar to a `Forecasts` file, except that the variable **Data** is a 3-dimensional array without forecast lead times.

```
8 variables (excluding dimension variables):
char ParameterNames[num_chars,num_parameters] (Contiguous storage)
double ParameterWeights[num_parameters] (Contiguous storage)
char ParameterCirculars[num_chars,num_parameters] (Contiguous storage)
char StationNames[num_chars,num_stations] (Contiguous storage)
double Xs[num_stations] (Contiguous storage)
double Ys[num_stations] (Contiguous storage)
double Times[num_times] (Contiguous storage)
double Data[num_parameters,num_stations,num_times] (Contiguous storage)
4 dimensions:
num_parameters Size:15
num_chars Size:50
num_stations Size:262792
num_times Size:496
```
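Character variables such as `ParameterNames` and `StationNames` in the listing above are fixed-width character matrices, so each name has to be reassembled from `num_chars` single characters. The sketch below shows one way to do that in plain Python. It assumes the rows arrive in `python` order (one row per name), that each cell is a single byte, and that unused cells are padded with NUL bytes or spaces; the padding used by a given file may differ. Both helper functions are hypothetical.

```python
def decode_char_rows(rows):
    """Join each row of single-byte characters and drop trailing padding."""
    names = []
    for row in rows:
        raw = b"".join(row)
        names.append(raw.decode("utf-8").rstrip("\x00 "))
    return names


def pad_name(name, width):
    """Build one fixed-width row of single bytes, padded with NUL bytes."""
    return [bytes([b]) for b in name.encode("utf-8").ljust(width, b"\x00")]


# Two hypothetical parameter names stored with num_chars = 50.
rows = [pad_name("Temperature", 50), pad_name("WindDirection", 50)]
names = decode_char_rows(rows)
print(names)
```

Files that store these variables as string vectors do not need this step.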

### Analogs

An example `Analogs` file includes the following content:

```
10 variables (excluding dimension variables):
double Analogs[num_stations,num_times,num_flts,num_members,num_cols] (Contiguous storage)
char StationNames[num_chars,num_stations] (Contiguous storage)
double Xs[num_stations] (Contiguous storage)
double Ys[num_stations] (Contiguous storage)
double Times[num_times] (Contiguous storage)
double FLTs[num_flts] (Contiguous storage)
char MemberStationNames[num_chars,member_num_stations] (Contiguous storage)
double MemberXs[member_num_stations] (Contiguous storage)
double MemberYs[member_num_stations] (Contiguous storage)
double MemberTimes[member_num_times] (Contiguous storage)
8 dimensions:
num_stations Size:10
num_times Size:100
num_flts Size:10
num_members Size:5
num_cols Size:3
num_chars Size:50
member_num_stations Size:10
member_num_times Size:1000
```

- **Analogs** is a 5-dimensional array that stores analog forecasts. More information about analogs can be found here.
- **FLTs** is the time representation of the analog forecast lead times. Each value is the number of seconds since the initialization of the forecast model.
- **StationNames** are the names of stations for analog forecasts.
- **Xs** are the x coordinates of stations for analog forecasts.
- **Ys** are the y coordinates of stations for analog forecasts.
- **Times** is the time representation of the analog forecasts. Each value is the number of seconds since the origin, 1970-01-01 00:00:00 UTC by default.
- **MemberStationNames** are the names of stations for analog members. They can be used together with the search station index in the fifth dimension to identify the exact search station used.
- **MemberXs** are the x coordinates of stations for analog members. They can be used together with the search station index in the fifth dimension to identify the exact search station used.
- **MemberYs** are the y coordinates of stations for analog members. They can be used together with the search station index in the fifth dimension to identify the exact search station used.
- **MemberTimes** is the time representation of the search times. It can be used together with the search time index in the fifth dimension to find out which historical time a member belongs to.
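The lookup of a member's station and time through the fifth dimension might look like the sketch below. Several things here are assumptions rather than documented facts: the order of the three columns along `num_cols` (value, station index, time index), the index base (zero-based by default, adjustable through `index_base`), and that `MemberTimes` uses the same seconds-since-origin convention as `Times`. Check these against your own files before relying on the result. All sample values are made up.

```python
from datetime import datetime, timedelta, timezone

# Assumed column layout along num_cols; not confirmed by this article.
COL_VALUE, COL_STATION, COL_TIME = 0, 1, 2


def describe_member(member, member_xs, member_ys, member_times, index_base=0):
    """Resolve the station and time indices of one analog member."""
    station = int(member[COL_STATION]) - index_base
    time = int(member[COL_TIME]) - index_base
    origin = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return {
        "value": member[COL_VALUE],
        "x": member_xs[station],
        "y": member_ys[station],
        "time": origin + timedelta(seconds=member_times[time]),
    }


# Hypothetical member variables and one member (value, station, time).
member_xs = [-77.86, -75.16]
member_ys = [40.79, 39.95]
member_times = [1546300800.0, 1546387200.0]
member = [12.5, 1.0, 0.0]

info = describe_member(member, member_xs, member_ys, member_times)
print(info)
```

If the stored indices turn out to be one-based, pass `index_base=1`.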

### Similarity

An example `Similarity` file includes the following content:

```
13 variables (excluding dimension variables):
double SimilarityMatrices[num_cols,num_entries,num_flts,num_times,num_stations] (Contiguous storage)
char ParameterNames[num_chars,num_parameters] (Contiguous storage)
double ParameterWeights[num_parameters] (Contiguous storage)
char ParameterCirculars[num_chars,num_parameters] (Contiguous storage)
char StationNames[num_chars,num_stations] (Contiguous storage)
double Xs[num_stations] (Contiguous storage)
double Ys[num_stations] (Contiguous storage)
double Times[num_times] (Contiguous storage)
double FLTs[num_flts] (Contiguous storage)
char SearchStationNames[num_chars,search_num_stations] (Contiguous storage)
double SearchXs[search_num_stations] (Contiguous storage)
double SearchYs[search_num_stations] (Contiguous storage)
double SearchTimes[search_num_times] (Contiguous storage)
9 dimensions:
num_stations Size:10
num_times Size:100
num_flts Size:10
num_entries Size:100
num_cols Size:3
num_parameters Size:10
num_chars Size:50
search_num_stations Size:10
search_num_times Size:100
```

- **SimilarityMatrices** is a 5-dimensional array that stores similarity metric values.
- **ParameterNames** are the names of parameters used to calculate the similarity.
- **ParameterWeights** are the weights of parameters used to calculate the similarity.
- **ParameterCirculars** are the names of circular parameters.
- **StationNames** are the names of stations or grid points for which similarity is generated.
- **Xs** are the x coordinates of stations or grid points for which similarity is generated.
- **Ys** are the y coordinates of stations or grid points for which similarity is generated.
- **Times** is the time representation of the similarity. Each value is the number of seconds since the origin, 1970-01-01 00:00:00 UTC by default.
- **FLTs** is the lead-time representation of the similarity. Each value is the number of seconds since the initialization of the forecast model.
- **SearchTimes** are the times for the complete search period. They can be used together with the search time index in the fifth dimension to find out which historical forecast a similarity value is generated from.
- **SearchStationNames** are the station names for the complete search data. They can be used together with the search station index in the fifth dimension to find out which station or grid point is used to generate similarity.
- **SearchXs** are the x coordinates for the complete search stations. They can be used together with the search station index in the fifth dimension to find out which station or grid point is used to generate similarity.
- **SearchYs** are the y coordinates for the complete search stations. They can be used together with the search station index in the fifth dimension to find out which station or grid point is used to generate similarity.
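To give a sense of how similarity entries relate to analog selection, the sketch below picks the `k` best entries for a single station, time, and lead time. It is not the selection code of the C++ programs. It assumes that each entry holds three columns in the order (similarity value, station index, time index), that a smaller value means a closer match, and that unused entries are stored as NaN; all three are assumptions to verify against real files.

```python
# Assumed column layout along num_cols; not confirmed by this article.
COL_VALUE, COL_STATION, COL_TIME = 0, 1, 2


def best_entries(entries, k):
    """Return the k entries with the smallest similarity value."""
    # NaN is the only value not equal to itself, so this drops NaN rows.
    valid = [e for e in entries if e[COL_VALUE] == e[COL_VALUE]]
    return sorted(valid, key=lambda e: e[COL_VALUE])[:k]


# Hypothetical entries for one station, time, and lead time.
entries = [
    [0.82, 0.0, 14.0],
    [0.15, 0.0, 3.0],
    [float("nan"), 0.0, 7.0],
    [0.40, 0.0, 21.0],
]
top = best_entries(entries, 2)
print(top)
```

The time indices of the selected entries can then be looked up in `SearchTimes`.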

### StandardDeviation

An example `StandardDeviation` file includes the following content:

```
8 variables (excluding dimension variables):
double StandardDeviation[num_parameters,num_stations,num_flts] (Contiguous storage)
char ParameterNames[num_chars,num_parameters] (Contiguous storage)
double ParameterWeights[num_parameters] (Contiguous storage)
char ParameterCirculars[num_chars,num_parameters] (Contiguous storage)
char StationNames[num_chars,num_stations] (Contiguous storage)
double Xs[num_stations] (Contiguous storage)
double Ys[num_stations] (Contiguous storage)
double FLTs[num_flts] (Contiguous storage)
4 dimensions:
num_parameters Size:10
num_stations Size:10
num_flts Size:10
num_chars Size:50
```

- **StandardDeviation** is a 3-dimensional array that stores standard deviation values.
- **ParameterNames** are the names of parameters.
- **ParameterWeights** are the weights of parameters.
- **ParameterCirculars** are the names of circular parameters.
- **StationNames** are the names of stations or grid points.
- **Xs** are the x coordinates of stations or grid points.
- **Ys** are the y coordinates of stations or grid points.
- **FLTs** are the forecast lead times.

### Matrix

The `Matrix` file type is designed for the time-mapping matrix between forecast times/forecast lead times and observation times. It is usually in text file format.

## References

All the example output was generated using the R package `ncdf4`.
