RAnEn::formatObservations generates the observation list required by analog computation.

formatObservations(
  df,
  col.par,
  col.x,
  col.y,
  col.time,
  time.series,
  col.value,
  verbose = T,
  preview = 2,
  remove.duplicates = T,
  circular.pars = NULL,
  col.station.name = NULL,
  show.progress = F,
  sort.stations = NULL
)

Arguments

df

A data frame to be converted to an R list.

col.par

The column name for parameter names.

col.x

The column name for station x coordinates.

col.y

The column name for station y coordinates.

col.time

The column name for times. The column should be POSIXct.

time.series

The times to be extracted into observations. This should be a POSIXct vector.

col.value

The column name for data values.

verbose

Whether to print progress messages.

preview

How many entries to preview in progress messages.

remove.duplicates

Whether to remove redundant values associated with the same time. Equipment malfunctions can sometimes produce multiple measurements at the same time. Ideally, such duplicates should be cleaned before calling this function, but the function can keep the first appearance and remove the rest.

circular.pars

A character vector for the circular parameter names.

col.station.name

The column name for station names.

show.progress

Whether to show a progress bar.

sort.stations

The key used to sort stations. It can be Xs, Ys, or StationNames if it is set.

Value

An R list for observation data.

Details

The observation list is an R list with members including ParameterNames, Xs, Ys, and Data, among others; a full list is accessible here. RAnEn::formatObservations makes it easier to convert a data frame into such a list.
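
To make the shape of such a list concrete, here is a minimal hand-built sketch. The member names ParameterNames, Xs, Ys, Data, and Times come from this page; the dimension order of Data (parameters, stations, times) is an assumption inferred from the indexing `observations$Data[1, i.station, ]` in the Examples section, and the coordinates and values are made up. The list returned by formatObservations may contain more members than shown here.

```r
# Toy observation list. Member names follow this page; sizes, coordinates,
# and values are invented for illustration only.
n.pars <- 1
n.stations <- 2
n.times <- 3

toy <- list(
  ParameterNames = 'Ozone',
  Xs = c(-77.86, -75.16),  # station x coordinates
  Ys = c(40.79, 39.95),    # station y coordinates
  Times = seq(from = as.POSIXct('2019-03-03', tz = 'UTC'),
              by = 'hour', length.out = n.times),
  # Assumed dimension order: [parameters, stations, times]
  Data = array(NA_real_, dim = c(n.pars, n.stations, n.times)))

# Fill in the values of the first parameter at the second station
toy$Data[1, 2, ] <- c(0.031, 0.029, 0.034)

# Inspect the array shape
dim(toy$Data)
```

Indexing by parameter, then station, then time mirrors how the Examples section pulls out a single station's time series for plotting.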

I read this tutorial when developing functions with dplyr. It is very informative about using variables with dplyr functions.
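
The description of remove.duplicates recommends cleaning repeated measurements before calling this function. One way to do that in base R is sketched below. It is not the function's internal implementation. The column names are borrowed from the Examples section, the data are made up, and treating station, parameter, and time together as the duplicate key is an assumption.

```r
# Hypothetical long-format input: station A has two measurements at 00:00
raw <- data.frame(
  StationName = c('A', 'A', 'A', 'B'),
  Parameter.Name = 'Ozone',
  POSIX = as.POSIXct(
    c('2019-03-03 00:00', '2019-03-03 00:00',
      '2019-03-03 01:00', '2019-03-03 00:00'),
    tz = 'UTC'),
  Sample.Measurement = c(0.031, 0.035, 0.029, 0.040),
  stringsAsFactors = FALSE)

# Assumed duplicate key: same station, same parameter, same time
key <- raw[, c('StationName', 'Parameter.Name', 'POSIX')]

# duplicated() flags every repeat after the first appearance,
# so negating it keeps the first appearance and drops the rest
cleaned <- raw[!duplicated(key), ]
```

After this step, `cleaned` holds at most one row per station, parameter, and time, so remove.duplicates has nothing left to remove.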

Examples

if (FALSE) {
# How to download this file? Please see the tutorial
# https://github.com/Weiming-Hu/AnalogsEnsemble/blob/master/RAnalogs/examples/demo-5_observation-conversion.Rmd
#
obs <- read.table('~/Desktop/hourly_44201_2019.csv',
                  sep = ',', quote = '"', header = TRUE, stringsAsFactors = FALSE)

# Sample data
df <- obs[sample(nrow(obs), floor(nrow(obs) * 0.01)), ]

# Create a POSIXct time
df$POSIX <- as.POSIXct(
  paste(df$Date.GMT, df$Time.GMT),
  format = '%Y-%m-%d %H:%M', tz = 'UTC')

# Create unique station names
df$StationName <- paste(
  df$State.Name, df$County.Name, df$Site.Num, sep = '-')

# Create a target time series
time.series <- seq(
  from = as.POSIXct('2019-03-03', tz = 'UTC'),
  to = as.POSIXct('2019-06-03', tz = 'UTC'),
  by = 'hour')

# Format observations
observations <- formatObservations(
  df = df, col.par = 'Parameter.Name',
  col.x = 'Longitude', col.y = 'Latitude',
  col.time = 'POSIX', time.series = time.series,
  col.value = 'Sample.Measurement',
  circular.pars = 'Ozone',
  col.station.name = 'StationName',
  show.progress = TRUE)

# Check data
i.station <- 50
plot(observations$Times,
     observations$Data[1, i.station, ],
     col = 'red', pch = 16)

df.sub <- subset(
  df, Latitude == observations$Ys[i.station] &
    Longitude == observations$Xs[i.station])
df.sub <- df.sub[order(df.sub$POSIX),]
lines(df.sub$POSIX, df.sub$Sample.Measurement)

# Write formatted observations to a file
writeNetCDF('Observations', observations, '~/Desktop/obs.nc')
}