RAnEn::formatObservations — formatObservations • RAnEn

RAnEn::formatObservations generates the observation list required by analog computation.

formatObservations(
  df,
  col.par,
  col.x,
  col.y,
  col.time,
  time.series,
  col.value,
  verbose = T,
  preview = 2,
  remove.duplicates = T,
  circular.pars = NULL,
  col.station.name = NULL,
  show.progress = F,
  sort.stations = NULL
)

Arguments

df: A data frame to be converted to an R list.
col.par: The column name for parameter names.
col.x: The column name for station x coordinates.
col.y: The column name for station y coordinates.
col.time: The column name for times. The column should be POSIXct.
time.series: The times to be extracted into observations. This should be a POSIXct vector.
col.value: The column name for data values.
verbose: Whether to print progress messages.
preview: How many entries to preview in progress messages.
remove.duplicates: Whether to remove redundant values associated with the same time. Due to equipment malfunctions, there are sometimes multiple measurements at the same time. Ideally, this should be cleaned up before calling this function, but the function can keep the first appearance and remove the rest.
circular.pars: A character vector for the circular parameter names.
col.station.name: The column name for station names.
show.progress: Whether to show a progress bar.
sort.stations: How to sort stations. If set, it can be Xs, Ys, or StationNames.
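
The behavior described for remove.duplicates (keep the first appearance, drop the rest) can be sketched in base R. This is only an illustration of the idea, not the package's internal code; the data frame and its column names (station, time, value) are made up for the example.

```r
# Hypothetical long-format observations with a repeated (station, time) pair
df <- data.frame(
  station = c('A', 'A', 'A', 'B'),
  time = as.POSIXct(
    c('2019-03-03 00:00', '2019-03-03 00:00',
      '2019-03-03 01:00', '2019-03-03 00:00'),
    format = '%Y-%m-%d %H:%M', tz = 'UTC'),
  value = c(0.031, 0.045, 0.028, 0.040),
  stringsAsFactors = FALSE)

# duplicated() flags every repeat after the first appearance of a key,
# so negating it keeps only the first measurement per station and time
is.repeat <- duplicated(df[, c('station', 'time')])
df.clean <- df[!is.repeat, ]

print(df.clean)
```

Cleaning the data this way before calling formatObservations follows the recommendation above and makes it explicit which measurement is kept.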

Value

An R list for observation data.

Details

The observation list is an R list with members including ParameterNames, Xs, Ys, and Data, among others; a full list is accessible here. RAnEn::formatObservations makes it easier to convert a data frame into such a list data structure.
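
To make the target structure concrete, the following sketch builds a tiny list by hand with the member names mentioned on this page. It assumes Data is indexed as [parameter, station, time], which is inferred from the indexing used in the example on this page (observations$Data[1, i.station, ]) rather than stated explicitly. The input data frame and its column names are invented, and this is not how formatObservations is implemented.

```r
# Hypothetical long-format input: one row per parameter, station, and time
fmt <- '%Y-%m-%d %H:%M'
df <- data.frame(
  par = c('Ozone', 'Ozone', 'Ozone'),
  x = c(-77.86, -77.86, -75.16),
  y = c(40.79, 40.79, 39.95),
  time = as.POSIXct(
    c('2019-03-03 00:00', '2019-03-03 02:00', '2019-03-03 01:00'),
    format = fmt, tz = 'UTC'),
  value = c(0.031, 0.028, 0.040),
  stringsAsFactors = FALSE)

# Target time series; times without a measurement stay NA
time.series <- seq(
  from = as.POSIXct('2019-03-03 00:00', format = fmt, tz = 'UTC'),
  by = 'hour', length.out = 3)

# Unique parameters and stations define the first two dimensions
pars <- unique(df$par)
stations <- unique(df[, c('x', 'y')])

data <- array(
  NA_real_,
  dim = c(length(pars), nrow(stations), length(time.series)))

for (i in seq_len(nrow(df))) {
  i.par <- match(df$par[i], pars)
  i.station <- which(stations$x == df$x[i] & stations$y == df$y[i])
  # Compare times as seconds since the epoch
  i.time <- match(as.numeric(df$time[i]), as.numeric(time.series))
  # Rows outside the target time series are skipped
  if (!is.na(i.time)) data[i.par, i.station, i.time] <- df$value[i]
}

observations <- list(
  ParameterNames = pars,
  Xs = stations$x,
  Ys = stations$y,
  Times = time.series,
  Data = data)

str(observations)
```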

I read this tutorial when developing functions with dplyr. It is very informative about using variables with dplyr functions.

Examples

if (FALSE) {
# How to download this file? Please see the tutorial
# https://github.com/Weiming-Hu/AnalogsEnsemble/blob/master/RAnalogs/examples/demo-5_observation-conversion.Rmd
#
obs <- read.table('~/Desktop/hourly_44201_2019.csv',
                  sep = ',', quote = '"', header = TRUE, stringsAsFactors = FALSE)

# Sample data
df <- obs[sample(nrow(obs), floor(nrow(obs) * 0.01)), ]

# Create a POSIXct time
df$POSIX <- as.POSIXct(
  paste(df$Date.GMT, df$Time.GMT),
  format = '%Y-%m-%d %H:%M', tz = 'UTC')

# Create unique station names
df$StationName <- paste(
  df$State.Name, df$County.Name, df$Site.Num, sep = '-')

# Create a target time series
time.series <- seq(
  from = as.POSIXct('2019-03-03', tz = 'UTC'),
  to = as.POSIXct('2019-06-03', tz = 'UTC'),
  by = 'hour')

# Format observations
observations <- formatObservations(
  df = df, col.par = 'Parameter.Name',
  col.x = 'Longitude', col.y = 'Latitude',
  col.time = 'POSIX', time.series = time.series,
  col.value = 'Sample.Measurement',
  circular.pars = 'Ozone',
  col.station.name = 'StationName',
  show.progress = TRUE)

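# Check data: plot the formatted values for one station (red points)
# and overlay the raw measurements at the same coordinates (line)
i.station <- 50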
plot(observations$Times,
     observations$Data[1, i.station, ],
     col = 'red', pch = 16)

df.sub <- subset(
  df, Latitude == observations$Ys[i.station] &
    Longitude == observations$Xs[i.station])
df.sub <- df.sub[order(df.sub$POSIX),]
lines(df.sub$POSIX, df.sub$Sample.Measurement)

# Write formatted observations to a file
writeNetCDF('Observations', observations, '~/Desktop/obs.nc')
}
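
For a quick trial without downloading the hourly CSV file, a small synthetic data frame with the same kind of columns can stand in for df. Everything in this sketch (station names, coordinates, values, and the Temperature parameter) is invented. The call to formatObservations uses only arguments documented above and is skipped when RAnEn is not installed; treat it as a starting point under those assumptions rather than a verified recipe.

```r
# Synthetic long-format observations: 2 parameters x 2 stations x 3 hours
times <- seq(
  from = as.POSIXct('2019-03-03 00:00', format = '%Y-%m-%d %H:%M', tz = 'UTC'),
  by = 'hour', length.out = 3)

# Hypothetical stations
stations <- data.frame(
  StationName = c('Toy-A', 'Toy-B'),
  Longitude = c(-77.86, -75.16),
  Latitude = c(40.79, 39.95),
  stringsAsFactors = FALSE)

# Every combination of time, station, and parameter
grid <- expand.grid(
  i.time = seq_along(times),
  i.station = seq_len(nrow(stations)),
  Parameter.Name = c('Ozone', 'Temperature'),
  stringsAsFactors = FALSE)

df.toy <- data.frame(
  Parameter.Name = grid$Parameter.Name,
  StationName = stations$StationName[grid$i.station],
  Longitude = stations$Longitude[grid$i.station],
  Latitude = stations$Latitude[grid$i.station],
  POSIX = times[grid$i.time],
  Sample.Measurement = seq_len(nrow(grid)) / 100,
  stringsAsFactors = FALSE)

# Only attempt the conversion when the package is available
if (requireNamespace('RAnEn', quietly = TRUE)) {
  observations.toy <- RAnEn::formatObservations(
    df = df.toy, col.par = 'Parameter.Name',
    col.x = 'Longitude', col.y = 'Latitude',
    col.time = 'POSIX', time.series = times,
    col.value = 'Sample.Measurement',
    col.station.name = 'StationName',
    verbose = FALSE)
  str(observations.toy)
}
```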