Splits a date vector or a data frame with a date column into three time periods: warm-up, calibration, and validation. The function allows for optional adjustment of the calibration and validation periods based on the presence of missing data at the beginning of the time series.

Usage

split_data_set(
  df,
  start_end_date_vec,
  ensure_warm_up = TRUE,
  adjust_cal_end = FALSE,
  adjust_val_start = FALSE
)

Arguments

df

A vector of dates (e.g., Date, POSIXt) or a data frame containing a DatesR column and a Qmm column (used to detect the first non-NA value).

start_end_date_vec

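A character vector of length six giving the start and end dates of the warm-up, calibration, and validation periods, in that order: warm-up start, warm-up end, calibration start, calibration end, validation start, validation end (e.g., "2000-01-01").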

ensure_warm_up

Logical. If TRUE and df is a data frame with a Qmm column, the warm-up period is adjusted to start at the first non-NA value in Qmm. Default is TRUE.

adjust_cal_end

Logical. If TRUE, the end date of the calibration period is adjusted proportionally to the shift in the warm-up period, preserving the original calibration-to-validation duration ratio. This ensures that the calibration period remains representative even if the warm-up period is shifted due to missing data.

adjust_val_start

Logical. If TRUE, the start date of the validation period is adjusted to immediately follow the (potentially shifted) calibration period. This ensures continuity between calibration and validation periods when the calibration end date has been modified.
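
The interplay of the three logical arguments can be illustrated with plain date arithmetic. The following is only a sketch of one possible reading of the descriptions above, not the package's implementation. It assumes that the warm-up period keeps its length when its start is shifted, and that "proportionally" means the time remaining after the shifted calibration start is split in the original calibration-to-validation ratio. All variable names are hypothetical.

```r
# Boundaries in the order expected by start_end_date_vec
bounds <- as.Date(c(
  "2000-01-01", "2002-12-31",  # warm-up start, end
  "2003-01-01", "2006-12-31",  # calibration start, end
  "2007-01-01", "2010-12-31"   # validation start, end
))

# Toy data: the first 12 months of Qmm are missing
dates <- seq(bounds[1], bounds[6], by = "month")
df <- data.frame(DatesR = dates,
                 Qmm = c(rep(NA, 12), seq_len(length(dates) - 12)))

# ensure_warm_up: move the warm-up start to the first non-NA Qmm value
first_obs <- df$DatesR[which(!is.na(df$Qmm))[1]]
shift <- max(0, as.numeric(first_obs - bounds[1]))  # shift in days

# Assumption: the warm-up keeps its length, so the calibration start shifts too
new_cal_start <- bounds[3] + shift

# adjust_cal_end (one reading): split the remaining time in the
# original calibration:validation duration ratio
cal_len <- as.numeric(bounds[4] - bounds[3])
val_len <- as.numeric(bounds[6] - bounds[5])
remaining <- as.numeric(bounds[6] - new_cal_start)
new_cal_end <- new_cal_start + round(remaining * cal_len / (cal_len + val_len))

# adjust_val_start: validation begins the day after the calibration ends
new_val_start <- new_cal_end + 1
```

With adjust_cal_end = FALSE and adjust_val_start = FALSE, the calibration end and validation start would presumably stay at the dates given in start_end_date_vec.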

Value

A list with three elements:

ind_warm

Indices corresponding to the warm-up period.

ind_cal

Indices corresponding to the calibration period.

ind_val

Indices corresponding to the validation period.
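
The returned indices can be used to subset the data for each period. The sketch below assumes that the indices are integer row positions into df. It builds a stand-in list with a hypothetical helper, period_index(), instead of calling split_data_set(), so that it runs without the package.

```r
dates <- seq(as.Date("2000-01-01"), as.Date("2010-12-31"), by = "month")
df <- data.frame(DatesR = dates, Qmm = seq_along(dates))

# Hypothetical helper (not part of the package):
# positions of the dates that fall inside [start, end]
period_index <- function(dates, start, end) {
  which(dates >= as.Date(start) & dates <= as.Date(end))
}

# Stand-in for the list returned by split_data_set()
periods <- list(
  ind_warm = period_index(df$DatesR, "2000-01-01", "2002-12-31"),
  ind_cal  = period_index(df$DatesR, "2003-01-01", "2006-12-31"),
  ind_val  = period_index(df$DatesR, "2007-01-01", "2010-12-31")
)

# Subset the data frame to the calibration period
cal_data <- df[periods$ind_cal, ]
range(cal_data$DatesR)
```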

Examples

if (FALSE) {
# Monthly dates; the first 12 months of Qmm are missing
dates <- seq(as.Date("2000-01-01"), as.Date("2010-12-31"), by = "month")
df <- data.frame(DatesR = dates, Qmm = c(rep(NA, 12), runif(length(dates) - 12)))
# Start and end dates for warm-up, calibration, and validation, in that order
periods <- split_data_set(
  df,
  c("2000-01-01", "2002-12-31", "2003-01-01", "2006-12-31", "2007-01-01", "2010-12-31")
)
}