--- title: "Feature Correction" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{feature_correction} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) options(rmarkdown.html_vignette.check_title = FALSE) ``` ```{r, include = FALSE} library(PVplr) library(knitr) library(dplyr) library(ggplot2) library(broom) library(purrr) library(tidyr) var_list <- plr_build_var_list(time_var = "timestamp", power_var = "power", irrad_var = "g_poa", temp_var = "mod_temp", wind_var = NA) test_dfc <- plr_cleaning(test_df, var_list, irrad_thresh = 100, low_power_thresh = 0.01, high_power_cutoff = NA) test_xbx_wbw_res <- plr_xbx_model(test_dfc, var_list = var_list, by = "week", data_cutoff = 30, predict_data = NULL) ``` Occasionally, PV data can include features such as saturation, failure, or strong seasonality that should be removed prior or during examination. This package includes feature correction methods addressing each of these. ## Saturation Removal ## Inverter saturation can throw off trends in the data as well. If you know the saturation limit of the inverter for your data, you can run this function to remove saturated data points. ```{r} test_dfc_removed_saturation <- plr_saturation_removal(test_dfc, var_list, sat_limit = 3000, power_thresh = 0.99) # fraction of data kept nrow(test_dfc_removed_saturation)/nrow(test_dfc) ``` ## System Failures and Soiling ### If a user knows or suspects that a period of system failure is present in the data, this package offers solutions for identifying and removing these data points. The method plr_failure_test filters data for particularly low correlation between power and irradiance, and then executes k-means clustering (with k=2) to identify a cluster of points which may indicate soiling. It has an option to group data by months or look over all data. The function removes data which is in the smaller cluster if the cluster indicates much lower power production per irradiance and accounts for a small (<.25) portion of the data. ```{r} # default values inserted for reference #df_failure <- plr_failure_test(test_dfc, var_list, corr_thresh = 0.95, plot = FALSE, by_month = FALSE) # fraction of data kept #nrow(df_failure)/nrow(test_dfc) ``` The method includes an option to plot slopes of day-by-day linear models of irradiance and power production against days. In order to plot, one must specify a file_path and file_name to save the plot under; it is not returned within the R environment. The generated boxplot is of those same slopes, visually identifying outliers which may represent failures or soiling. ## Seasonality Decomposition ## Following power prediction, seasonality may still be apparent in the data. This is often the case in the XbX model, the data-driven nature of which is prone to leaving in seasonality. Decomposition, the statistical method of removing seasonality from data, can be performed on such power predicted data. ```{r} test_xbx_wbw_decomp <- plr_decomposition(test_xbx_wbw_res, freq = 52, power_var = 'power_var', time_var = 'time_var', plot = FALSE, plot_file = NULL, title = NULL, data_file = NULL) # generate a pretty table knitr::kable(test_xbx_wbw_decomp[1:5, ], caption = "XbX Week-by-Week Decomposition: Resulting Data") ``` ```{r} # make plots of the decomposed data raw_plot <- ggplot2::ggplot(test_xbx_wbw_decomp, aes(age, raw)) + geom_point() + geom_smooth( method = "lm") + theme_bw() trend_plot <- ggplot2::ggplot(test_xbx_wbw_decomp, aes(age, trend)) + geom_point() + geom_smooth( method = "lm") + theme_bw() seasonal_plot <- ggplot2::ggplot(test_xbx_wbw_decomp, aes(age, seasonal)) + geom_point() + geom_smooth( method = "lm") + theme_bw() raw_plot trend_plot seasonal_plot ```