A set of tools to deal with missing values in data.frames. It can drop, replace, fill (with the next or previous observation) or delete entries according to their missing values.
drop_na_dt(.data, ...)
replace_na_dt(.data, ..., to)
delete_na_cols(.data, prop = NULL, n = NULL)
delete_na_rows(.data, prop = NULL, n = NULL)
fill_na_dt(.data, ..., direction = "down")
shift_fill(x, direction = "down")
.data: A data.frame.
...: Columns to be replaced or filled. If not specified, all columns are used.
to: The value that NAs should be replaced with.
prop: Columns (or rows) whose proportion of NAs is larger than or equal to "prop" are deleted.
n: Columns (or rows) whose number of NAs is larger than or equal to "n" are deleted.
direction: Direction in which to fill missing values. Currently either "down" (the default) or "up".
x: A vector with missing values to be filled.
data.table
drop_na_dt drops the entries with NAs in specific columns.
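For intuition, dropping rows that have NAs in chosen columns can be sketched in base R with complete.cases(). This is only an illustration on a plain data.frame, not the package's implementation; the helper name drop_na_sketch and the object df0 are hypothetical.

```r
# Hypothetical helper (not part of the package): keep only the rows that
# have no NA in the selected columns. By default all columns are checked.
drop_na_sketch <- function(data, cols = names(data)) {
  keep <- complete.cases(data[, cols, drop = FALSE])
  data[keep, , drop = FALSE]
}

df0 <- data.frame(x = c(1, 2, NA), y = c("a", NA, "b"))
drop_na_sketch(df0)        # only the first row has no NA at all
drop_na_sketch(df0, "x")   # rows 1 and 2 are kept; y may still contain NA
```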
fill_na_dt fills NAs with the preceding observation ("down") or the following observation ("up"),
which are also known as last observation carried forward (LOCF) and
next observation carried backward (NOCB), respectively.
delete_na_cols drops the columns whose NA proportion is larger
than or equal to "prop" or whose NA count is larger than or equal to "n".
delete_na_rows works the same way but operates on rows.
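The thresholds are easier to reason about once the NA proportions and counts are written out. The base-R sketch below computes them for the same values used in the examples; the object names d, col_prop, col_n and row_prop are illustrative, and the code does not call the package itself.

```r
# Same values as the data.frame used in the examples (named d here).
d <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5), z = rep(NA, 4))

col_prop <- colMeans(is.na(d))  # share of NAs per column: x 0.25, y 0.50, z 1.00
col_n    <- colSums(is.na(d))   # count of NAs per column: x 1, y 2, z 4
row_prop <- rowMeans(is.na(d))  # share of NAs per row: 2/3, 2/3, 2/3, 1/3

# Columns that would survive a "larger than or equal to 0.5" rule:
names(col_prop)[col_prop < 0.5]
# Rows that would survive a "larger than or equal to 0.6" rule:
which(row_prop < 0.6)
```

With these numbers, a threshold of 0.5 removes y and z, and a threshold of 0.24 removes every column, because even x has a quarter of its values missing.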
shift_fill fills the missing values of a vector in the same way.
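As a rough illustration of LOCF and NOCB on a plain vector, the base-R sketch below carries the last non-missing value forward and handles "up" by reversing the vector. The function name fill_sketch and the vector y0 are hypothetical, and this is not necessarily how the package implements shift_fill.

```r
# Hypothetical helper (not part of the package): LOCF / NOCB in base R.
fill_sketch <- function(x, direction = "down") {
  if (direction == "up") {
    # NOCB is LOCF applied to the reversed vector.
    return(rev(fill_sketch(rev(x), "down")))
  }
  vals <- x[!is.na(x)]        # the observed values, in order
  idx  <- cumsum(!is.na(x))   # how many observed values seen so far (0 = none yet)
  # Prepend a typed NA so that leading NAs (idx == 0) stay missing.
  c(vals[NA_integer_], vals)[idx + 1]
}

y0 <- c("a", NA, "b", NA, "c")
fill_sketch(y0)         # "a" "a" "b" "b" "c"
fill_sketch(y0, "up")   # "a" "b" "b" "c" "c"
```

A leading NA has nothing before it to carry forward, so in this sketch it remains NA when filling "down".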
https://stackoverflow.com/questions/23597140/how-to-find-the-percentage-of-nas-in-a-data-frame
https://stackoverflow.com/questions/2643939/remove-columns-from-dataframe-where-all-values-are-na
https://stackoverflow.com/questions/7235657/fastest-way-to-replace-nas-in-a-large-data-table
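library(data.table) # provides data.table()
library(tidyfst)    # assumed to be the package that provides the *_dt helpers and %>%
df <- data.table(x = c(1, 2, NA), y = c("a", NA, "b"))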
df %>% drop_na_dt()
#> x y
#> <num> <char>
#> 1: 1 a
df %>% drop_na_dt(x)
#> x y
#> <num> <char>
#> 1: 1 a
#> 2: 2 <NA>
df %>% drop_na_dt(y)
#> x y
#> <num> <char>
#> 1: 1 a
#> 2: NA b
df %>% drop_na_dt(x, y)
#> x y
#> <num> <char>
#> 1: 1 a
df %>% replace_na_dt(to = 0)
#> x y
#> <num> <char>
#> 1: 1 a
#> 2: 2 0
#> 3: 0 b
df %>% replace_na_dt(x, to = 0)
#> x y
#> <num> <char>
#> 1: 1 a
#> 2: 2 <NA>
#> 3: 0 b
df %>% replace_na_dt(y, to = 0)
#> x y
#> <num> <char>
#> 1: 1 a
#> 2: 2 0
#> 3: NA b
df %>% replace_na_dt(x, y, to = 0)
#> x y
#> <num> <char>
#> 1: 1 a
#> 2: 2 0
#> 3: 0 b
df %>% fill_na_dt(x)
#> x y
#> <num> <char>
#> 1: 1 a
#> 2: 2 <NA>
#> 3: 2 b
df %>% fill_na_dt() # not specified, fill all columns
#> x y
#> <num> <char>
#> 1: 1 a
#> 2: 2 a
#> 3: 2 b
df %>% fill_na_dt(y, direction = "up")
#> x y
#> <num> <char>
#> 1: 1 a
#> 2: 2 b
#> 3: NA b
x <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5), z = rep(NA, 4))
x
#> x y z
#> 1 1 NA NA
#> 2 2 NA NA
#> 3 NA 4 NA
#> 4 3 5 NA
x %>% delete_na_cols()
#> x y
#> <num> <num>
#> 1: 1 NA
#> 2: 2 NA
#> 3: NA 4
#> 4: 3 5
x %>% delete_na_cols(prop = 0.75)
#> x y
#> <num> <num>
#> 1: 1 NA
#> 2: 2 NA
#> 3: NA 4
#> 4: 3 5
x %>% delete_na_cols(prop = 0.5)
#> x
#> <num>
#> 1: 1
#> 2: 2
#> 3: NA
#> 4: 3
x %>% delete_na_cols(prop = 0.24)
#> Null data.table (0 rows and 0 cols)
x %>% delete_na_cols(n = 2)
#> x
#> <num>
#> 1: 1
#> 2: 2
#> 3: NA
#> 4: 3
x %>% delete_na_rows(prop = 0.6)
#> x y z
#> <num> <num> <lgcl>
#> 1: 3 5 NA
x %>% delete_na_rows(n = 2)
#> x y z
#> <num> <num> <lgcl>
#> 1: 3 5 NA
# shift_fill
y <- c("a", NA, "b", NA, "c")
shift_fill(y) # equivalent to shift_fill(y, "down")
#> [1] "a" "a" "b" "b" "c"
shift_fill(y, "down")
#> [1] "a" "a" "b" "b" "c"
shift_fill(y, "up")
#> [1] "a" "b" "b" "c" "c"