A set of tools for handling missing values in data.frames. They can drop, replace, fill (with the next or previous observation), or delete entries according to their missing values.
drop_na_dt(.data, ...)
replace_na_dt(.data, ..., to)
delete_na_cols(.data, prop = NULL, n = NULL)
delete_na_rows(.data, prop = NULL, n = NULL)
fill_na_dt(.data, ..., direction = "down")
shift_fill(x, direction = "down")
.data: A data.frame.
...: Columns to be replaced or filled. If not specified, all columns are used.
to: The value that NAs should be replaced with.
prop: If the proportion of NAs in a column (or row) is larger than or equal to "prop", that column (or row) is deleted.
n: If the number of NAs in a column (or row) is larger than or equal to "n", that column (or row) is deleted.
direction: Direction in which to fill missing values. Currently either "down" (the default) or "up".
x: A vector with missing values to be filled.
data.table
drop_na_dt
drops the rows that contain NAs in the specified columns.
fill_na_dt
fills NAs with the preceding observation ("down") or the following observation ("up"),
which are also known as last observation carried forward (LOCF) and
next observation carried backward (NOCB), respectively.
delete_na_cols
drops the columns whose NA proportion is larger
than or equal to "prop" or whose NA count is larger than or equal to "n".
delete_na_rows
works the same way but operates on rows.
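The threshold rule can be illustrated with a small base R sketch. It is not the package's implementation: the helper name na_cols_to_drop() is hypothetical, and the default branch (dropping only all-NA columns) is an assumption inferred from the examples.

```r
# Hypothetical helper: flags columns that the "prop" or "n" rule would drop.
na_cols_to_drop <- function(df, prop = NULL, n = NULL) {
  na_count <- colSums(is.na(df))      # number of NAs in each column
  if (!is.null(prop)) {
    na_count / nrow(df) >= prop       # proportion rule
  } else if (!is.null(n)) {
    na_count >= n                     # count rule
  } else {
    na_count == nrow(df)              # assumed default: all-NA columns only
  }
}

df <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5), z = rep(NA, 4))

# Column rule: y (0.5) and z (1.0) reach prop = 0.5, so only x is kept.
keep <- !na_cols_to_drop(df, prop = 0.5)
df[, keep, drop = FALSE]

# Row rule: rows 1 to 3 have an NA proportion of 2/3, so only row 4 is kept.
row_na_prop <- rowMeans(is.na(df))
df[row_na_prop < 0.6, , drop = FALSE]
```

Because the comparison is "larger than or equal to", a column whose NA proportion is exactly "prop" is dropped.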
shift_fill
fills the missing values in a vector.
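As a rough illustration of LOCF and NOCB on a plain vector, here is a base R sketch. The helper name fill_vec() is hypothetical, and this is not how shift_fill is implemented. It only shows the idea.

```r
# Hypothetical helper: carry the last non-missing value forward ("down")
# or the next non-missing value backward ("up").
fill_vec <- function(x, direction = c("down", "up")) {
  direction <- match.arg(direction)
  if (direction == "up") {
    # NOCB is LOCF applied to the reversed vector
    return(rev(fill_vec(rev(x), "down")))
  }
  idx <- cumsum(!is.na(x))   # position of the latest non-missing value so far
  idx[idx == 0] <- NA        # leading NAs have nothing to carry forward
  x[!is.na(x)][idx]
}

y <- c("a", NA, "b", NA, "c")
fill_vec(y)           # "a" "a" "b" "b" "c"
fill_vec(y, "up")     # "a" "b" "b" "c" "c"
fill_vec(c(NA, 1, NA, 2))   # the leading NA stays NA
```

In this sketch, missing values with no earlier ("down") or later ("up") observation stay NA.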
https://stackoverflow.com/questions/23597140/how-to-find-the-percentage-of-nas-in-a-data-frame
https://stackoverflow.com/questions/2643939/remove-columns-from-dataframe-where-all-values-are-na
https://stackoverflow.com/questions/7235657/fastest-way-to-replace-nas-in-a-large-data-table
df <- data.table(x = c(1, 2, NA), y = c("a", NA, "b"))
df %>% drop_na_dt()
#> x y
#> <num> <char>
#> 1: 1 a
df %>% drop_na_dt(x)
#> x y
#> <num> <char>
#> 1: 1 a
#> 2: 2 <NA>
df %>% drop_na_dt(y)
#> x y
#> <num> <char>
#> 1: 1 a
#> 2: NA b
df %>% drop_na_dt(x,y)
#> x y
#> <num> <char>
#> 1: 1 a
df %>% replace_na_dt(to = 0)
#> x y
#> <num> <char>
#> 1: 1 a
#> 2: 2 0
#> 3: 0 b
df %>% replace_na_dt(x,to = 0)
#> x y
#> <num> <char>
#> 1: 1 a
#> 2: 2 <NA>
#> 3: 0 b
df %>% replace_na_dt(y,to = 0)
#> x y
#> <num> <char>
#> 1: 1 a
#> 2: 2 0
#> 3: NA b
df %>% replace_na_dt(x,y,to = 0)
#> x y
#> <num> <char>
#> 1: 1 a
#> 2: 2 0
#> 3: 0 b
df %>% fill_na_dt(x)
#> x y
#> <num> <char>
#> 1: 1 a
#> 2: 2 <NA>
#> 3: 2 b
df %>% fill_na_dt() # not specified, fill all columns
#> x y
#> <num> <char>
#> 1: 1 a
#> 2: 2 a
#> 3: 2 b
df %>% fill_na_dt(y,direction = "up")
#> x y
#> <num> <char>
#> 1: 1 a
#> 2: 2 b
#> 3: NA b
x <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5), z = rep(NA, 4))
x
#> x y z
#> 1 1 NA NA
#> 2 2 NA NA
#> 3 NA 4 NA
#> 4 3 5 NA
x %>% delete_na_cols()
#> x y
#> <num> <num>
#> 1: 1 NA
#> 2: 2 NA
#> 3: NA 4
#> 4: 3 5
x %>% delete_na_cols(prop = 0.75)
#> x y
#> <num> <num>
#> 1: 1 NA
#> 2: 2 NA
#> 3: NA 4
#> 4: 3 5
x %>% delete_na_cols(prop = 0.5)
#> x
#> <num>
#> 1: 1
#> 2: 2
#> 3: NA
#> 4: 3
x %>% delete_na_cols(prop = 0.24)
#> Null data.table (0 rows and 0 cols)
x %>% delete_na_cols(n = 2)
#> x
#> <num>
#> 1: 1
#> 2: 2
#> 3: NA
#> 4: 3
x %>% delete_na_rows(prop = 0.6)
#> x y z
#> <num> <num> <lgcl>
#> 1: 3 5 NA
x %>% delete_na_rows(n = 2)
#> x y z
#> <num> <num> <lgcl>
#> 1: 3 5 NA
# shift_fill
y <- c("a", NA, "b", NA, "c")
shift_fill(y) # same as shift_fill(y, "down")
#> [1] "a" "a" "b" "b" "c"
shift_fill(y,"down")
#> [1] "a" "a" "b" "b" "c"
shift_fill(y,"up")
#> [1] "a" "b" "b" "c" "c"