A set of tools for dealing with missing values in data.frames. They can drop, replace, fill (with the next or previous observation) or delete entries according to their missing values.

drop_na_dt(.data, ...)

replace_na_dt(.data, ..., to)

delete_na_cols(.data, prop = NULL, n = NULL)

delete_na_rows(.data, prop = NULL, n = NULL)

fill_na_dt(.data, ..., direction = "down")

shift_fill(x, direction = "down")

Arguments

.data

data.frame

...

Columns to be checked, replaced or filled. If not specified, all columns are used.

to

The value that NAs should be replaced with.

prop

Columns (or rows) whose proportion of NAs is larger than or equal to "prop" are deleted.

n

Columns (or rows) whose number of NAs is larger than or equal to "n" are deleted.

direction

Direction in which to fill missing values. Currently either "down" (the default) or "up".

x

A vector with missing values to be filled.

Value

data.table

Details

drop_na_dt drops the entries (rows) with NAs in the specified columns. fill_na_dt fills NAs with the previous observation ("down") or the next observation ("up"); these are also known as last observation carried forward (LOCF) and next observation carried backward (NOCB), respectively.
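
The row-dropping rule can be illustrated in base R. This is only a sketch of the idea, not the package's implementation; drop_na_base is a hypothetical helper introduced for illustration.

```r
# Base R sketch (not the package's implementation): keep only rows that
# have no NA in the selected columns.
df <- data.frame(x = c(1, 2, NA), y = c("a", NA, "b"))

# Hypothetical helper, for illustration only.
drop_na_base <- function(data, cols = names(data)) {
  keep <- complete.cases(data[, cols, drop = FALSE])  # TRUE where no NA in cols
  data[keep, , drop = FALSE]
}

only_x <- drop_na_base(df, "x")  # checks column x only: rows 1 and 2 remain
all_cols <- drop_na_base(df)     # checks every column: only row 1 remains
```

Checking a subset of columns keeps rows that are incomplete elsewhere, which is why the result for "x" still contains an NA in y.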

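delete_na_cols drops the columns whose NA proportion is larger than or equal to "prop", or whose NA count is larger than or equal to "n". delete_na_rows works the same way but on rows. In the examples below, calling delete_na_cols with neither argument removes only the column that is entirely NA.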
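
The thresholds can be checked by hand in base R. The snippet below is a sketch under the assumption that "proportion" means the share of NA cells in each column; it does not reproduce the package's code, and the variable names are illustrative.

```r
# Base R sketch: per-column NA counts and proportions, then a threshold filter.
x <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5), z = rep(NA, 4))

na_count <- colSums(is.na(x))   # number of NAs in each column
na_prop  <- colMeans(is.na(x))  # proportion of NAs in each column

prop <- 0.5
# Keep columns whose NA proportion is strictly below the threshold,
# i.e. delete those with proportion >= prop.
kept <- x[, na_prop < prop, drop = FALSE]
```

For rows, rowSums(is.na(x)) and rowMeans(is.na(x)) give the corresponding counts and proportions.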

shift_fill fills the missing values in a vector, using the same "down" or "up" direction.
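
For readers unfamiliar with LOCF and NOCB, a minimal base R sketch of both follows. The helpers locf and nocb are hypothetical names used only here; shift_fill itself may be implemented differently.

```r
# Base R sketch of LOCF / NOCB (illustrative, not the package's code).
locf <- function(v) {
  # Position of each non-NA value, 0 where the value is missing.
  idx <- ifelse(is.na(v), 0L, seq_along(v))
  # Running maximum gives the position of the last non-NA value so far.
  idx <- cummax(idx)
  # Leading NAs have nothing to carry forward, so they stay NA.
  idx[idx == 0] <- NA
  v[idx]
}

# NOCB is LOCF applied to the reversed vector.
nocb <- function(v) rev(locf(rev(v)))

y <- c("a", NA, "b", NA, "c")
filled_down <- locf(y)  # "a" "a" "b" "b" "c"
filled_up <- nocb(y)    # "a" "b" "b" "c" "c"
```

Values at the edge with no neighbour in the fill direction (a leading NA for "down", a trailing NA for "up") remain NA in this sketch.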

References

https://stackoverflow.com/questions/23597140/how-to-find-the-percentage-of-nas-in-a-data-frame

https://stackoverflow.com/questions/2643939/remove-columns-from-dataframe-where-all-values-are-na

https://stackoverflow.com/questions/7235657/fastest-way-to-replace-nas-in-a-large-data-table

Examples

df <- data.table(x = c(1, 2, NA), y = c("a", NA, "b"))
df %>% drop_na_dt()
#>        x      y
#>    <num> <char>
#> 1:     1      a
df %>% drop_na_dt(x)
#>        x      y
#>    <num> <char>
#> 1:     1      a
#> 2:     2   <NA>
df %>% drop_na_dt(y)
#>        x      y
#>    <num> <char>
#> 1:     1      a
#> 2:    NA      b
df %>% drop_na_dt(x, y)
#>        x      y
#>    <num> <char>
#> 1:     1      a

df %>% replace_na_dt(to = 0)
#>        x      y
#>    <num> <char>
#> 1:     1      a
#> 2:     2      0
#> 3:     0      b
df %>% replace_na_dt(x, to = 0)
#>        x      y
#>    <num> <char>
#> 1:     1      a
#> 2:     2   <NA>
#> 3:     0      b
df %>% replace_na_dt(y, to = 0)
#>        x      y
#>    <num> <char>
#> 1:     1      a
#> 2:     2      0
#> 3:    NA      b
df %>% replace_na_dt(x, y, to = 0)
#>        x      y
#>    <num> <char>
#> 1:     1      a
#> 2:     2      0
#> 3:     0      b

df %>% fill_na_dt(x)
#>        x      y
#>    <num> <char>
#> 1:     1      a
#> 2:     2   <NA>
#> 3:     2      b
df %>% fill_na_dt() # no columns specified, so all columns are filled
#>        x      y
#>    <num> <char>
#> 1:     1      a
#> 2:     2      a
#> 3:     2      b
df %>% fill_na_dt(y, direction = "up")
#>        x      y
#>    <num> <char>
#> 1:     1      a
#> 2:     2      b
#> 3:    NA      b

x <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5), z = rep(NA, 4))
x
#>    x  y  z
#> 1  1 NA NA
#> 2  2 NA NA
#> 3 NA  4 NA
#> 4  3  5 NA
x %>% delete_na_cols()
#>        x     y
#>    <num> <num>
#> 1:     1    NA
#> 2:     2    NA
#> 3:    NA     4
#> 4:     3     5
x %>% delete_na_cols(prop = 0.75)
#>        x     y
#>    <num> <num>
#> 1:     1    NA
#> 2:     2    NA
#> 3:    NA     4
#> 4:     3     5
x %>% delete_na_cols(prop = 0.5)
#>        x
#>    <num>
#> 1:     1
#> 2:     2
#> 3:    NA
#> 4:     3
x %>% delete_na_cols(prop = 0.24)
#> Null data.table (0 rows and 0 cols)
x %>% delete_na_cols(n = 2)
#>        x
#>    <num>
#> 1:     1
#> 2:     2
#> 3:    NA
#> 4:     3

x %>% delete_na_rows(prop = 0.6)
#>        x     y      z
#>    <num> <num> <lgcl>
#> 1:     3     5     NA
x %>% delete_na_rows(n = 2)
#>        x     y      z
#>    <num> <num> <lgcl>
#> 1:     3     5     NA

# shift_fill
y <- c("a", NA, "b", NA, "c")

shift_fill(y) # equivalent to shift_fill(y, "down")
#> [1] "a" "a" "b" "b" "c"
shift_fill(y, "down")
#> [1] "a" "a" "b" "b" "c"

shift_fill(y, "up")
#> [1] "a" "b" "b" "c" "c"