The mutating joins add columns from `y` to `x`, matching rows based on the keys:
* `inner_join_dt()`: includes all rows in `x` and `y`. * `left_join_dt()`: includes all rows in `x`. * `right_join_dt()`: includes all rows in `y`. * `full_join_dt()`: includes all rows in `x` or `y`.
Filtering joins filter rows from `x` based on the presence or absence of matches in `y`:
* `semi_join_dt()` return all rows from `x` with a match in `y`. * `anti_join_dt()` return all rows from `x` without a match in `y`.
inner_join_dt(x, y, by = NULL, on = NULL, suffix = c(".x", ".y"))
left_join_dt(x, y, by = NULL, on = NULL, suffix = c(".x", ".y"))
right_join_dt(x, y, by = NULL, on = NULL, suffix = c(".x", ".y"))
full_join_dt(x, y, by = NULL, on = NULL, suffix = c(".x", ".y"))
anti_join_dt(x, y, by = NULL, on = NULL)
semi_join_dt(x, y, by = NULL, on = NULL)
A data.table
A data.table
(Optional) A character vector of variables to join by.
If `NULL`, the default, `*_join_dt()` will perform a natural join, using all variables in common across `x` and `y`. A message lists the variables so that you can check they're correct; suppress the message by supplying `by` explicitly.
To join by different variables on `x` and `y`, use a named vector. For example, `by = c("a" = "b")` will match `x$a` to `y$b`.
To join by multiple variables, use a vector with length > 1. For example, `by = c("a", "b")` will match `x$a` to `y$a` and `x$b` to `y$b`. Use a named vector to match different variables in `x` and `y`. For example, `by = c("a" = "b", "c" = "d")` will match `x$a` to `y$b` and `x$c` to `y$d`.
(Optional)
Indicate which columns in x should be joined with which columns in y.
Examples included:
1..by = c("a","b")
(this is a must for set_full_join_dt
);
2..by = c(x1="y1", x2="y2")
;
3..by = c("x1==y1", "x2==y2")
;
4..by = c("a", V2="b")
;
5..by = .(a, b)
;
6..by = c("x>=a", "y<=b")
or .by = .(x>=a, y<=b)
.
If there are non-joined duplicate variables in x and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.
A data.table
workers = fread("
name company
Nick Acme
John Ajax
Daniela Ajax
")
positions = fread("
name position
John designer
Daniela engineer
Cathie manager
")
workers %>% inner_join_dt(positions)
#> Joining by: name
#> Key: <name>
#> name company position
#> <char> <char> <char>
#> 1: Daniela Ajax engineer
#> 2: John Ajax designer
workers %>% left_join_dt(positions)
#> Joining by: name
#> Key: <name>
#> name company position
#> <char> <char> <char>
#> 1: Daniela Ajax engineer
#> 2: John Ajax designer
#> 3: Nick Acme <NA>
workers %>% right_join_dt(positions)
#> Joining by: name
#> Key: <name>
#> name company position
#> <char> <char> <char>
#> 1: Cathie <NA> manager
#> 2: Daniela Ajax engineer
#> 3: John Ajax designer
workers %>% full_join_dt(positions)
#> Joining by: name
#> Key: <name>
#> name company position
#> <char> <char> <char>
#> 1: Cathie <NA> manager
#> 2: Daniela Ajax engineer
#> 3: John Ajax designer
#> 4: Nick Acme <NA>
# filtering joins
workers %>% anti_join_dt(positions)
#> Joining by: name
#> name company
#> <char> <char>
#> 1: Nick Acme
workers %>% semi_join_dt(positions)
#> Joining by: name
#> name company
#> <char> <char>
#> 1: John Ajax
#> 2: Daniela Ajax
# To suppress the message, supply 'by' argument
workers %>% left_join_dt(positions, by = "name")
#> Key: <name>
#> name company position
#> <char> <char> <char>
#> 1: Daniela Ajax engineer
#> 2: John Ajax designer
#> 3: Nick Acme <NA>
# Use a named 'by' if the join variables have different names
positions2 = setNames(positions, c("worker", "position")) # rename first column in 'positions'
workers %>% inner_join_dt(positions2, by = c("name" = "worker"))
#> Key: <name>
#> name company position
#> <char> <char> <char>
#> 1: Daniela Ajax engineer
#> 2: John Ajax designer
# the syntax of 'on' could be a bit different
workers %>% inner_join_dt(positions2,on = "name==worker")
#> name company position
#> <char> <char> <char>
#> 1: John Ajax designer
#> 2: Daniela Ajax engineer