The mutating joins add columns from `y` to `x`, matching rows based on the keys:
* `inner_join()`: includes all rows in `x` and `y`. * `left_join()`: includes all rows in `x`. * `right_join()`: includes all rows in `y`. * `full_join()`: includes all rows in `x` or `y`.
Filtering joins filter rows from `x` based on the presence or absence of matches in `y`:
* `semi_join()` return all rows from `x` with a match in `y`. * `anti_join()` return all rows from `x` without a match in `y`.
inner_join(x, y, by = NULL, on = NULL) left_join(x, y, by = NULL, on = NULL) right_join(x, y, by = NULL, on = NULL) full_join(x, y, by = NULL, on = NULL) anti_join(x, y, by = NULL, on = NULL) semi_join(x, y, by = NULL, on = NULL)
x | A data.table |
---|---|
y | A data.table |
by | (Optional) A character vector of variables to join by. If `NULL`, the default, `*_join()` will perform a natural join, using all variables in common across `x` and `y`. A message lists the variables so that you can check they're correct; suppress the message by supplying `by` explicitly. To join by different variables on `x` and `y`, use a named vector. For example, `by = c("a" = "b")` will match `x$a` to `y$b`. To join by multiple variables, use a vector with length > 1. For example, `by = c("a", "b")` will match `x$a` to `y$a` and `x$b` to `y$b`. Use a named vector to match different variables in `x` and `y`. For example, `by = c("a" = "b", "c" = "d")` will match `x$a` to `y$b` and `x$c` to `y$d`. |
on | (Optional)
Indicate which columns in x should be joined with which columns in y.
Examples included:
1. |
A data.table
workers = fread(" name company Nick Acme John Ajax Daniela Ajax ") positions = fread(" name position John designer Daniela engineer Cathie manager ") workers %>% inner_join(positions)#>#> Key: <name> #> name company position #> <char> <char> <char> #> 1: Daniela Ajax engineer #> 2: John Ajax designerworkers %>% left_join(positions)#>#> Key: <name> #> name company position #> <char> <char> <char> #> 1: Daniela Ajax engineer #> 2: John Ajax designer #> 3: Nick Acme <NA>workers %>% right_join(positions)#>#> Key: <name> #> name company position #> <char> <char> <char> #> 1: Cathie <NA> manager #> 2: Daniela Ajax engineer #> 3: John Ajax designerworkers %>% full_join(positions)#>#> Key: <name> #> name company position #> <char> <char> <char> #> 1: Cathie <NA> manager #> 2: Daniela Ajax engineer #> 3: John Ajax designer #> 4: Nick Acme <NA># filtering joins workers %>% anti_join(positions)#>#> name company #> <char> <char> #> 1: Nick Acmeworkers %>% semi_join(positions)#>#> name company #> <char> <char> #> 1: John Ajax #> 2: Daniela Ajax# To suppress the message, supply 'by' argument workers %>% left_join(positions, by = "name")#> Key: <name> #> name company position #> <char> <char> <char> #> 1: Daniela Ajax engineer #> 2: John Ajax designer #> 3: Nick Acme <NA># Use a named 'by' if the join variables have different names positions2 = setNames(positions, c("worker", "position")) # rename first column in 'positions' workers %>% inner_join(positions2, by = c("name" = "worker"))#> Key: <name> #> name company position #> <char> <char> <char> #> 1: Daniela Ajax engineer #> 2: John Ajax designer# the syntax of 'on' could be a bit different workers %>% inner_join(positions2,on = "name==worker")#> name company position #> <char> <char> <char> #> 1: John Ajax designer #> 2: Daniela Ajax engineer