Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.

0
votes
2answers
11 views

Fill down every other row with level above in tidyverse

I'm learning R and am hitting a snag with tidyr and dplyr. I've got a data frame in R where the first column is a factor that only has a level every other row. I'm trying to figure out how to use ...
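One possible approach to the fill-down problem described above is sketched here. The data frame and the column names `site` and `value` are made up for illustration, and the sketch assumes the empty rows are already `NA` (blank strings would need to be converted to `NA` first).

```r
library(tidyr)

# Hypothetical data: the label is present only on every other row
df <- data.frame(
  site  = c("A", NA, "B", NA),
  value = c(1, 2, 3, 4),
  stringsAsFactors = TRUE
)

# fill() carries the last non-missing value forward; "down" is the default direction
filled <- fill(df, site)
filled
```

If the real column uses something other than `NA` for the empty rows, `fill()` will leave those rows untouched.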
0
votes
4answers
29 views

Sorting rows based on variable group in a data frame

I've got a dataframe that looks like this - df=data.frame(Code=c('Q1','Q1','Q1','Q1','Q2','Q2','Q2','Q2','Q3','Q3','Q3','Q3'), Fiscal_Year=c('FY18','FY16','FY17','FY15','FY15','FY18','FY17',...
0
votes
2answers
21 views

Lag variable by group/time indicator in dplyr

I have data that looks like this: set.seed(13) dt <- data.frame(group = c(rep("a", 3), rep("b", 4), rep("c", 3)), var = c(rep(0.1,3), rep(0.3, 4), rep(1.1,3))) dt group var 1 a 0.1 2 ...
1
vote
2answers
18 views

Passing column names to user defined function inside mutate_at

My first question in this forum: so kindly excuse me if I have missed some protocols: I am struggling for some time now to pass column names inside my custom function while using dplyr - mutate_at. I ...
-3
votes
1answer
28 views

Filtering values

I have one data.frame with three columns: Name_of_brand, Price and Quantity. I would like to continue only with observations that are equal to or greater than 100 in column Price_Category. ...
1
vote
1answer
28 views

R Count the number of cities visited

I have a data table that contains two variables, the first column contains a number of cities a person visited and the other one is the rating of the trip, example as code below: trips <- data....
0
votes
1answer
26 views

Row sum using mutate and select [duplicate]

library(dplyr) I have the following data: d1 <- data_frame( name = c("jim", "john", "jim", "john"), `2012` = c(57, 58, 47, 57), `2013` = c(14, 3, 3, 90)) I would like to create two new rows ...
1
vote
0answers
19 views

return output as columns instead of list after applying a function using dplyr [duplicate]

library(dplyr) Sample data df <- data.frame(year = rep(1981:1982, each = 365), doy = rep(1:365, times = 2), tmean = sample(20:35, 730, replace = T), ref.doy = rep(c(60, 80), ...
0
votes
0answers
9 views

“Evaluation error: object not found” in dplyr when calling an object not created yet

I have the following pipe that fails because the penultimate line (the mutate that creates a Code column) is calling an object that hasn't been generated yet. I am using dplyr and working with ...
2
votes
1answer
39 views

How to obtain species richness and abundance for sites with multiple samples using dplyr [duplicate]

Edit: This was marked as a duplicate of this question: Manipulating seperated species quantity data into a species abundance matrix It is a completely different question, however - that question is ...
0
votes
1answer
28 views

Filter value in Multiple Columns in R

I am trying to filter a data frame based on a vector value (that comes from a loop) in multiple columns at the same time. As this takes place in a loop, here is the pertinent steps: name.id = ...
1
vote
4answers
41 views

select two random and consecutive rows from grouped data

In the data below (included with dput), I have repeat observations (lat and long) for three individuals (IndIDII). Note, there are a different number of locations for each individual and that they are ...
2
votes
2answers
20 views

Using ifelse within mutate and handling NA's

thanks for your time. I have a question about using ifelse within the mutate function. ifelse is from base R, while mutate is from the dplyr package. My question is about how ifelse handles NA ...
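As a small illustration of the behaviour this question is about: base R's `ifelse()` returns `NA` wherever its test is `NA`, so a missing value needs its own branch if it should map to something else. The column `x` and the labels below are hypothetical, not taken from the question.

```r
library(dplyr)

# Hypothetical data with one missing value
df <- data.frame(x = c(1, NA, 5))

out <- df %>%
  mutate(
    # NA > 2 is NA, so ifelse() yields NA for that row
    flag_na = ifelse(x > 2, "high", "low"),
    # Testing is.na() first gives missing values an explicit label
    flag_fixed = ifelse(is.na(x), "missing", ifelse(x > 2, "high", "low"))
  )
out
```

Whether an `NA` should stay `NA` or get its own label depends on the analysis; the second column only shows one way to make that choice explicit.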
2
votes
2answers
21 views

Problem using dplyr on tibbles with vector elements

I am running into some problems doing text processing using dplyr and stringr functions (specifically str_split()). I think I am misunderstanding something very fundamental about how to use dplyr ...
1
vote
1answer
22 views

Identify interrupted observations

I would like to identify missing observations that suggest cleaning/data errors. My dataframe consists of many accounts over many years. Here are the rules it follows: Accounts may be created or ...
0
votes
0answers
31 views

How to calculate an 8 hour mean? (non-running average)

I would like to calculate an average of 8 hour averages. My approach: cut every 8 hours and then take the mean of each 8 hour segment. The second step is to get the mean of these 8 hour averages. ...
-1
votes
1answer
24 views

Fixing the order of facets in ggplot when using dplyr to transform data to long form

I got help here yesterday to create a facet grid of multiple columns. This yielded a large grid containing 8*5 plots. The code creates a combination of plot for various Outcomes * Responses. For ...
-2
votes
2answers
43 views
1
vote
1answer
37 views

Unable to subset within mutate() following a summarize() with a tibble

I don't know if this is behavior unique to handling tibbles, and that I need to subset it a different way. library(dplyr) library(gapminder) df <- gapminder %>% group_by(year, continent) %>...
1
vote
1answer
21 views

Dplyr keeps automatically adding one of my columns [duplicate]

So, I have a large data.frame with multiple columns of which "trial.number" and "indexer" are 2. It annoys me that dplyr constantly, no matter what, adds indexer column. A simple example: saccade....
0
votes
1answer
24 views

separate row containing two separate dates into before and after midnight

I have a data frame containing sleep data, with several sleep increments, with a column for the start and a column for the end of the sleep. For some rows, the starting time is on the previous day and ...
0
votes
1answer
24 views

Change column class if column consists of numbers

I have a data frame where I had to convert all variables to the character class in order to bind_rows(). Now I want to identify and convert the columns that have numbers in them back to class numeric. ...
1
vote
3answers
57 views

Tidyverse: filtering n largest groups in grouped dataframe

I want to filter the n largest groups based on count, and then do some calculations on the filtered dataframe Here is some data Brand <- c("A","B","C","A","A","B","A","A","B","C") Category <- ...
0
votes
2answers
37 views

How to fill a column from a data frame based on another data frame using dplyr

I have two data frames and I am trying to replace NAs in a column of the second data frame using the values in a column of the first data frame. I would like to do this using the dplyr package and I ...
1
vote
1answer
30 views

extract contents from confusionMatrix saved in a list column in dplyr

As shown in code below, after cross validation, I'm trying to extract model metrics for each fold. I saved all predictions in resampling, group the data by folds, compute the confusion matrix for each ...
2
votes
2answers
30 views

Delete specific columns with NA values

This is my dataframe: set.seed(1) df <- data.frame(A = 1:50, B = 11:60, c = 21:70) head(df) df.final <- as.data.frame(lapply(df, function(cc) cc[ sample(c(TRUE, NA), prob = c(0.85, 0.15), size =...
1
vote
2answers
44 views

Create summary value when using group_by and summarize

I often want to show change given a baseline year. For example, how much has something changed since a given year as a percentage? The gapminder dataset provides an excellent example: To start to get ...
0
votes
2answers
55 views

Is there any way to not to loop this

I have the following code that uses monthly data: set.seed(2) vector <- as.data.frame(runif(120)+0.5) b <- data.frame() for (j in 1:I(nrow(vector))) { if (is.na(vector[j, i]) || is....
2
votes
2answers
29 views

Winners within pairs; or vector-valued group_by mutate?

I'm trying to assess which unit in a pair is the "winner". group_by() %>% mutate() is close to the right thing, but it's not quite there. in particular dat %>% group_by(pair) %>% mutate(...
-1
votes
0answers
20 views

How to filter duplicate rows but the row may include different values in R? [on hold]

df <- data.frame(id = c(1,1,2,3), episode = c(1,1,4,1), level = c("A", "A", "A", "B"), Score = c(60,90,60,10)) I want to filter the first two rows, their id, episode, and level same but their ...
2
votes
3answers
45 views

Retrieve all possible combinations of n items, of a given size k and apply function sum on another column

I have a df looking like: item value 1 a 1 2 b 4 3 c 3 4 d 2 5 e 6 6 f 8 7 g 11 df <- data.frame(stringsAsFactors=FALSE, item = c("a", "b"...
0
votes
1answer
29 views

calculate difference between 2 dates and print the between dates

st_day<-c(1,5,10) endday<-c(4,9,15) d<-c(1,2,3) data<-cbind(st_day,endday,d) days1<-c(1:15) dose1<-rep(c(1,2,3),each=5) result <- cbind(days1,dose1) Hello, I have 2 columns with ...
0
votes
0answers
10 views

Status bar in split-map approach of model training

Is it possible to get status bar of training process when train models like this: mtcars %>% split(.$cyl) %>% map(~ lm(mpg ~ wt, data = .x)) dplyr group_by() + do() approach outputs ...
1
vote
0answers
22 views

calling information from different data frames whose results match a character vector

I am trying to apply a function to convert financial accounts from a number of companies to USD. The firms relating to each currency can be found below. USDfirms <- c("GOOG", "AMZN", "AAPL", "...
0
votes
0answers
12 views

select rownames with topn values for each column in a dataframe

I have a dataframe consisting of geneloadings obtained from a PCA, I want to select the first topn (200) (ordered by absolute values) genes for the first 7 PCs, and then get the union of these vectors....
0
votes
1answer
47 views

Splitting overlapping rows, within groups, based on dates

I'm trying to create new rows based on overlapping time periods of existing rows. For example, I'd like to turn this: Customer_Product <- data.table(Customer=c("A01","A01","A01", "A02", "A02", "...
-1
votes
1answer
33 views

Using regex to set a specific digit to NA? [duplicate]

Sample of df: LASSO_deviance LASSO_AUC 68 0.999 0.999 2 1.000 1.000 39 1.000 1.005 7 1.02 1.2 I want to set cells which contain 1.000 to ...
0
votes
4answers
60 views

How to bind additional rows to dataframe for column totals? [duplicate]

I'm trying to add additional rows to my data table with the column totals so that when I display on ggplot, I am able to filter by "Total" for my selectInput in my Shiny app. However, because I have ...
2
votes
1answer
54 views

Count where multiple columns match a single column in a presence absence matrix

I have a presence absence matrix of plant species and it looks something like this... set.seed(123) Data <- data.frame( endemic = sample(0:1, 10, replace = TRUE), val1p1 = sample(0:1, 10, ...
0
votes
0answers
26 views

Select objects of interest and pass them to map_dfr to bind rows in R? [duplicate]

I have 13000 dataframes, each look like: col1 col2 col3 col4 col5 1 T 4.4 3 1 3 F 23.3 4 1 353 T 1.3 2 1 34 T 0.4 1 1 Each df name has the following format/...
0
votes
0answers
28 views

Filter and select in dplyr: remove extra step [duplicate]

When I filter a dataframe using dplyr, there is an extra step that I need to perform (see code below). can I do something to not have this last row? df <- df_1 %>% filter(col_1 %in% mylist) %&...
7
votes
4answers
192 views

String as formula

I've tried searching through the forums and have been unable to find help. I'm quite new to R, and am having limited success in loading in some strings to be used as a formula. I have a csv with the following ...
-1
votes
3answers
39 views

Match Group Data with user data and get groups

I have two datasets, one of which lists various products like this User Product A . 1 A . 2 A . 3 B . 1 B . 3 B . 4 And another table Group Product X1 . 1 X1 . 2 X1 . 4 X2 . 1 X2 . 3 ...
0
votes
1answer
29 views

Convert repeated labels and values in rows to columns in R using tidyr [duplicate]

Here is some example data for reproducibility: pop year type value pop3 1980 prev 1.42 pop4 1988 prev 1.53 pop6 1981 prev 1.42 pop8 1980 prev 1.7 pop3 ...
0
votes
2answers
41 views

Filter to all rows where there are duplicate values in two columns (dplyr) [duplicate]

I have a data frame that looks like this: id dob lname 1 1900-01-01 a 2 1900-01-01 b 3 1900-01-01 b 4 1901-01-01 c 5 1901-01-01 d 6 1902-01-01 e 7 1902-01-01 e 8 ...
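One common way to keep only the rows whose combination of two columns occurs more than once is to group by both columns and filter on the group size. The sketch below rebuilds only the seven rows visible in the excerpt (the rest is cut off), and assumes `dob` can be treated as plain text for the comparison.

```r
library(dplyr)

# Only the rows visible in the excerpt; the original data continues beyond these
df <- data.frame(
  id    = 1:7,
  dob   = c("1900-01-01", "1900-01-01", "1900-01-01",
            "1901-01-01", "1901-01-01", "1902-01-01", "1902-01-01"),
  lname = c("a", "b", "b", "c", "d", "e", "e"),
  stringsAsFactors = FALSE
)

# Keep rows whose (dob, lname) pair appears more than once
dupes <- df %>%
  group_by(dob, lname) %>%
  filter(n() > 1) %>%
  ungroup()
dupes
```

On these seven rows this keeps ids 2, 3, 6 and 7, since those are the only rows that share both `dob` and `lname` with another row.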
-2
votes
0answers
14 views

could not find function “tbl_dt”; from R package dplyr [duplicate]

My problem is I can't get R to recognise the function "tbl_dt" which is supposed to come with the package "dplyr". Does anyone know what I could be doing wrong in what I outline below? This was asked ...
0
votes
3answers
50 views

Column referencing after dplyr doesn't work [duplicate]

Here is a reproducible example: I will start by assigning the mtcars dataset to a variable called temp. temp = mtcars If we try to reference a column in this df, it works as expected. Results in ...
0
votes
1answer
52 views

Error in bind_rows_(x, .id) : Argument 1 must have names

Here is a code snippet: y <- purrr::map(1:2, ~ c(a=.x)) test1 <- dplyr::bind_rows(y) test2 <- do.call(dplyr::bind_rows, y) The first call to bind_rows (test1) generates the error Error in ...
0
votes
3answers
36 views

How do I create a new categorical variable from continuous multiple observations?

This is my data: ID dist 1 23 1 10 2 12 2 20 3 14 3 33 I want to go through each ID, and create a new column ("state") for the larger value for each ID call it "high" and for the lower ...
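A minimal sketch of one way to label the larger and smaller value within each ID, using the six rows shown above. It assumes each ID has exactly two distinct values; ties and IDs with more than two rows would need a rule the excerpt does not state.

```r
library(dplyr)

# The six rows shown in the question
df <- data.frame(
  ID   = c(1, 1, 2, 2, 3, 3),
  dist = c(23, 10, 12, 20, 14, 33)
)

# Within each ID, the row holding the maximum is "high", the other is "low"
out <- df %>%
  group_by(ID) %>%
  mutate(state = if_else(dist == max(dist), "high", "low")) %>%
  ungroup()
out
```

With tied values in a group, every tied row would be labelled "high" under this rule.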
1
vote
1answer
35 views

Create a new column which is the max of datetime with conditions on other columns

I have a dataframe like this. ID <- c("111","111","111","111", "113","113","113","113") ToolID <- c("CCP_A","CCP_B","CCP_B","CCQ_A", "CCP_A","CCP_B","CCP_B","CCQ_A") Step &...