R

Logistic Regression: Part II - Varietal adoption dataset

Binary classifier using a categorical predictor: Let's say we have two variables: a farmer's survey response on willingness to adopt an improved rice variety (YES/NO) and whether the farmer was previously trained in agricultural input management (trained/untrained). Read in the data and examine the summary.
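A minimal sketch of such a model, assuming simulated data because the survey data themselves are not shown in this excerpt. The names `adoption`, `adopt`, and `training`, the sample size, and the adoption probabilities are all hypothetical.

```r
# Simulated stand-in for the survey data; names and probabilities are assumptions
set.seed(42)
n <- 200
training <- factor(sample(c("untrained", "trained"), n, replace = TRUE),
                   levels = c("untrained", "trained"))

# Assume trained farmers are more willing to adopt (0.7 versus 0.4)
p_adopt <- ifelse(training == "trained", 0.7, 0.4)
adopt <- factor(ifelse(runif(n) < p_adopt, "YES", "NO"), levels = c("NO", "YES"))
adoption <- data.frame(adopt, training)

# Inspect the two categorical variables
summary(adoption)
table(adoption$training, adoption$adopt)

# Logistic regression: log-odds of answering YES as a function of training status
# (with a factor response, glm treats the first level, NO, as failure)
fit <- glm(adopt ~ training, data = adoption, family = binomial)
summary(fit)

# Odds ratio of adoption for trained relative to untrained farmers
odds_ratio <- exp(coef(fit)[["trainingtrained"]])
odds_ratio
```

With a single two-level predictor the fitted odds ratio should match the cross-product ratio of the 2x2 table, which makes a handy sanity check on the model.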

Paste together multiple columns

# paste together dataframe columns by column index
# take the following df
df <- data.frame(my_number = letters[1:5],
                 column_odd1 = rnorm(5),
                 column_even1 = rnorm(5),
                 column_odd2 = rnorm(5),
                 column_even2 = rnorm(5),
                 column_odd3 = rnorm(5),
                 column_even3 = rnorm(5))
df %>% select(1) %>% bind_cols(data.

The birthday problem: Non-analytical solution

# Birthday problem
crossing(n = 2:100, x = 2:4) %>%
  mutate(probability = map2_dbl(n, x, ~pbirthday(.x, coincident = .y))) %>%
  ggplot(aes(n, probability, color = factor(x))) +
  geom_line() +
  labs(x = "People in room", y = "Probability X people share a birthday", color = "X")

# Approximating birthday paradox with Poisson distribution
crossing(n = 2:250, x = 2:4) %>%
  mutate(combinations = choose(n, x),
         probability_each = (1/365)^(x-1),
         poisson = 1-dpois(0, combinations * probability_each),
         pbirthday_x = map2_dbl(n, x, ~pbirthday(.
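The Poisson approximation in the excerpt can be checked numerically in base R. The following is only a sketch that reuses the same formula for a few illustrative room sizes (the values of `n` are assumptions, and `comparison` is a hypothetical name); it is not the remainder of the truncated code.

```r
# Compare stats::pbirthday() with the Poisson approximation from the excerpt
n <- c(10, 23, 50)   # people in the room (illustrative values)
x <- 2               # number of people sharing a birthday

pbirthday_x <- sapply(n, function(k) pbirthday(k, coincident = x))

# Expected number of matching groups, then P(at least one match)
combinations <- choose(n, x)
probability_each <- (1 / 365)^(x - 1)
poisson <- 1 - dpois(0, combinations * probability_each)

comparison <- data.frame(n, pbirthday_x, poisson)
comparison
```

For pairs (`x = 2`) the two columns should agree closely, which is what makes the Poisson shortcut attractive when no closed form is at hand.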

Making Summary Tables in R

Contents: Background; General purpose tables; Summary tables; rtables package; qwraps2 package; gtsummary package.

Background: Table output is one of R's richest and most satisfying features. The R Markdown format offers extensive package support for creating, formatting, and presenting tables beautifully.

Tidytuesday: Claremont Run, X-men Characters

X-Men characters. Data dictionary. Explore.

Table 1: Data summary
Name: Piped data
Number of rows: 308
Number of columns: 9
Column type frequency: character 8, numeric 1
Group variables
Variable type: character

Time Series: Basic Analysis

Background: This post is the first in an upcoming series of blog posts describing the application of a lesser-used technique in econometrics: time series analysis. I make extensive use of datasets available in several R packages, mostly the tsibbledata package.

Color charts: An introductory review on applications to qualitative crop phenotyping

Background: Colorimetry is a fascinating topic. Combined with the patterns of the natural world (see this awesome video about Fibonacci numbers and plants), colors can be mesmerizing. In this post and the follow-up article, we will discuss in detail the colorimetric features of the plant world, in particular of the plants that are cultivated or adopted and have edible value for humans: the agricultural crops.

String tip: complex pattern recognition

Background: This post is all about examples and use cases. So… let's break a leg.

Extract all words except the last one using anchors and lookarounds

nasty_char <- c("I love playing wildly")
# remove the last word 'wildly'
stringr::str_extract(nasty_char, ".
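One way to combine an end anchor with a lookahead for this task is sketched below in base R. The pattern is an assumption written for illustration and is not necessarily the one the truncated call above uses; `pattern` and `all_but_last` are hypothetical names.

```r
# Base R sketch: keep everything before the last word
nasty_char <- c("I love playing wildly")

# ".*" greedily matches from the start of the string. The lookahead
# "(?=\\s+\\S+$)" requires that whitespace plus one final word, anchored at
# the end by "$", follows the match without being included in it.
# perl = TRUE switches on PCRE, which supports lookarounds.
pattern <- ".*(?=\\s+\\S+$)"
all_but_last <- regmatches(nasty_char, regexpr(pattern, nasty_char, perl = TRUE))
all_but_last
# [1] "I love playing"
```

The lookahead is what keeps the trailing space and the last word out of the result, so no second trimming step is needed.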

String tip: vectorized pattern replacement

Example case: Suppose you have a bunch of really filthy names that make you want to puke… You can fix them with the help of stringi and stringr. Let's say the following character vector hosts those filthy names.

filthy <- c("Grains %", "Moisture (gm/kg)", "Plant height (cm)", "White spaces", "White space (filth%)")
filthy
## [1] "Grains %"             "Moisture (gm/kg)"     "Plant height (cm)"
## [4] "White spaces"         "White space (filth%)"

Now, to get rid of the filth, use string manipulation.
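A possible cleanup is sketched here with `stringr::str_replace_all()`, which accepts a named vector of pattern = replacement pairs. The rules in `rules` and the snake_case target style are assumptions chosen for illustration, not necessarily the approach the full post takes.

```r
library(stringr)

filthy <- c("Grains %", "Moisture (gm/kg)", "Plant height (cm)",
            "White spaces", "White space (filth%)")

# A named vector applies several pattern = replacement pairs in order,
# vectorized over every element of the input
rules <- c("[^A-Za-z0-9]+" = "_",   # runs of non-alphanumerics become "_"
           "^_|_$" = "")            # trim a leading or trailing "_"

clean <- str_to_lower(str_replace_all(filthy, rules))
clean
# [1] "grains"            "moisture_gm_kg"    "plant_height_cm"
# [4] "white_spaces"      "white_space_filth"
```

Keeping the rules in one named vector makes the order of the replacements explicit and easy to extend.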

Variance component based parameter estimation of incomplete block designs

Introduction: Besides complete block designs, variance component models are also suited to the analysis of incomplete block designs. This post aims to demonstrate exactly that. Using a dataset generated from an alpha lattice design, I show how the design can be properly modeled and fit using OLS regression with various fixed model components.