Reflecting on the Mini Projects

A recap of all mini projects from the semester.

In the first mini project, we simulated the sampling distributions of sample minimums and maximums taken from different populations. This unit was essentially our introduction to sample statistics in general, which became an important concept as the semester progressed. We used them in confidence intervals, and we derived estimators for distribution parameters (e.g., the MLE for a mean). A lot of the semester involved different methods of estimating the true parameters of distributions, and these methods often relied on simulating samples. In mini project 3, we simulated samples from a population in order to find confidence intervals for p, which we estimated with the sample proportion, essentially the mean of a binomial sample. Overall, I feel like this project helped prepare me for the types of sampling simulations that we went on to do over the rest of the semester and highlighted the importance and usefulness of simulation in statistics. Its conclusions about symmetry and SE also showed that there are often patterns in statistics.
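
As a rough illustration of the kind of simulation described above (this is not our actual project code), here is a minimal Python sketch. The two populations, the sample size of 5, the number of replicates, and the helper names `simulate_extremes` and `summarize` are all assumptions chosen for illustration. The standard deviation of each simulated statistic serves as an estimate of its SE.

```python
import random
import statistics

def simulate_extremes(draw, n=5, reps=5000, seed=1):
    """Simulate the sampling distributions of the sample minimum and maximum.

    draw: function that takes a random.Random and returns one population value.
    """
    rng = random.Random(seed)
    mins, maxs = [], []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        mins.append(min(sample))
        maxs.append(max(sample))
    return mins, maxs

def summarize(values):
    # The standard deviation of the simulated statistic estimates its SE.
    return statistics.mean(values), statistics.stdev(values)

# Hypothetical populations: one symmetric, one right-skewed.
populations = {
    "uniform(0, 1)": lambda rng: rng.random(),
    "exponential(1)": lambda rng: rng.expovariate(1.0),
}

for name, draw in populations.items():
    mins, maxs = simulate_extremes(draw)
    min_mean, min_se = summarize(mins)
    max_mean, max_se = summarize(maxs)
    print(f"{name}: min mean={min_mean:.3f}, SE={min_se:.3f}; "
          f"max mean={max_mean:.3f}, SE={max_se:.3f}")
```

For a symmetric population like the uniform, the minimum and maximum should have roughly equal SEs, while for a right-skewed population the maximum should be noticeably more variable than the minimum.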

In the second mini project, we wrote a “meaningful story” using vocabulary from our unit on estimation. During this unit we worked on different methods of estimating parameters, and on what may make certain estimators “better” than others. In this project we had a set of vocabulary that we had to use meaningfully in context, and some of those words made appearances in other course content. One of those words was “likelihood,” which we saw later in our unit on hypothesis testing using likelihood ratios. Having to use these words in a context that made sense helped me better understand the concepts themselves, as I really had to think about what the words meant so I could use them properly, or “meaningfully.” It also showed that many of the concepts we learn about are related. Because a lot of these words came back in other units, this mini project definitely prepared me for that by ensuring that I understood at least the rudimentary idea behind them before going on to apply them in more complex ways. This connects with mini project 5, where we read an editorial about how we should and shouldn’t be using p-values. The editorial emphasized being thoughtful and intentional with our analyses and when drawing conclusions, a theme that also ran through the “meaningful story.”

In the third mini project, we simulated confidence intervals with different sample sizes to investigate what happens when assumptions are violated. The main technical aspect of this project, the simulation, ties back into the general theme of the usefulness and importance of simulation in statistics. Here we were investigating the properties of confidence intervals themselves, which relates to the section we covered later on Bayesian credible intervals for parameters. In that sense, this project related to mini project 4, where we came up with credible intervals for a proportion. Having this background on how confidence intervals actually work (and what we’re actually “confident” in) definitely helped prepare me for interpreting the Bayesian equivalent of the confidence interval, the credible interval. This project also resembled mini project 1, where we examined a property (SE) across different distributions, except that here we examined properties across different sample sizes n of the same distribution. In general, after doing this project I had a better understanding of why we make certain assumptions, what happens if we don’t fulfill them, and what we really mean when we interpret confidence intervals.
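
To make the idea of checking coverage concrete, here is a minimal Python sketch of this kind of simulation. It is not our actual project code: the use of the large-sample (Wald) interval, the true value p = 0.1, the sample sizes, and the function names `wald_interval` and `coverage` are assumptions for illustration.

```python
import math
import random

def wald_interval(successes, n, z=1.96):
    """Large-sample (Wald) 95% interval for a proportion."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def coverage(p, n, reps=5000, seed=1):
    """Fraction of simulated intervals that contain the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # One binomial sample: count successes in n Bernoulli(p) trials.
        successes = sum(rng.random() < p for _ in range(n))
        lower, upper = wald_interval(successes, n)
        hits += lower <= p <= upper
    return hits / reps

# Assumed settings: with p = 0.1, small n violates the large-sample assumption.
for n in (10, 50, 500):
    print(f"n={n}: estimated coverage = {coverage(p=0.1, n=n):.3f}")
```

When the large-sample assumption is badly violated (small n with p far from 0.5), the estimated coverage should fall well below the nominal 95%, and it should move toward 95% as n grows.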

In the fourth mini project, we used Bayesian analysis to try to come up with a distribution to model the probability of winning points in tennis. A large part of this whole unit was working through the transition from the frequentist to the Bayesian framework. As in the units prior to this one, we were trying to model a distribution, just in a different way. Whether it was estimation, sampling distributions, or confidence intervals, we had essentially been trying to model true population values of some sort, and this mini project was exactly that too. The “real-world” context of this project also reminded me of our meaningful story from project 2, in that we were using all these words we’ve learned, but now in the context of solving a specific (tennis) problem. Something that stood out to me when learning about the Bayesian framework in general is the idea of informative priors and how they can be subjective. I often think of statistics and math as fairly objective, cut-and-dried subjects where the tried and true methods always work. But in this case, our end result very much depends on what we choose as a prior, and that’s different from what I’ve experienced before. It is also interesting because it could introduce personal bias into models, as people may have different perceptions.
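
The dependence on the prior can be seen in a small Python sketch. This is not our project code: it assumes a Beta prior with binomial data (a standard conjugate setup), the win/loss counts are made up purely for illustration, and the two priors and the function names `posterior_params` and `credible_interval` are hypothetical.

```python
import random

def posterior_params(prior_a, prior_b, wins, losses):
    """Beta(prior_a, prior_b) prior plus binomial data gives a Beta posterior."""
    return prior_a + wins, prior_b + losses

def credible_interval(a, b, level=0.95, draws=20000, seed=1):
    """Approximate equal-tailed credible interval from simulated posterior draws."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1 - level) / 2
    lower = samples[int(tail * draws)]
    upper = samples[int((1 - tail) * draws) - 1]
    return lower, upper

# Made-up data for illustration only: 56 points won, 28 points lost.
wins, losses = 56, 28

# Two hypothetical priors: a flat one and an informative one centered at 0.5.
priors = {
    "flat Beta(1, 1)": (1, 1),
    "informative Beta(50, 50)": (50, 50),
}

for name, (a0, b0) in priors.items():
    a, b = posterior_params(a0, b0, wins, losses)
    lower, upper = credible_interval(a, b)
    print(f"{name}: posterior Beta({a}, {b}), "
          f"95% credible interval ({lower:.3f}, {upper:.3f})")
```

With the same data, the informative prior pulls the interval toward 0.5, which is exactly the kind of subjectivity that stood out to me.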

In the fifth and final mini project, we read the editorial “Moving to a World Beyond p<0.05.” The main idea was to get readers to rethink how we use p-values. In every statistics course I’ve taken, when testing something for “significance,” we used a p-value and (typically) a threshold of 0.05 to determine this. Tying back to the theme of being meaningful and thoughtful, the editorial also argues that “confidence” might not be the best word for what we call confidence intervals, and suggests the word “compatibility” instead. Looking back at our coursework and the mini project on confidence intervals, this actually makes sense. Each interval is generated from one specific set of data (simulated, in our project), so encouraging statistical thinking and thoughtfulness by taking things like data context into account is important. When using confidence intervals for things like a difference of means to answer questions like “is there a difference,” we often check whether the interval contains 0 and base our conclusions on that, which is another area where we could be more “thoughtful.” Overall, this project really instilled in me the importance of being intentional in analysis and understanding what I’m actually doing, which has been a common theme throughout this course.

Over the course of the semester I have definitely learned a lot, both content-wise and about what it means to be a good statistician (or anyone who works with data). These mini projects allowed me to take what we learned in class and apply it in a way that both bettered my understanding of these topics and showed me how they may be used in practice. There were definitely some overarching themes, both technical and otherwise. In general, the projects showed the power of simulation and how it is behind a lot of the statistical tests we use. While what we learn in class often seems very theoretical and derivation-heavy, these projects show that there are practical applications (e.g., Bayesian analysis). All of the projects and concepts we covered also build off of each other in one way or another, some in more subtle ways, and that’s something that will stick with me as I continue in my academic and professional career. If I don’t know something, I can probably build off of something I do understand. I will also make sure to be intentional and thoughtful with my work, whether it is related to statistics/data science or not.