  1. Dynamic Programming

    Eight months ago, Trey Causey wrote a post about modeling expected points in football, with an emphasis on uncertainty. With my twisted economist's mind, I mentioned that it seemed like dynamic programming could be used in this situation, and indeed it would feature in a future post of Trey ...

  2. Practical Pandas--Part 1

    This is the first post in a series where I'll show how I use pandas on real-world datasets.

    For this post, we'll look at data I collected with Cyclemeter on my daily bike ride to and from school last year. I had to manually start and stop the ...

  3. Tidy Data in Action

    Hadley Wickham wrote a famous paper (for a certain definition of famous) about the importance of tidy data when doing data analysis. I want to talk a bit about that, using an example from a StackOverflow post, with a solution using pandas. The principles of tidy data aren't language ...

  4. Tackling the CPS (Part 4)

    As a reminder, the CPS interviews households 8 times over the course of 16 months. They're interviewed for 4 months, take 8 months off, and are interviewed four more times. So if your first interview was in month \(m\), you're also interviewed in months

    $$m + 1, m + 2, m + 3, m + 12, m + 13, m + 14, m + 15$$ ...
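
The interview schedule described above can be sketched in a few lines of Python. The function name `cps_interview_months` is hypothetical (it does not come from the post), and months are treated as plain integers rather than calendar dates.

```python
def cps_interview_months(m):
    """Months in which a household first interviewed in month m is interviewed."""
    # Four consecutive months in the sample: m, m + 1, m + 2, m + 3
    first_block = [m + k for k in range(4)]
    # Eight months off, then four more interviews: m + 12, ..., m + 15
    second_block = [m + 12 + k for k in range(4)]
    return first_block + second_block


print(cps_interview_months(1))  # [1, 2, 3, 4, 13, 14, 15, 16]
```

The last interview falls in month \(m + 15\), which is why the 8 interviews span 16 months.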
  5. Tackling the CPS (Part 3)

    As a reminder, we have a dictionary that looks like

             id  length  start  end
    0    HRHHID      15      1   15
    1   HRMONTH       2     16   17
    2   HRYEAR4       4     18   21
    3  HURESPLI       2     22   23
    4   HUFINAL       3     24   26
             ...     ...    ...  ...

    giving the columns of the raw CPS data files. This post ...
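
As a rough illustration of how such a dictionary maps onto a fixed-width record, here is a minimal sketch in plain Python. It assumes `start` and `end` are 1-indexed and inclusive, which is consistent with the `length` column above. The `parse_line` helper and the sample record are made up for illustration; they are not taken from the post or from a real CPS file.

```python
# The five dictionary rows shown above: (id, length, start, end).
DICTIONARY = [
    ("HRHHID", 15, 1, 15),
    ("HRMONTH", 2, 16, 17),
    ("HRYEAR4", 4, 18, 21),
    ("HURESPLI", 2, 22, 23),
    ("HUFINAL", 3, 24, 26),
]


def parse_line(line, dictionary=DICTIONARY):
    """Split one fixed-width record into a dict of field name -> string value."""
    # Python slices are 0-indexed and half-open, so a 1-indexed inclusive
    # (start, end) pair becomes line[start - 1:end].
    return {
        name: line[start - 1:end].strip()
        for name, _length, start, end in dictionary
    }


# Hypothetical 26-character record, invented for this example.
sample = "000123456789012" + "01" + "2014" + " 1" + "201"
record = parse_line(sample)
print(record["HRMONTH"], record["HRYEAR4"])  # 01 2014
```

The same `(start - 1, end)` pairs have the half-open shape that `pandas.read_fwf` accepts in its `colspecs` argument, should you want a DataFrame instead of dictionaries.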

  6. Quiz 10 Review

    Section A01

    This quiz focused on exponential smoothing. Make sure that you know about moving averages and autocorrelation too.
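
For reference, simple exponential smoothing can be sketched as below. This assumes the common textbook recursion \(s_t = \alpha y_t + (1 - \alpha) s_{t-1}\) with \(s_0 = y_0\); the course may use a different initialization or notation, and the function name is hypothetical.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing with s_0 = y_0 (an assumed convention)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]
    for y in series[1:]:
        # New value = alpha * observation + (1 - alpha) * previous smoothed value
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed


print(exponential_smoothing([10, 20, 30], alpha=0.5))  # [10, 15.0, 22.5]
```

A larger \(\alpha\) puts more weight on the most recent observation, so the smoothed series tracks the data more closely.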


    You needed to find the biggest decline in the time series. You should never have to guess in stats, and I'm worried that some of you just looked at ...

  7. Quiz 9 Review

    Don't forget your section number!

    Section A01


    Remember that for the modified best conservative model, we still care about the significance of all the predictors other than the ones that must be included.


    Quite a few people are still giving point estimates (just \(\hat{y}\)) when the ...

  8. Quiz 8 Review


    The test statistic for \(H_0: \beta_1 = \beta_2 = 0\) is the \(F\) statistic. It's what we'll use for when we're testing multiple parameters at once.

    Several people had \(\beta_1 = 0\) or \(\beta_2 = 0\). This is wrong; it should be "and", not "or". This is actually an important ...

  9. Quiz 6 Review

    Part b asked for a CI for the slope \(\beta_1\). For this one you use the formula \(\hat{\beta_1} \pm t^{\ast}_{n-p-1} SE(\hat{\beta_1})\). \(n\) is the sample size and \(p\) is the number of predictors (1 in this case).
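
The formula can be sketched in Python as follows. The function names and the numbers in the usage line are hypothetical, chosen only to keep the arithmetic easy to follow; the critical value \(t^{\ast}\) is passed in by hand, as you would read it from a table or software for \(n - p - 1\) degrees of freedom.

```python
def degrees_of_freedom(n, p=1):
    """Degrees of freedom for the t critical value: n - p - 1."""
    return n - p - 1


def slope_confidence_interval(beta_hat, se, t_star):
    """Return (lower, upper) for beta_hat +/- t_star * SE(beta_hat)."""
    margin = t_star * se
    return beta_hat - margin, beta_hat + margin


# Made-up inputs: estimate 2.0, standard error 0.5, critical value 2.0
print(degrees_of_freedom(30))                     # 28
print(slope_confidence_interval(2.0, 0.5, 2.0))   # (1.0, 3.0)
```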

    You get the \(\hat{\beta_1}\) and \(SE(\hat{\beta_1})\) ...

  10. Quiz 5 Review

    Problem 1

    Make sure to read the questions carefully, in particular the underlined or bold parts. For this question we wanted the statistical concept that explains why interpreting a prediction for a car with 0 City MPG is misleading. I agree with many of you that a negative Highway MPG ...
