<p><em>positively-semi-definite: a random collection of uncertain ideas</em> · <a href="https://importdikshit.github.io/">https://importdikshit.github.io/</a> · Wed, 17 Jun 2020</p>
<h1>160k+ high school students will only graduate if a statistical model allows them to</h1>
<p><em><strong>tl;dr</strong>: Due to curriculum disruptions, the International Baccalaureate (IB) is going to use a statistical model to assign grades for more than 160,000 high school students. These grades have a very significant impact on students’ lives (for better or for worse). This is an inappropriate use of ‘big data’ and a terrible idea for a plethora of reasons.</em></p>
<p><em>The IB’s model has glaring methodological issues and completely disregards the ethical considerations which should accompany its adoption. The model will inevitably discriminate against groups of students based on gender, race and socioeconomic status. It will unfairly and disproportionately impact vulnerable students. Though this is a mathematical certainty, it is difficult to identify a priori how it will happen: a model may assign female students systematically lower grades in STEM subjects and/or incorrectly fail Black students at higher rates than Asian students.</em></p>
<p><em>I believe that the IB should seriously re-evaluate the manner in which it has chosen to maneuver this very delicate situation. Its plan highlights a troubling trend of blind faith in ‘big data’ driven by a lack of consideration for any potential ramifications. In this article, I explain why the IB’s decision is horrible, using data from schools in New York to illustrate how a model can discriminate even when it is not given gender, race and socioeconomic data.</em></p>
<h2 id="what-is-the-ib-should-i-grab-my-pitchfork-yet"><strong>What is the IB? Should I grab my pitchfork yet?</strong></h2>
<p>The International Baccalaureate (IB) is an educational foundation that awards high school diplomas to students from around the world. It had 166,000 candidates across 144 countries in the previous graduating class.</p>
<p>The IB has a set of mandatory ‘leaving examinations’ at the end of high school. The marks from these exams are used to allot each student a final grade. This final grade is very important for students: it enables them to graduate, apply to universities and accept admissions offers from universities. It is also the <strong>most</strong> important measurement used in college admissions processes in Europe and Asia (~90% of the weight). A student’s final grade can <strong>significantly</strong> alter their future life outcomes.</p>
<p>The pandemic has led to considerable turmoil in the plans and operations of the IB. Due to curriculum disruptions, the IB has been forced to cancel final exams for its current student cohort. <a href="https://www.ibo.org/news/news-about-ib-schools/the-assessment-and-awarding-model-for-the-diploma-programme-may-2020-session/">Instead, it is opting to assign final grades</a> in a truly unprecedented manner:</p>
<blockquote>
<p>staff are working with an education organisation that specializes in data analysis, standards, assessment and certification. Together we have developed a method that uses data, both historical and from the present session, to arrive at the subject grades for each student.</p>
</blockquote>
<blockquote>
<p>the IB has … been in dialogue with ministries of education, education regulators and other similar bodies across the world to ensure that they too are confident with our approach and that students will be appropriately recognised where required.</p>
</blockquote>
<blockquote>
<p>these assessment arrangements represent the <em>fairest approach</em> we can take for all our students</p>
</blockquote>
<p>Per the IB, the final grades for each student will be assigned by a statistical model as a function of two <em>or more</em> metrics:</p>
<blockquote>
<p><em>Coursework Grades</em>: Grades for projects and assignments that the students submitted prior to the disruptions.</p>
</blockquote>
<blockquote>
<p><em>Predicted Grades (Forecast Grades)</em>: The grade that a teacher believes each student was likely to have obtained if the exams were held as planned. This is a <em>teacher’s evaluation</em> of their student’s preparedness.</p>
</blockquote>
<blockquote>
<p><em>Miscellaneous other data</em>: The IB says that a model will use miscellaneous other data, wherever it is available.</p>
</blockquote>
<p>The <em>three-step process</em>, illustrated below, will be used to prescribe <em>final grades</em> to each student. I will refer to this <em>entire process</em> as the ‘model’.</p>
<p align="center">
<img src="https://paper-attachments.dropbox.com/s_64BA1745AF170C7D97EB9C8E43CD5E612ECEE1F2709C45EA1167CF2F29A7B258_1591492130403_IB+PROCESS.png" alt="The three step process to prescribe final grades" />
</p>
<p>There are lots of names for extrapolation based on historical patterns: statistical modelling, machine learning, data analytics, big data, AI. All of these terms refer to the same narrow set of processes which use historical data to predict the outcome of a future event. <a href="https://fairmlbook.org/introduction.html">Unfortunately</a>, as three prominent researchers expressed in their textbook, ‘<em>the fact that [this process] is “evidence-based” by no means ensures that it will lead to accurate, reliable, or fair decisions</em>’. Over the course of this article, I will build on the previous statement and convince you that <strong>it is a horrible idea</strong> to assign final grades using a statistical model.</p>
<p>But <em>definitely</em> read the rest of the article before you grab your pitchfork.</p>
<p align="center">
<img src="http://media.giphy.com/media/26DNioenMF55ocuWY/giphy.gif" />
</p>
<h2 id="glaring-issues-in-the-ibs-methodology"><strong>Glaring issues in the IB’s methodology</strong></h2>
<p>Let’s start with the obvious issues in the process described by the IB. There are at least <strong>seven conspicuous issues</strong> (and several subtler but no less important ones):</p>
<p>I. <em>Double Jeopardy</em>. If a student did badly on their coursework they will be penalized twice over: once by the model & once by the IB grading rubric which calculates the final grade. This is because the model will predict a final mark based on the coursework and predicted grades. Then the final mark will be combined with coursework grades to obtain a final grade.</p>
<p>II. <em>Historical Bias</em>: <a href="https://cdn.americanprogress.org/wp-content/uploads/2014/10/TeacherExpectations-brief10.8.pdf">A study based on data</a> from the National Center for Education Statistics concluded that secondary school teachers tend to express lower predictions for their ‘expectations from <em>students of color</em> and <em>students from disadvantaged backgrounds</em>’. This is problematic because predicted grades play a prominent role in the model.</p>
<p>III. <em>Different schools, Different errors</em>: Small schools (15% - 30% of all IB schools) will have <strong>bigger</strong> and <strong>more frequent errors</strong> in their model predictions when compared with large schools. This is an example of representation bias.</p>
<p align="center">
<img src="https://paper-attachments.dropbox.com/s_CB00784237A116B42554540B1BBF945CE0D2D4D638B3EABB4D4EF81B35F542E8_1592203527175_file.png" alt="Comparative Error (Explained Below) - Based on 10,000 Simulations" />
</p>
<p>Assume that the largest IB school has a class size of 300 (based on IB <a href="https://www.ibo.org/contentassets/bc850970f4e54b87828f83c7976a4db6/dp-statistical-bulletin-may-2019.pdf">data</a> from 2019). If you are a student in a school with a class size of 5, your final grade will have a <strong>~25% larger average error</strong> than the students in the school with 300 students. The graph above depicts how this comparative error decays with an increasing class size. The principle is simple: the more data you have for a school, the more accurate your predictions are.</p>
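<p>The decay of this comparative error can be illustrated with a toy simulation. The sketch below is my own construction, not the IB’s model: it estimates a school-level average from a handful of noisy per-student observations. The exact ~25% figure above depends on the full grading pipeline, but the underlying 1/√n decay is easy to reproduce:</p>

```python
import random
import statistics

def mean_abs_error(class_size, trials=2000, noise_sd=1.0, seed=0):
    """Average absolute error when estimating a school-level average
    from `class_size` noisy student observations (toy setup: the true
    school average is 0, each student adds Gaussian noise)."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, noise_sd) for _ in range(class_size)]
        errors.append(abs(statistics.fmean(sample)))
    return statistics.fmean(errors)

small_school = mean_abs_error(5)
large_school = mean_abs_error(300)
# the error shrinks roughly like 1/sqrt(class size), so school-level
# estimates for the 5-student school are far noisier than for the
# 300-student school
print(small_school > large_school)
```

<p>The same principle holds for any statistic the model must estimate per school: less data per school means noisier school-level adjustments.</p>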
<p>IV. <em>Measurement Bias</em>: If the measurement process varies across different schools, it will affect the way a model treats students from different schools. Schools which cater to socioeconomically disadvantaged communities are likely to have less frequent evaluation of students. This will lead to <em>poorer</em> students receiving predicted grades which are less accurate than those of <em>richer</em> peers in schools with more frequent testing. Additionally, <em>poorer schools</em> are likely to have larger class sizes. A teacher who has to assign predicted grades for 10 students will do a better job than a teacher who has to assign predicted grades to 30 students.</p>
<p>V. <em>Additional data where data is available:</em> The IB has said that they will supplement the coursework grades and predicted grades with additional data ‘where it is available’. This can be problematic as it may induce biases in predictions and will cause predictions for some schools to be more accurate than predictions for others.</p>
<p>VI. <em>Skewed Distributions</em>: Schools with a non-normal distribution of grades will receive worse predictions. If a school has a left-skewed (<em>overachievers</em>!) or right-skewed distribution of grades, a model will perform worse for its students.</p>
<p>VII. <em>Distribution Shifts</em>: If the subject teacher in a school changed between last year’s cohort and this year’s cohort, the <em>historical relationship</em> between their predicted grades and final grades will not match the <em>current relationship</em>. This may lead to systematically worse predictions.</p>
<p>These are the low hanging fruit. There are many other nuanced problems which may arise depending on the sort of model that the IB decides to use. I personally feel that this smattering of amuse-bouche arguments should be enough to terminate this experiment. Read on for concepts which are only marginally more complex but much more unsettling.</p>
<h2 id="a-students-future-shouldnt-be-at-the-mercy-of-random-noise-in-a-statistical-model"><strong>A student’s future shouldn’t be at the mercy of random noise in a statistical model</strong></h2>
<p>There is a ubiquitous saying in the field of statistics that ‘all models are wrong and some models are useful.’ A model is an approximation of reality based on empirical historical patterns: all predictions are <strong>rough</strong> estimates and <strong>no</strong> model can forecast the future with complete certainty. Furthermore, every model will have some uncertainty in its predictions due to random noise. If the IB uses a model to assign final grades for students, the model will <strong>inevitably</strong> make mistakes.</p>
<p>Let’s assume that the IB builds a model which is ‘90% accurate’. This is an almost unrealistically ambitious target and is <strong>incredibly difficult to achieve in practice</strong>. Even then, the IB would predict incorrect final grades for roughly 1 out of every 10 students. Put another way: this is equal to issuing incorrect grades for all IB students in China, Germany, India, Singapore and the United Kingdom <strong>combined</strong>.</p>
<p>The IB may be comfortable with this 10% inaccuracy because they have assured students that they will ‘match the grade distribution from last year’. Will this cancel out the inaccuracies in the model predictions? <strong>Absolutely Not</strong>. This is a cosmetic improvement which has the additional benefit of providing the IB with some plausible deniability. While the IB can make the current distribution look like the historical distribution, they <strong>cannot</strong> possibly guarantee that students will be in the same neighborhood on the current distribution as they would have been if the exams proceeded per usual.</p>
<p>Consider this: I can build a model which assigns every student some marks at random from an arbitrary normal distribution. Then, I can use these marks and adjust my grade bins to match the final grade distribution from last year. Does this compensate for the fact that my ‘model’ assigns marks in a manner completely untethered from reality? <strong>No it does not</strong>. This example was pathological; in reality, a bad model will fail silently and disproportionately harm certain groups of students.</p>
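<p>Here is a minimal sketch of that pathological ‘model’, using hypothetical marks and made-up grade boundaries rather than the IB’s actual rubric. The random marks are binned by quantile so their grade distribution matches the ‘honest’ one exactly, yet whether an individual student receives the right grade is essentially a coin flip:</p>

```python
import random
import statistics

rng = random.Random(42)

# hypothetical marks: what students 'would have' scored in real exams
true_marks = [rng.gauss(70, 10) for _ in range(10_000)]
# the pathological 'model': marks drawn at random, ignoring the students
random_marks = [rng.gauss(70, 10) for _ in range(10_000)]

def to_grades(marks, cutoffs=(0.5, 0.8, 0.9)):
    """Bin marks by quantile into grades 4..7 (made-up boundaries), so
    every input distribution yields the same grade distribution."""
    ranked = sorted(marks)
    bounds = [ranked[int(c * len(ranked))] for c in cutoffs]
    return [4 + sum(m >= b for b in bounds) for m in marks]

honest = to_grades(true_marks)
random_based = to_grades(random_marks)

# the grade *distributions* match exactly, by construction...
same_distribution = sorted(honest) == sorted(random_based)
# ...but an individual student lands in the right bin only at chance rates
agreement = statistics.fmean(h == r for h, r in zip(honest, random_based))
print(same_distribution, round(agreement, 2))
```

<p>Matching the aggregate distribution constrains nothing about which student ends up where, which is exactly the point.</p>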
<p>Yes, models are very useful tools which help us make decisions at scale. They will inevitably play a large part in our future. This does <strong>not</strong> mean that institutions should be able to circumvent the ethical considerations of where a model is appropriate. Is it ethical to deprive a student of their hard-earned spot at the London School of Economics because a black-box decision-making mechanism said that they were not worthy of the opportunity? Is it ethical to tell a 17-year-old kid that they were unable to graduate with their peers because their inaccurate prediction was ‘the cost of doing business’? I believe there are key ethical considerations which the IB has possibly overlooked while making its operational choices.</p>
<h2 id="why-do-models-fall-short"><strong>Why do models fall short?</strong></h2>
<p align="center">
<img src="http://media.giphy.com/media/I5XmvxrOKgbTy/giphy.gif" alt="Lotta Puns" />
</p>
<p>Before we jump into the last argument, we need to understand how a model which predicts accurately may still be fundamentally wrong. Models are very powerful yet incredibly stupid: the exact quality which makes them desirable can also render them completely ineffective. They have an uncanny ability to detect microscopic patterns in humongous quantities of data. The job of the model is to pick up any pattern which will allow it to predict an outcome efficiently. A researcher cannot control which patterns a model will choose to detect. Therefore, models will take any edge to make themselves more predictive - even if it means capitalizing on false relationships to predict outcomes.</p>
<p>Take the graph below as an example:</p>
<p align="center">
<img src="https://paper-attachments.dropbox.com/s_CB00784237A116B42554540B1BBF945CE0D2D4D638B3EABB4D4EF81B35F542E8_1592167749315_file.png" alt="Data from tylervigen.com" />
</p>
<p>I trained a model which has an almost perfect fit to the data above. Given the blue line, it can predict the value of the red line very well (and vice-versa). Now, let me re-contextualize this data.</p>
<p align="center">
<img src="https://paper-attachments.dropbox.com/s_CB00784237A116B42554540B1BBF945CE0D2D4D638B3EABB4D4EF81B35F542E8_1592168496031_file.png" alt="Data from tylervigen.com" />
</p>
<p>The red quantity was actually the money spent on pets in the USA and the blue quantity was the number of lawyers in California. We know that there is no possible way that the two quantities above are related - this is just a coincidence. Yet, any model will learn to predict one as a function of the other (just as mine did). Do you think that a sudden downturn in the money spent on pets in 2010 would have led to a commensurate downturn in the number of lawyers in California? No. But if I trained a model on this data, it would be susceptible to this incorrect assumption. Even when I try to correct this inconsistency by giving my model an appropriate quantity which it <em>should</em> be able to use to predict the number of lawyers, it chooses to use the incorrect signal (money spent on pets) to predict the outcome instead.</p>
<p>As a researcher, it is <strong>not possible</strong> to stop the model from learning these incorrect relationships. This is an important point: <strong>just because a model is predictive does not mean that it is correct</strong>. An accurate model may be a bad model and spurious correlations can be <a href="http://people.dbmi.columbia.edu/noemie/papers/15kdd.pdf">very problematic</a> if not appropriately detected.</p>
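<p>This failure mode is easy to reproduce. In the sketch below (with made-up numbers standing in for the pet-spending and lawyer series), two quantities that are causally unrelated but both drift upward over time end up almost perfectly correlated - exactly the kind of signal a model will happily exploit:</p>

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(1)
# two independent series, both trending upward over 20 'years'
pet_spending = [30 + 1.5 * t + rng.gauss(0, 1) for t in range(20)]
lawyer_count = [100 + 4.0 * t + rng.gauss(0, 3) for t in range(20)]

r = pearson(pet_spending, lawyer_count)
print(round(r, 2))  # close to 1 despite no causal link
```

<p>Any shared trend - time, school size, geography - can play the role of the hidden common driver here.</p>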
<h2 id="a-model-will-discriminate-against-students-based-on-gender-race-socioeconomic-status-etc"><strong>A model will discriminate against students based on Gender, Race, Socioeconomic status etc.</strong></h2>
<p><em><strong>tl;dr</strong>: Models will learn the gender, race, and socioeconomic status of a student even if this information is withheld from them. It is a statistical inevitability that the IB model will discriminate against certain groups of students, yielding unfair outcomes.</em></p>
<p>You may think that a model which isn’t aware of gender/race/socioeconomic status cannot possibly discriminate based on these attributes. This line of thinking is called ‘fairness through unawareness’. Let’s see what the <a href="https://fairmlbook.org/">experts say about this</a>:</p>
<blockquote>
<p>some have hoped that removing or ignoring sensitive attributes … somehow ensure[s] impartiality … unfortunately, <em>this practice is usually somewhere on the spectrum between ineffective and harmful</em>.</p>
</blockquote>
<p>Even if you don’t include a sensitive attribute in a model, the model will learn it. Fairness through unawareness reminds me of the Chappelle Show <a href="https://www.youtube.com/watch?v=XGU8lvSRxv8">educated guess line</a> skit. Dave first guesses the race of the caller, then uses that to make a guess about the circumstances of the caller. Dave is ‘a model’ which is making malicious assumptions about his clients.</p>
<p>Now, let’s see this in action. I will build a model to predict whether the graduation rate for a <a href="https://data.nysed.gov/downloads.php">New York High School</a> is above or below the national average, as a function of the high school’s county location and its score on 3 different tests. This is <strong>incredibly similar</strong> to what the IB is doing (they are using analogous metrics).</p>
<p>My model can predict whether the graduation rate is above/below the national average with almost 80% accuracy. Remember, this model does not have any data about the race, socioeconomic status or gender of the student body in any high school.</p>
<p align="center">
<img src="https://paper-attachments.dropbox.com/s_CB00784237A116B42554540B1BBF945CE0D2D4D638B3EABB4D4EF81B35F542E8_1592209446193_file.png" />
</p>
<p>Let’s take a closer look at <strong>how</strong> the model learns to predict the graduation rate. I’m going to gloss over some technical details here, but I simply ask the <strong>same model as above</strong>: are a majority of the students in the High School Black/Hispanic? The big idea is that if our model can detect the majority Black/Hispanic high schools with high accuracy, it is probably learning to identify Black/Hispanic high schools and <strong>then using this fact to predict</strong> the graduation rate.</p>
<p align="center">
<img src="https://paper-attachments.dropbox.com/s_CB00784237A116B42554540B1BBF945CE0D2D4D638B3EABB4D4EF81B35F542E8_1592209421208_file.png" />
</p>
<p>Yikes. Our model can predict the <em>majority race</em> of the high school <strong>with more accuracy</strong> than it can predict the <em>graduation rate</em>. This means that our model is definitely race-aware. We did not <em>expect it to learn anything about</em> the race of the students, yet, the model decided that it needed to know this information in order to predict graduation rates. We did <em>not provide the model with any data on</em> the race of the students, the model just went ahead and learned it.</p>
<p>If we look for the same pattern with socioeconomic status: we see that the model recognizes which schools have a majority of economically disadvantaged students. It can detect these schools with ~75% accuracy. This means that our model is economically-aware as well.</p>
<p align="center">
<img src="https://paper-attachments.dropbox.com/s_CB00784237A116B42554540B1BBF945CE0D2D4D638B3EABB4D4EF81B35F542E8_1592209142500_file.png" />
</p>
<p>To make sure that this is not a fluke, we check if the model can detect schools which have a majority female population:</p>
<p align="center">
<img src="https://paper-attachments.dropbox.com/s_CB00784237A116B42554540B1BBF945CE0D2D4D638B3EABB4D4EF81B35F542E8_1592209236395_file.png" />
</p>
<p>The model is less than 50% accurate. This is slightly worse than guessing at random. The model is clearly not taking the majority gender of the school into account while making its decision.</p>
<p>In fact, if we build an alternate model but give it the majority race of the high school (in addition to the same data-points as the original model), we would expect the graduation rate accuracy of the model to go up substantially as we are providing it with additional data. However, we see that the accuracy of the <em>alternate model</em> only increases by ~1%. This is another indicator that our model is somehow race-aware already.</p>
<p>To reiterate: when we built a model to predict high school graduation rates based on test scores and school location, we did not give the model any information about race, socioeconomic status, or gender. Our model simply realized that if it can identify the racial/economic makeup of the school, it can probably identify graduation rates. Therefore, <strong>even if the IB does not give the model sensitive data, a model will deduce it</strong>.</p>
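<p>The probing experiment above can be sketched end-to-end on synthetic data. The code below uses made-up schools, not the New York dataset, and a simple threshold rule in place of a trained classifier; because a sensitive attribute shifts the test scores themselves, the same features that predict graduation also recover the attribute:</p>

```python
import random

rng = random.Random(7)

# hypothetical synthetic schools: a sensitive attribute (which we will
# NOT give to the model) shifts both test scores and graduation rates
schools = []
for _ in range(2000):
    disadvantaged = rng.random() < 0.5
    shift = -8 if disadvantaged else 0
    scores = [rng.gauss(75 + shift, 5) for _ in range(3)]  # 3 test scores
    graduates_above_avg = (sum(scores) / 3 + rng.gauss(0, 4)) > 71
    schools.append((scores, graduates_above_avg, disadvantaged))

def accuracy(predict, target_of):
    hits = sum(predict(s) == target_of(s) for s in schools)
    return hits / len(schools)

# a stand-in 'model': threshold on the mean test score
model = lambda s: sum(s[0]) / 3 > 71

acc_graduation = accuracy(model, lambda s: s[1])
# probe: the SAME features recover the sensitive attribute
# (a low predicted score corresponds to 'disadvantaged')
acc_sensitive = accuracy(lambda s: not model(s), lambda s: s[2])
print(acc_graduation, acc_sensitive)  # both well above chance (0.5)
```

<p>Nothing about the ‘model’ mentions the sensitive attribute; it leaks in through the features, which is the whole problem with fairness through unawareness.</p>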
<p>What happens when a model learns about the gender/socioeconomic status/race of a student and then incorrectly uses these things to predict their final marks? Much like the ‘amount of money spent on pets’ does not actually tell us anything about the ‘number of lawyers in California’, knowing the race/socioeconomic status/gender of a student does not tell us anything about their final marks - even though this data may be useful for prediction. <a href="https://medium.com/@mrtz/how-big-data-is-unfair-9aa544d739de">Since we do not have</a> a ‘<em>principled way to tell at which point such a relationship is worrisome and in what cases it is acceptable</em>’, using a model will lead to unfair predictions for groups of students.</p>
<p>A foundational result in machine learning states that it is statistically impossible for the IB to ensure that any model will be completely fair. There are three primary criteria used to define fair predictions; however, it is impossible for any model to satisfy all three of them simultaneously. This means that any model used by the IB will inevitably discriminate against students in <strong>two of three</strong> ways:</p>
<blockquote>
<p>I. It may penalize students based on some <em>sensitive attribute</em> and assign systematically lower grades to certain groups.
<em>For example</em>: The model may choose to assign grades based on the gender
(or race/socioeconomic status) of the student. It may assign female mathematics students systematically lower grades than male mathematics students.</p>
</blockquote>
<blockquote>
<p>II. It may have <strong>a higher rate</strong> of incorrect predictions for certain students based on some <em>sensitive attribute</em>.
<em>For example</em>: The model may incorrectly fail students (devastating but unavoidable!) at different rates based on the race (or gender/socioeconomic status) of the student. It may <strong>incorrectly fail</strong> <em>Black</em> students at a higher rate than <em>Asian</em> students.</p>
</blockquote>
<blockquote>
<p>III. It may have a <em>lower rate of precision</em> for certain groups based on some sensitive attribute.
<em>For Example</em>: A model may have varying precision in identifying students who should fail based on their socioeconomic status (or gender/race). It may be more precise in <em>failing rich students</em> (removed with a scalpel) and less precise in failing poor students (removed with a butter knife).</p>
</blockquote>
<p>As mentioned above, <strong>two of these three scenarios are unavoidable</strong> per <em>sensitive attribute</em>. These disparities will express themselves in any model chosen by the IB. There is an unfortunate trade-off to be made here: the researchers will have to decide which of the criteria they will sacrifice.</p>
<p>You must ask yourself: is it ethical to use a model which is systematically pessimistic about the performance of some students but unfairly optimistic about the performance of others? Is it fair to use a model if it is capitalizing on sensitive attributes to assign grades?</p>
<h2 id="well-what-do-we-do-now"><strong>Well, what do we do now?</strong></h2>
<p>I’ve spent time discussing this problem with people who are a lot smarter than I am. We weren’t able to arrive at a good answer: just a long list of wrong answers. The fact that this is an outsourced black-box model with limited historical data, no oversight of the decision-making mechanism and only 3 months for research and production further complicates the situation. Data analysis and machine learning are incredibly powerful tools, but they need to be used in appropriate situations and with a great degree of care. When you can significantly alter life outcomes for vulnerable parts of the population, you need to adopt a higher degree of nuance in your decision making. This situation calls for a <em>process</em> solution rather than a smart <em>‘modelling’</em> solution.</p>
<p>Unfortunately, I do not know enough about the domain of education to suggest a good alternative. I am, however, confident enough in my mathematical ability to identify that the current solution is unequivocally incorrect. It is better to have a delayed and suboptimal solution than a discriminatory and error-laden one. I am conscious of the fact that this may cause the article to appear slightly partisan but my goal is to simply raise awareness about a potentially massive issue in a very sensitive situation.</p>
<p>As a student/parent/teacher: the first step is to not succumb to <em>outcome bias</em>. It is easy to say ‘let me see the results of the model and then I will decide if I like it.’ If you feel that you were unfairly treated <em>after</em> the results are legitimized, there is very little you will be able to do about it. Share this article with your administrators and with others in the IB community. Demand a better, fairer and more transparent process (<a href="mailto:support@ibo.org">support@ibo.org</a>). Raise your concerns with the University which you wish to enroll in and ask them about fairer alternate admissions processes (<a href="mailto:info@officeforstudents.org.uk">info@officeforstudents.org.uk</a>, etc.).</p>
<p>On a closing note, I would like to call attention to the speed with which governments and universities declared that they were ‘on board’ with this bad idea. It is concerning to see how this was treated as a stereotypical business decision with no regard for the potential ramifications and a total lack of consideration from an interdisciplinary perspective. Conversations on fairness in machine learning clearly failed to reach the many stakeholders involved in this operation. Perhaps there is some further scope for outreach within this corner of academia?</p>
<p align="center">
<img src="http://media.giphy.com/media/4bpK2k0Yru5Us/giphy.gif" />
</p>
<p>Thank you for reading. I always appreciate any constructive criticism or comments!</p>
<hr />
<p>Acknowledgements: A big thank you to Ilian, Rohan, Mishti, Vasilis and Mom+Dad for reviewing drafts of my first article ever!</p>
<p><em>Published Wed, 17 Jun 2020 · <a href="https://importdikshit.github.io/2020/06/160k-students.html">https://importdikshit.github.io/2020/06/160k-students.html</a></em></p>
<hr />
<h1>tl;dr mit 6.s085</h1>
<p>a friend mentioned that this course is a good review of applied statistics. since it was pretty short, i decided to read through it last weekend. it was a good recap of the basics. i’ve summarized the big chapter-wise tl;drs here. the case studies were very well picked. i would recommend reading through the examples in the lecture notes.</p>
<blockquote>
<p>basic statistics</p>
</blockquote>
<p>ask the right questions. be ultra-aware of what you are answering. in the fake polling data case study, asking the right question is more important than using the most powerful tests: examining even/odd parity is not a very conspicuous problem statement, yet it is what exposed the issue.</p>
<p>base rate fallacy via <a href="https://xkcd.com/1132/">xkcd</a>: when the base rate of the event is very low, even a modest false positive rate means that most positive results are false alarms. your test should not use an error threshold for establishing significance which ignores the base rate.</p>
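<p>the arithmetic behind the strip is a one-liner with bayes’ rule (the rates below are hypothetical, chosen to make the effect obvious):</p>

```python
# a test with a good true positive rate but a 5% false positive rate,
# applied to an event with a 1-in-1000 base rate (hypothetical numbers)
base_rate = 0.001
true_positive_rate = 0.99
false_positive_rate = 0.05

p_positive = (true_positive_rate * base_rate
              + false_positive_rate * (1 - base_rate))
p_real_given_positive = true_positive_rate * base_rate / p_positive
print(round(p_real_given_positive, 3))  # ~0.019: most positives are false alarms
```
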
<p>statistics fails silently - understand what you are doing in a step by step manner.</p>
<p>complex effects: important to check for them by <em>visualizing</em>, as their analysis will require more sophisticated models. we need to watch out for <strong>multimodal data</strong> - means are not very accurate estimates. we also need to watch for skew. visualize the data: anscombe’s quartet is the perfect counterexample to using metrics without fully understanding their importance in context (all four datasets have the same <script type="math/tex">R^2</script>).</p>
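<p>anscombe’s quartet is easy to verify numerically. the four datasets below use the standard published values; they share nearly identical means and correlations despite looking completely different when plotted:</p>

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = [
    (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
]

# every dataset prints (essentially) the same summary: mean ~7.50, r ~0.82
for xs, ys in quartet:
    print(round(statistics.fmean(ys), 2), round(pearson(xs, ys), 2))
```
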
<p>aggregate data vs individual data: immigrant population and literacy rates by state are correlated on the aggregate level. does this mean immigrants cause higher literacy rates? no. immigrants choose to settle in states with high literacy rates. individual-level data shows that immigrants were in fact less likely to be literate. <strong><em>be ultra aware of which question you are answering</em></strong>. john kerry vs bush by gelman. more rich voted bush and more poor voted kerry <strong>but</strong> kerry won the rich states and bush won the poor states. all the rich people in the poor states voted bush. the few rich folks who voted kerry did so in the rich states. this led to a false perception because poor states had very strong income-linked voting preferences.</p>
<p>student t distribution: gaussian random variable with an uncertain standard deviation. for a standard normal Z and U distributed as chi-squared with r degrees of freedom, a random variable <script type="math/tex">t = \frac{Z}{\sqrt{U/r}}</script> has a student t distribution. the uncertainty in the denominator allows you to pad your tails appropriately and allot higher probabilities to seemingly rare events when you have small samples.</p>
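<p>the construction can be checked by simulation: build t from its definition and compare its tail mass against a standard normal (a quick sketch, not part of the course notes):</p>

```python
import random

rng = random.Random(0)

def t_sample(r):
    """one draw of t = Z / sqrt(U/r): Z standard normal, U chi-squared(r)."""
    z = rng.gauss(0, 1)
    u = sum(rng.gauss(0, 1) ** 2 for _ in range(r))  # chi-squared(r)
    return z / (u / r) ** 0.5

n = 50_000
tail_t = sum(abs(t_sample(3)) > 3 for _ in range(n)) / n
tail_z = sum(abs(rng.gauss(0, 1)) > 3 for _ in range(n)) / n
# with few degrees of freedom the t distribution has much fatter tails:
# 'rare' events are far more probable than the normal would suggest
print(tail_t, tail_z)
```
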
<p>thinking about independence: two features are dependent when knowing something about feature I gives us information about feature II which we would not have had before.</p>
<p>correction factor for the sample variance: the sample mean is a sort of ‘optimum’ with respect to our sample and therefore we will always underestimate the dispersion of the data. since there is some uncertainty about where the true mean is, our squared terms will always be smaller than they should be. we can adjust for this by inflating the sum with an n-1 correction in the denominator and we get an unbiased estimator.</p>
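<p>a quick simulation shows both the bias and the fix (toy numbers: repeated samples of size 5 from a population with variance 4):</p>

```python
import random
import statistics

rng = random.Random(0)
# population is N(0, 2), so the true variance is 4.0

biased, corrected = [], []
for _ in range(20_000):
    sample = [rng.gauss(0, 2) for _ in range(5)]
    m = sum(sample) / len(sample)
    ss = sum((x - m) ** 2 for x in sample)
    biased.append(ss / len(sample))           # divide by n
    corrected.append(ss / (len(sample) - 1))  # divide by n - 1

# dividing by n underestimates on average (near 4 * (n-1)/n = 3.2);
# the n-1 correction recovers the true variance on average
print(round(statistics.fmean(biased), 2), round(statistics.fmean(corrected), 2))
```
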
<blockquote>
<p>uncertainty</p>
</blockquote>
<p>given some estimated value of a parameter, the confidence interval is the range of values (based on the sampling distribution) which is likely to contain the true value with probability <script type="math/tex">1 - \alpha</script>. the fixed parameter will be within our confidence interval a fraction <script type="math/tex">1 - \alpha</script> of the time. the uncertainty about the parameter stems from the fact that our estimate is based on a sample: to decrease uncertainty - get more samples! we don’t know where our parameter is within the confidence interval. a good way of getting a confidence interval (when you have no idea what the sampling distribution looks like) is to use the bootstrap. basic math-stat 101.</p>
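<p>a minimal percentile bootstrap, on hypothetical data (my own sketch of the idea mentioned above):</p>

```python
import random
import statistics

rng = random.Random(0)
data = [rng.gauss(50, 10) for _ in range(200)]  # the sample we observed

# resample with replacement, recompute the statistic each time, and read
# off empirical percentiles as an approximate 95% confidence interval
boot_means = sorted(
    statistics.fmean(rng.choices(data, k=len(data))) for _ in range(4000)
)
lo = boot_means[int(0.025 * len(boot_means))]
hi = boot_means[int(0.975 * len(boot_means))]
print(round(lo, 1), round(hi, 1))  # interval bracketing the sample mean
```
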
<p>we hypothesize that the underlying parameter takes some value and then check the chance (under the null distribution) of generating the data we observed. if the data is very unlikely under this assumption then we can reject our hypothesis. the p-value is simply the chance - under the null distribution - of observing data as extreme as what you saw in your experiment. a type I error is rejecting the null hypothesis incorrectly (<em>false positive</em>). a type II error is failing to reject the null hypothesis even though it is false (<em>false negative</em>): our test was not designed well enough to catch the actual effect. permutation tests are particularly useful here. baby’s math-stat class recap.</p>
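<p>a permutation test in a dozen lines, on synthetic groups with a genuine difference in means (my own example, not from the course):</p>

```python
import random
import statistics

rng = random.Random(0)
group_a = [rng.gauss(10.0, 2) for _ in range(40)]
group_b = [rng.gauss(12.0, 2) for _ in range(40)]
observed = abs(statistics.fmean(group_b) - statistics.fmean(group_a))

# under the null hypothesis the group labels are exchangeable: shuffle
# them and count how often a gap this extreme appears by chance
pooled = group_a + group_b
extreme = 0
n_perm = 5000
for _ in range(n_perm):
    rng.shuffle(pooled)
    diff = abs(statistics.fmean(pooled[40:]) - statistics.fmean(pooled[:40]))
    if diff >= observed:
        extreme += 1
p_value = extreme / n_perm
print(p_value)  # small here: the observed gap is unlikely under the null
```
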
<p>the power of a test is {1 - probability of a false negative}. this is the probability that you correctly reject the null hypothesis (the probability of getting a true positive). to measure the power: we need a null distribution, a presumed alternate distribution and a chosen significance level. for the power of a test, we need to check the overlap of the distributions up to the point where you reject the null. this is actually useful - choose the smallest possible effect and see if your experiment has the power to detect it. also, if you know your desired significance level and have a hypothesized and alternative value then you can determine the sample size n which will get you the statistical power that you desire from your test. diagram from the course:</p>
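<p>a sketch of the last point for a two-sided one-sample z test (the function name and numbers are my own): given α, the desired power and a hypothesized effect size, solve for n.</p>

```python
from math import ceil
from statistics import NormalDist

def needed_n(effect, sd, alpha=0.05, power=0.8):
    """smallest n for a two-sided one-sample z test to detect `effect` with the given power."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # rejection threshold quantile
    z_b = NormalDist().inv_cdf(power)          # quantile for the desired power
    return ceil(((z_a + z_b) * sd / effect) ** 2)

print(needed_n(effect=1.0, sd=2.0))  # a half-sd effect needs n = 32
```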
<p align="center">
<img src="https://paper-attachments.dropbox.com/s_9F7B51EB8FCF28C9B846B86CF7007D211E7879B2A6A49EB1BFE2169AE9C5E6B2_1589947339337_Screen+Shot+2020-05-20+at+00.02.11.png" alt="mit hypothesis basic" />
</p>
<p>rejection of the null tells us that there is something systematic which violated your null hypothesis assumption: it does not tell you what that systematic cause is. detecting confounding by thinking about forms of bias in your data is important - e.g. the statin study wherein the recipients of the drug were older on average.</p>
<p>the funnel diagram shows the value which a clinic needs to have in terms of the success rate to beat the given threshold of the test. fairly easy to arrive at this number: 32 plus/minus significance level * uncertainty / sqrt(sample size). with larger n you can detect smaller changes. this matters because an underpowered study could have absolutely terrible results (even though you ran the appropriate hypothesis test). you need to understand the power of your study before arriving at any conclusions.</p>
<p>t tests along with a bunch of variance calculations/assumptions can be used to assess the difference in means for two samples. permutations tests achieve roughly the same effect with a lot less hoo-ha. how does the power of these tests differ? i know that permutation tests are actually quite powerful.</p>
<p>multiple comparisons, bonferroni corrections, false discovery rates: <a href="https://xkcd.com/882/">relevant xkcd</a>. conducting 20 significance tests at a level of 0.05 means that, on average, one of them will come out significant purely by chance. casi is a fantastic resource for this topic.</p>
<p>you can run tests with dependent samples, but you must be aware of whether your data is independent (basic sanity check - lag plot). you will need to identify if you are dealing with dependent samples and you will need to adjust for it appropriately.</p>
<blockquote>
<p>nonparametric statistics</p>
</blockquote>
<p>again, most of this is a recap of things i am familiar with, so i’m using a light touch.</p>
<p>you want to use nonparametric tests when you have skewed and highly variable data, when you cannot make assumptions about the data generating process, and when your statistic has difficult to calculate sampling distributions.</p>
<p>the right parametric test is probably more powerful than a nonparametric test. also, a quick sanity check on your assumptions would be to run a nonparametric and a parametric test side by side and then see if the results are roughly congruent with each other.</p>
<p>a bunch of nonparametric tests: kolmogorov-smirnov, shapiro-wilk, anderson-darling etc can be used to check distributional assumptions. unfortunately, these assume independence as well. wilcoxon’s signed-rank test can be used on paired data in order to compare the medians of two groups. mann-whitney (takes you back to the baby inference classes) can test if one group has systematically larger or smaller values than the other group. there are well established sampling distributions for the statistics mentioned above.</p>
<p>using the data to generate a null distribution: so you are trying to examine the probability that something anomalous is happening in the schools. you do not have a sampling distribution for a complicated statistic, so you resample from the 50th to 75th percentiles of the two distributions to obtain a null distribution for the correlation between the two metrics. this is a fantastic use of resampling.</p>
<p>permutation testing in the case of test answers: fantastic idea.</p>
<p>permutation testing: generally reserved for comparing more complicated statistics across a bunch of different groups. if there is no systematic difference then the statistic should be roughly similar upon permuting the labels. if you observe a distribution over all (or many) possible permutations then you have a good approximation to the null distribution, and you can use this null distribution to arrive at a fairly accurate empirical p value. <em>stupid idea - try and perturb your group b to understand the power of your test (for example adding 0.01 to each value in group a in a comparison of means test).</em></p>
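<p>a minimal difference-in-means permutation test (data and the perm_test name are made up): shuffle the pooled labels many times and count how often the permuted difference beats the observed one.</p>

```python
import random

def perm_test(a, b, reps=5000, seed=0):
    """empirical p value for a difference in means under label exchangeability."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled, n = a + b, len(a)
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n]) / n - sum(pooled[n:]) / (len(pooled) - n))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (reps + 1)  # add-one smoothing so p is never exactly 0

a = [2.1, 2.5, 2.3, 2.8, 2.6, 2.4]
b = [3.1, 3.4, 3.0, 3.6, 3.2, 3.5]
p = perm_test(a, b)
print(p)
```

<p>with two groups as cleanly separated as these, the empirical p value comes out well under 0.05.</p>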
<p>permutation tests to check for trends in time series data: fit trends, permute the data. check the probability of the original trend. this uses permutation tests on non-exchangeable data-points.</p>
<p>bootstrap aka confidence intervals for complicated statistics: treat your original sample as the overall population and sample 10000 batches of data of size n with replacement from the population. the overall bias, variance and characteristics of this simulated distribution approximate those of the theoretical sampling distribution per the tibshirani monograph. you can get a rough confidence interval via the percentiles of your simulated data. you can use the bc_a bootstrap in order to correct for skew, etc. kind of analogous to cross validation.</p>
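<p>a percentile-bootstrap sketch for a statistic with an awkward sampling distribution - the median here (the data and the function name are my own):</p>

```python
import random
from statistics import median

def bootstrap_ci(data, stat=median, reps=10000, alpha=0.05, seed=0):
    """percentile bootstrap interval for an arbitrary statistic."""
    rng = random.Random(seed)
    sims = sorted(stat(rng.choices(data, k=len(data))) for _ in range(reps))
    lo = sims[int((alpha / 2) * reps)]
    hi = sims[int((1 - alpha / 2) * reps) - 1]
    return lo, hi

data = [3, 5, 7, 9, 11, 13, 15, 17, 19, 21]
lo, hi = bootstrap_ci(data)
print(lo, hi)  # brackets the sample median of 12
```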
<p>model selection via cross validation: basic model selection recap. use the validation set to choose the best values of your parameters. <strong>do not unwrap the test set until you are ready to 86 the model if things go wrong</strong>. the test set is sacred and should not be unwrapped until the last step. you can use a bootstrap analogue for your cross-validation: subsample the dataset at random K different times, then divide into training and testing data and collect all your metrics.</p>
<p>penalizing complexity - address the tradeoff between complexity and error by directly penalizing complexity. express the badness of the model as the sum of an error term and a penalty term for model complexity. this is a form of regularization. aic is regularization by adding a penalty of 2 * the number of parameters. bic adds a penalty of the number of parameters * log of the number of data points. this is an interesting contrast to simply penalizing the parameters. mdl is an information theoretic penalty which compresses the model and error. it attempts to strike a balance between a big model and high error.</p>
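<p>for a gaussian likelihood both criteria reduce (up to a shared constant) to n * log(rss/n) plus the penalty, so they are easy to compare directly. the fits below are hypothetical; note that bic's penalty exceeds aic's whenever log n exceeds 2, i.e. n above about 7.</p>

```python
from math import log

def aic_bic(n, rss, k):
    """gaussian-likelihood aic and bic, up to a shared additive constant."""
    base = n * log(rss / n)  # proportional to -2 * max log-likelihood, constants dropped
    return base + 2 * k, base + k * log(n)

# hypothetical fits: the bigger model buys a small drop in rss with 7 extra parameters
aic_small, bic_small = aic_bic(n=100, rss=50.0, k=3)
aic_big, bic_big = aic_bic(n=100, rss=48.0, k=10)
print(aic_small < aic_big, bic_small < bic_big)  # both prefer the small model here
```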
<p>bayesian occams razor: rests on the principle that models with fewer possible outcomes assign an individually higher probability to each plausible outcome. the simpler model that still explains the data is usually right.</p>
<blockquote>
<p>experimental design</p>
</blockquote>
<p>replication is key! your results need to be independently verified, and the basic logic is that as samples increase, your results become more consolidated. again, you could have mistakenly rejected the null hypothesis and independent replications will help catch this.</p>
<p>baselines are important: what would happen in the absence of the effect. the effect needs to cause a difference but it needs to be measured relative to some reference value. generally a placebo + control + effect structure is important in randomized studies. placebos are sometimes able to simulate the treatment by themselves. the hawthorne effect {if you tell the subjects you are studying their responses this might bias outcomes from the outset. also the perception of an effect can warrant a change in the outcome} and the school district example.</p>
<p>if there are external sources of variability you must identify them and adopt a blocked design by dividing your data into further groups based on the level of the confounding factor. then examine your hypothesis within these ‘controlled groups’. block the confounding factor and analyze the blocks separately. make sure to watch for multiple comparisons.</p>
<p>randomization: your data points should both be randomly selected and randomly assigned to groups. the lack of randomization can lead to bias in your analysis. also randomization needs to be done appropriately - dont speak to a bunch of people in a plaza ‘randomly’ and then say that you sampled at random. that is incorrect.</p>
<p>george box: <strong>block what you can, randomize what you cannot</strong>.</p>
<p>gathering data: drawing from the representative population. multiple different strategies.</p>
<p>srs: draw at random without replacement. not independent samples but when the population size is large we can treat this as independent. simple random samples often overlook significant subpopulations if they are small in the population - for example: rare diseases.</p>
<p>stratified random sample: divide the population into non-overlapping groups with small variation and then sample proportionally from them. neymans optimal allocation would take both size and the variation into account: for a subpopulation l with a W_l proportion and sigma_l variation then we sample from group l in proportion to <script type="math/tex">W_l\sigma_l</script> .</p>
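<p>a tiny sketch of neyman allocation (the strata and the function name are made up): per-stratum sample sizes proportional to <script type="math/tex">W_l\sigma_l</script>.</p>

```python
def neyman_allocation(n, strata):
    """strata: list of (W_l, sigma_l) pairs; returns per-stratum sample sizes ∝ W_l * sigma_l."""
    weights = [w * s for w, s in strata]
    total = sum(weights)
    return [round(n * wt / total) for wt in weights]

# hypothetical strata: equal population shares, but the second is four times as variable
print(neyman_allocation(100, [(0.5, 1.0), (0.5, 4.0)]))  # → [20, 80]
```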
<p>cluster sampling: divide the population into heterogeneous smaller groups which are well representative of the overall population. sample a few groups and then sample at random from within each group. so city → city blocks → sample from city block. this would be problematic if we do not account for socioeconomic indicators in zip codes in nyc for example.</p>
<p>obtaining an unbiased list of subjects to sample from is difficult. non-response bias arises because response rates can be systematically different for different groups. frame the question neutrally to not evoke a strong response + randomize the ordering of questions - framing can affect the response.</p>
<p>sample size doesn’t matter if your population isn’t selected well. the literary digest chose a systematically biased population to sample from (rich people) and then had 2.4 million responses, all worth nothing because the sample was non-representative of the underlying population.</p>
<p>in an srs our samples are not independent. this leads to a finite population correction on our variance of the mean which is approximately equal to 1 when the sample size is small in comparison to the large overall population size. for a sufficiently large sample, our srs case has a sample average which is approximately normal. this is not due to the clt - which requires iid samples - we employ a separate theorem which tolerates dependence between random variables.</p>
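<p>the correction factor itself is (N - n)/(N - 1), which is what makes the srs mean behave like the iid case when n is small relative to N (hypothetical numbers):</p>

```python
def fpc(N, n):
    """finite population correction on var(sample mean) under srs without replacement."""
    return (N - n) / (N - 1)

print(fpc(N=1_000_000, n=100))  # ≈ 1: the srs mean behaves like the iid case
print(fpc(N=200, n=100))        # ≈ 0.5: sampling half the population halves the variance
```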
<p>its best to have paired data while working with treatments. this allows for greater power of the statistical test. repeated measures design allows us to serve multiple treatments to the same person in identical settings. so back-to-back different treatments - therefore each person effectively serves as their own control. it is important to randomize the order of the treatments and consider the temporal effects. you should consider caffeine withdrawal for example. finals weeks etc. therefore you need identical settings. also temporal effects can be modeled with autocorrelation models.</p>
<p>so we will need to account for multiple factors which we will need to block on: left/right handed MEN and WOMEN. so four groups to block on. if we dont have enough data to replicate the experiment within each sub block we will need to assign different sub blocks to different treatments.</p>
<p>the simplest example of this design can be seen in a school/semester setting.
we use a latin squares setting in order to assign the treatment, control and placebo to different locations at different times in order to control for the effect of the semester: all experimental conditions are tested.</p>
<p>fortunately i dont have to deal with much experimental design because it seems hard - <strong>but incredibly exciting</strong>.</p>
<blockquote>
<p>linear regression</p>
</blockquote>
<p>very light touch on this section. i’d like to believe i know this well already.</p>
<p>anscombe’s quartet again: four regression plots with equivalent point estimates - the <script type="math/tex">R^2</script> for each model is the same. some models are blatantly incorrect.</p>
<p>we assume a model on our noise and we assume that our responses can be expressed as a linear function of the <script type="math/tex">x_i</script> . our estimates for the slope and intercept are <script type="math/tex">\beta_1 = corr(x,y)sd(y)/sd(x), \beta_0 = \bar{y} - \bar{x}\beta_1</script> respectively. our slope is higher as the correlation is greater and as the spread of the y increases. the slope is greater as the spread of our x variable decreases - as there is a lot more variability introduced into the picture. our intercept is there to align the line as it passes through the y axis.</p>
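<p>the correlation form of the slope agrees with the usual cov(x, y)/var(x) least-squares slope; a quick check on made-up data (stdlib only):</p>

```python
from statistics import mean, stdev

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.3, 5.9, 8.2, 9.9]
mx, my = mean(x), mean(y)

sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # sum of cross-deviations
sxx = sum((xi - mx) ** 2 for xi in x)

r = sxy / ((len(x) - 1) * stdev(x) * stdev(y))  # sample correlation
b1 = r * stdev(y) / stdev(x)                    # slope, per the formula in the notes
b0 = my - b1 * mx                               # intercept: the line passes through (mean x, mean y)

print(b1, sxy / sxx)  # the two slope formulas agree
```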
<p>if there isn’t a high correlation you will be sool because your model is badly specified. the correlation looks at a z-scored data point and the covariance between them. because this was brought up a few days ago: correlation cannot be greater than 1 (in absolute value) because cos(x) is bounded by 1, or this can be seen through the cauchy-schwarz inequality. for this you will need to imagine your vectors as n dimensional standardized vectors.</p>
<p>a strong correlation could be indicative of many things:</p>
<ol>
<li>causality - but you dont know in which direction</li>
<li>hidden cause - some z causes both x and y. knowing x does give you information about y but not a causal relationship.</li>
<li>confounding - some other variable (z) influences y and possibly even more strongly than x. we dont know z therefore our conclusions are misspecified in the absence of the confounding factor.</li>
<li>coincidence - most frustratingly. sunspots and congress.</li>
</ol>
<p>wald’s tests for the coefficients. this is very formulaic. standard error for the model predictions via basic sampling intervals. an interesting plot was the test statistic for the correlation coefficient differentiated by the number of data points used to construct the sampling distribution. the sample correlation needs to be truly large in order to assume large values of the t statistic. the blue curve is for 10 data points versus the green curve for 100 data points.</p>
<p align="center">
<img src="https://paper-attachments.dropbox.com/s_9F7B51EB8FCF28C9B846B86CF7007D211E7879B2A6A49EB1BFE2169AE9C5E6B2_1590281773655_Screen+Shot+2020-05-23+at+20.56.09.png" alt="mit t stat" />
</p>
<p>extrapolation is not a very fantastic idea for linear models or models over space and time. extrapolation will often lead to false and misleading predictions.</p>
<p>dont fall into the trap of multiple comparisons. realize that at a significance level of alpha and around n experiments, a total of <script type="math/tex">n\alpha</script> experiments will show statistically significant outcomes just by chance.</p>
<p>not familiar with this experimental pipeline: try checking if your model explains a significant portion of the variability via an anova test - if your anova is promising then you go ahead and start testing your individual variables.</p>
<p>check for effect size via the standardized coefficients in order to compare apples to apples. this will allow you to circumvent issues of disparate scale.</p>
<p>the model can be viewed through the lens of the r squared statistic: intuitively it is the proportion of the variance in the data which can be explained by the model. you can decompose the variance in your response as follows: variance in the response variable = mean of the squared residuals + mean of the squared differences between the predicted values and the mean of the response.</p>
<p>the f statistic is very similar: it is a measure of the variability in the data due to the model vs due to random error. it is calculated as the ratio of the sum of squares of the model and the sum of squared errors, and we divide through by degrees of freedom in order to obtain the correctly scaled chi squared distributions. yadda yadda this is mechanistic calculation nonsense.</p>
<blockquote>
<p>more regression</p>
</blockquote>
<p>very light touch on this section. i’d like to believe i know this well already.</p>
<p>residual analysis is very important. if the model is valid the residuals should look roughly like the parametric assumptions you made on your noise. remember that the noise is not equal to the residuals but it is related to the residuals through the standard <script type="math/tex">(I - H)\epsilon</script> formula. the residuals are slightly correlated with one another - due to the missing degrees of freedom.</p>
<p>standardized residuals are uncorrelated and have the same distribution as assumed on the noise. dividing by the estimated variance of the noise (a chi squared denominator) yields studentized residuals. analyze studentized or standardized residuals as opposed to the raw residuals. this is an interesting default? should you standardize residuals from the get go?</p>
<p>outlier - something far away from the rest of the data. investigate this. leverage - the influence of the point described by how far away it is in the x direction. closer to the conditional mean results in a lower leverage due to a weaker effect on the conditional mean. influential point - a high leverage point which significantly affects the slope.</p>
<p>you need to determine which one of the two factors is at play: are you systematically missing some data in a particular region or are you just dealing with some irrelevant outliers which you can address and remove from your analysis.</p>
<p>leverage is formally defined as the corresponding diagonal element of the hat matrix. this is the rate of change of the ith prediction with respect to the ith model response. the further the deviation - the larger the element and the more it influences the overall estimator coefficients.</p>
<p>cook’s distance - how much your predictions would deviate once you remove a particular point from the data set; it is a metric which captures exactly this. it can also be computed in terms of the leverage of the data points and the residual using some fancy algebra.</p>
<p>robust regression aka special loss functions: median estimator - aka laplacian noise aka lad - changing the value of a point just a little could have a very large impact on the model because of the behavior of the loss function near 0. huber loss - lad but differentiable. essentially turns it into an easier optimization problem by relaxing the loss function. bisquare loss - is like mse but removes the loss on outliers in your analysis. this does not make much sense - because you could simply remove the outliers and use mse in that case.</p>
<p>heavier tailed distributions assume that unlikely events have a reasonable probability, which means that outliers should be a little more likely under these models. loss functions are tailored to accommodate these deviations in a less harsh manner.</p>
<p>ransac model: data is mostly inliers - pick a subset of points and compute a model based on it. any other points which reliably fit this model are added to the inliers. compute the error. repeat for a bunch of different subsets and then pick the model with the smallest error. this kind of just feels like you are cherry picking your training set to find the subset which best suits your model class.</p>
<p>ridge regression by regularizing the parameters: lambda = 0 leads to the old solution. ridge doesn’t favor sparsity - ridge favors smaller overall effect sizes. it will choose (1, 1, 1) over (0,0,3). several similar effects. ridge is an map framework of thinking about regression via a prior gaussian on the parameters with 0 mean and <script type="math/tex">1/2\lambda^2</script> variance.</p>
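<p>the (1, 1, 1) vs (0, 0, 3) point in numbers - assuming both coefficient vectors fit equally well, only the penalty decides between them:</p>

```python
def l2_penalty(beta):
    return sum(x * x for x in beta)  # ridge penalty (squared l2 norm)

def l1_penalty(beta):
    return sum(abs(x) for x in beta)  # lasso penalty (l1 norm)

spread, sparse = (1, 1, 1), (0, 0, 3)
print(l2_penalty(spread), l2_penalty(sparse))  # 3 vs 9: ridge prefers the spread-out fit
print(l1_penalty(spread), l1_penalty(sparse))  # 3 vs 3: lasso is indifferent between them
```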
<p>sparse solutions can be achieved through lasso. we want something like a hamming loss for if a coefficient is activated but that is an optimization problem over a discrete space. instead we choose to have a penalty for any nonzero parameter assuming that cumulatively the effects should not be simultaneously large and many in number. this is the same as a map framework using a laplacian prior over our parameter space. sampling distributions, confidence intervals and hypothesis tests for lasso are obtained via nonparametric statistics. this is a convex problem and we can find a tractable solution.</p>
<p>generalized linear models: an inverse nonlinear link function which captures special prediction properties. we assume that the model is some nonlinear transformation of a linear model. glms allow us to exercise more freedom in the relationship by allowing us to fully specify the distribution of the response. generally cannot be solved in closed form.</p>
<blockquote>
<p>categorical data</p>
</blockquote>
<p>this section is comparatively new: stuff i haven’t dealt with extensively.</p>
<p>categorical data without a natural ordering: can be expressed in the form of a contingency table. numeric counts for each category across the n x m cells.</p>
<p align="center">
<img src="https://paper-attachments.dropbox.com/s_51FEC7D264A0A51DEF0160F988381A939052E428241F854E95A169227A10C6E5_1589942762983_Screen+Shot+2020-05-19+at+22.45.58.png" alt="mit treat vs outcome" />
</p>
<p>risk of outcome i for treatment j: empirical conditional probability of outcome i given treatment j. <em>relative risk</em> of outcome i for treatments j, k is the {risk of outcome i, treatment j}/{risk of outcome i, treatment k}. the odds ratio for two outcomes/treatments is the simple ratio {outcome i, ti/outcome j, ti}/{outcome i, tj/outcome j, tj}. this compares the odds of the two outcomes across two treatments. we are interested in the <strong>size of an effect as well as its significance</strong>, using a <strong>confidence interval around the odds ratio</strong> can help us capture these two things simultaneously.</p>
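<p>a worked 2x2 example (all counts are made up) computing the relative risk, the odds ratio and a rough 95% wald interval on the log odds scale:</p>

```python
from math import exp, log, sqrt

# hypothetical counts - rows are treatments, columns are outcomes
a, b = 30, 70   # treatment 1: outcome yes / outcome no
c, d = 15, 85   # treatment 2: outcome yes / outcome no

rr = (a / (a + b)) / (c / (c + d))        # relative risk of the outcome, treatment 1 vs 2
or_ = (a / b) / (c / d)                   # odds ratio
se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(odds ratio)
ci = (exp(log(or_) - 1.96 * se), exp(log(or_) + 1.96 * se))
print(rr, or_, ci)
```

<p>an interval that excludes 1 (as this one does) captures significance and effect size at the same time.</p>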
<p>simpsons paradox: confounding factors are very important for categorical data. when you aggregate trends they might seem the reverse of what they actually are due to a missing confounding variable. titanic example of how class survival rates are confounded with gender. since women and children were evacuated first - and more of them were in first and higher classes - the first/higher classes had a higher survival rate than the other classes. classic berkeley grad school admissions example: gender was confounded with the selectiveness of the department. hospital example in the sheet.</p>
<p>chi squared significance test: independent data points, large enough values in each entry. get the data, do some eda, check for confounding factors. then, the test can see if your treatment influences your outcome OR if they are independent of one another. the test statistic is <script type="math/tex">\sum (obs-exp)^2/exp \sim \chi^2</script>. you can see how this is sort of supposed to be chi squared distributed: each data point is independently collected, and each term in the summation resembles the square of a standardized gaussian random variable. we sum up across r rows and c columns, however, since we have an almost fully determined system (except the last row and column), this is a chi square distribution with (r-1)(c-1) degrees of freedom. you calculate your p value for the obtained statistic.</p>
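<p>the statistic itself is mechanical to compute by hand (hypothetical table below; in practice something like scipy's chi2_contingency does this plus the p value):</p>

```python
def chi2_stat(table):
    """pearson chi squared statistic and degrees of freedom for an r x c table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    N = sum(rows)
    # expected count for cell (i, j) under independence is row_i * col_j / N
    stat = sum(
        (table[i][j] - rows[i] * cols[j] / N) ** 2 / (rows[i] * cols[j] / N)
        for i in range(len(rows))
        for j in range(len(cols))
    )
    dof = (len(rows) - 1) * (len(cols) - 1)
    return stat, dof

stat, dof = chi2_stat([[30, 70], [15, 85]])
print(stat, dof)
```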
<p>expected counts: you get a table with x_i/N of the total per row and y_j/N of the total per column. you simply formulate a total for each cell as <script type="math/tex">x_iy_j/N</script>. this is the table in an independent world - your conditional distributions look exactly the same.</p>
<p>fishers test: for small enough tables we can calculate the exact p values under the null hypothesis through a simple permutation argument. we can also use a permutation test in order to get some approximation to the exact test with a sufficiently large number of repetitions. sample a bunch of values for the table using random permutations and then look at the empirical p value. the yates correction (subtracting 0.5 from the absolute obs-exp differences) makes the gaussian approximation more accurate when counts are small. you can even change the null hypothesis from a test of independence to something else.</p>
<p>anova: analysis of categorical data {through factors which have different levels (which may have different outcomes)} with continuous outcomes. anova can be interpreted as a comparison of means (cef function) or a linear regression on categorical data. we want to check if a factor on some particular level has some systematic difference in the mean outcomes compared to other factors. our anova learns the following model: <script type="math/tex">y_i = \tau_{GLOBAL} + \tau_{x_i} + \epsilon_i = \mu_{x_i} + \epsilon_i</script> . we have some global mean and then some offset for each group - which equals the group specific mean. and then the particular value for that specific data point is some further gaussian noise offset from this group mean. this is analogous to the case of linear regression. we can also interpret this as a one hot encoded design matrix in linear regression with a y_i output (and no intercept in the handout). if our model is a good fit, then the x_i explain the variance to a certain extent and our groups have unequal means. our null hypothesis through either interpretation is that the group specific means are equal. we put these through the F test <script type="math/tex">F = SS_{model}/SS_{error}</script> and get a p value from the resulting ratio of chi squares.</p>
<p>anova only tells us that there is some systematic difference: no indication of where the variation stems from. post hoc analysis via appropriate hypothesis tests will be required. will definitely need to watch out for false discovery. anova makes two main assumptions: identical variance in groups (homoskedasticity across levels), and normally distributed, independent data points.</p>
<p>alternative anovas:</p>
<p>two way anova: our model is <script type="math/tex">y_i = \mu + \tau_{x_i} + \tau_{z_i} + \gamma_{x_i, z_i} + \epsilon_i</script> . we need to make sure that our categorical counts are large. our design matrix will pretty much be the same but with an extension for the extra dummy variables. you will not be able to correctly attribute and disentangle crossover effects if you do not have sufficiently large data sets.</p>
<p>ancova: adding a continuous variable to our anova in order to control for it appropriately. this is the treatment - controlled for a continuous variable - and its effect on the outcome.</p>
<p>manova: multiple outputs - which may not be independent - anova. can be extended to ancova as well.</p>
<p>kruskal-wallis: nonparametric anova for a difference in the medians of several groups. assumes a roughly similar shape for the distributions. example: different directions of skew would mess this test up.</p>
<blockquote>
<p>other resources linked on the website</p>
</blockquote>
<p>the engineering statistics guide is a fantastic standard reference to get unstuck. 8 chapters covering most things that you will need to get started in trying to find the right direction for your experiments. understand the assumptions being made, understand what you are trying to search for and understand what you are currently doing. again - the most important thing is checking your assumptions. else you have a system with garbage in, garbage out.</p>
<p>i went through some papers but they weren’t very useful without the accompanying discussions. the ‘racial preferences in dating’ paper seemed fairly straightforward to understand. abandoned this section because i have literally no way to know if i am making progress.</p>
Sat, 23 May 2020 00:00:00 +0000
https://importdikshit.github.io/2020/05/tldr-mit-6s085.html
what is positively-semi-definite<p>positively-semi-definite was created in an attempt to improve my technical exposition skills. Additionally, it will provide some impetus for me to preserve my notes in a more permanent form (as opposed to scribbling them down on a legal notepad which I will inevitably discard). I will use this medium to share some interesting things that I am learning about and some reflections on concepts which (I believe) I already understand.</p>
<p>I chose the name positively-semi-definite for two reasons:</p>
<p><strong>A reduction to familiarity</strong>: A positive semi-definite matrix (over <script type="math/tex">\mathbf{R}</script>) has eigenvalues <script type="math/tex">\lambda_i \geq 0</script>. While studying an algorithm/system, proving that some key matrix is positive semi-definite generally gives an immediate piercing insight into the way things will ‘probably’ behave. This is also the fundamental goal of every technical note: cutting through the noise to reduce complicated thoughts into a set of familiar intuitions.</p>
<p><strong>An expression of uncertainty</strong>: To be semi-definite about something is to be uncertain. This works on two levels:</p>
<blockquote>
<p>Statistics is the science of uncertainty. Most of my notes will probably be about mathematical statistics. This is what I understand best, and is what I will be learning about for the foreseeable future.</p>
</blockquote>
<blockquote>
<p><a href="http://www.stat.columbia.edu/~gelman/stuff_for_blog/ohagan.pdf">Epistemic Uncertainty</a>. These are my ‘known unknowns’ and my ‘unknown unknowns’. Although I will only write about topics that I am reasonably informed about, there is always scope to go one level deeper. I will never be able to speak from a place of absolute authority and there will always be some uncertainty inherent to publishing my opinions publicly: I will always be positively semi-definite about what I write. I hope that subjecting my technical opinions to a very public ‘peer review’ process will educate me about the perspectives which I missed.</p>
</blockquote>
Tue, 12 May 2020 00:00:00 +0000
https://importdikshit.github.io/2020/05/about-psd.html
reevaluating my outlook on sleep<p>I <strong>used</strong> to sleep notoriously little. I was known as a low mean, high variance kind of sleeper.</p>
<p>This semester I took a class by Dr. Matthew Walker - The Psychology of Sleep - it gave me the impetus to make a very substantial change to my sleep habits. I am now a high mean, low variance sleeper. This reorientation in outlook towards sleep has left me feeling healthier, happier and much more productive.</p>
<p>I originally opted for Psych 133 because I wanted to be able to game the system with the things that I would learn. I thought that I would be able to leverage my newfound knowledge to become a biphasic sleeper who would utilize naps to get by with the bare minimum amount of sleep humanly possible. By the end of the second midterm I was trying to plan my day around my 7.5 hour minimum. This was a function of the research that we were shown and my reflections with regard to that research.</p>
<p>Over the course of my <strong>first complete</strong> blog post ( 🎉 ), I am going to share the bits and pieces of the course that impacted me the most. Correspondingly, I am going to share the definitive changes that I made to my lifestyle (I really did).</p>
<p><img src="https://d2mxuefqeaa7sj.cloudfront.net/s_DE685AEAD3F9C3EF9275041444228D05983C0A21050C41FBED8E1112C2839552_1543428718472_Screen+Shot+2018-11-28+at+10.06.35+AM.png" alt="" />
<img src="https://d2mxuefqeaa7sj.cloudfront.net/s_DE685AEAD3F9C3EF9275041444228D05983C0A21050C41FBED8E1112C2839552_1543428718483_Screen+Shot+2018-11-28+at+10.05.38+AM.png" alt="" /></p>
<p>A cursory search of my text messages (spelling errors and subtle roasts included) will reveal my progress, in addition to the countless remarks that I have received in person.</p>
<p><strong>Note</strong>: I am going to abstract away many terms/concepts in order to maintain a low level of complexity within this post. As a result, this will only serve as a 10,000 foot overview of certain parts of the course (not a comprehensive review).</p>
<h2 id="health-and-immune-functioning">Health and Immune Functioning</h2>
<p>Did you ever notice how we tend to sleep a lot more than usual when we are ill? There is a fairly strong association between sleep habits and health/immune response.</p>
<p>When researchers immunized two groups of individuals after 4 days of observation, they noticed a discrepancy between the groups’ immune responses. The control group was allowed to sleep normally in the 4 days prior to the immunization, while the deprivation group was restricted to 4 hours of sleep per night. The antibody response of the deprivation group was less than half as strong as that of the control group, and the difference persisted even after a very substantial amount of recovery sleep. In other words, how much you sleep can directly impact how susceptible you are to falling sick and how well you rebound from sickness.</p>
<p><img src="https://d2mxuefqeaa7sj.cloudfront.net/s_DE685AEAD3F9C3EF9275041444228D05983C0A21050C41FBED8E1112C2839552_1543433837314_Screen+Shot+2018-11-28+at+11.37.02+AM.png" alt="Source: Psych 133, Lecture 11, UC Berkeley Fall 2018" /></p>
<p>To further examine this phenomenon, we were told about an experiment in which researchers tracked how much subjects slept, on average, over the course of 7 days. The subjects were then administered nasal drops containing rhinovirus and monitored for symptoms of a cold over a period of 5 days. Independent of antibody levels, demographics and other such factors, subjects who slept less were more susceptible to illness.</p>
<p><img src="https://d2mxuefqeaa7sj.cloudfront.net/s_DE685AEAD3F9C3EF9275041444228D05983C0A21050C41FBED8E1112C2839552_1543434819953_Screen+Shot+2018-11-28+at+11.49.44+AM.png" alt="Source: Psych 133, Discussion 3, UC Berkeley Fall 2018" /></p>
<p>Over the course of the class, we were also shown associations that suggest a strong link between sleep deprivation and cancer. Apparently, restricting sleep to 4 hours for a single night corresponds to a roughly 70% reduction in natural killer cell activity compared to normal, and such a decline is associated with a higher risk of developing cancer. Furthermore, the World Health Organization actually classifies shift work involving chronic circadian disruption as “probably carcinogenic”.</p>
<p>We were also shown studies highlighting links between sleep deprivation and Type 2 diabetes (impairments in glucose metabolism) and obesity (alterations in the hormones that regulate hunger).</p>
<p>This was the first portion of the course that drew my attention towards how little I sleep. It was the first red flag that I needed to start sleeping more than my usual 4.5 to 6 hour stints. Unfortunately, “once-removed” and highly deferred effects are not great at catalyzing overwhelming change. The material certainly made me more cognizant of my decisions to stay up late hanging out with friends or wrapping up work; however, it seldom made me reconsider them.</p>
<h2 id="memory-and-performance">Memory and Performance</h2>
<p>There is a very conspicuous difference in my clarity of thought and my ability to navigate steep learning curves depending on the amount of sleep that I am operating on. Still, I was not aware of the scale of the effects, nor of the long term ramifications of depriving oneself of the recommended amount of sleep ( <script type="math/tex">\geq</script> 8 hours).</p>
<p>A very interesting phenomenon illustrated in class was the discrepancy between subjective and objective measures of attention and alertness. It turns out that humans are not great at subjectively assessing their own sleepiness. Take a look at the figure below:</p>
<p><img src="https://d2mxuefqeaa7sj.cloudfront.net/s_DE685AEAD3F9C3EF9275041444228D05983C0A21050C41FBED8E1112C2839552_1543609302666_Screen+Shot+2018-11-29+at+11.35.45+PM.png" alt="Source: Psych 133, Lecture 13, UC Berkeley Fall 2018" /></p>
<p>If you look at SD 7 (the 7th night of sleep deprivation) you will notice that the groups constrained to 7 hours per night and 9 hours per night for 7 days reported the same level of sleepiness in a self-assessment. Yet when the same groups were objectively assessed via a Psychomotor Vigilance Task (PVT), there were clear discrepancies in their levels of performance. Notice how the number of attention lapses measured by the test tends to diverge for the two groups over time? It was really interesting to me how the 7 hour group tends towards the 5 hour group over the course of the seven days.</p>
<p><img src="https://d2mxuefqeaa7sj.cloudfront.net/s_DE685AEAD3F9C3EF9275041444228D05983C0A21050C41FBED8E1112C2839552_1543609312123_Screen+Shot+2018-11-29+at+11.36.39+PM.png" alt="Source: Psych 133, Lecture 13, UC Berkeley Fall 2018" /></p>
<p>This can also be seen in the figure below, which illustrates the deterioration in attention over time due to lack of sleep. The manner in which the negative effects of partial sleep deprivation compounded over time was very alarming to me. Notice how the performance of the group which slept 6 hours per night was much closer to that of the group which slept 4 hours per night than to that of the group which slept 8 hours per night.</p>
<p><img src="https://d2mxuefqeaa7sj.cloudfront.net/s_DE685AEAD3F9C3EF9275041444228D05983C0A21050C41FBED8E1112C2839552_1543609292820_Screen+Shot+2018-11-29+at+11.35.02+PM.png" alt="Source: Psych 133, Lecture 13, UC Berkeley Fall 2018" /></p>
<p>Beyond the above, there was a plethora of evidence presented to us about how different sorts of memory are enhanced by different stages of sleep. Compelling studies showed that skill based memories (motor and visual skills) are enhanced and consolidated by sleep, and that the early stages of sleep build immunity against forgetting, “stabilizing” fact based memories. Additionally, we were shown studies establishing how sleep helps with the integration of knowledge: extracting commonalities, building associations, and forming abstractions which provide intuition about problems.</p>
<p>To me this was a more immediate and impactful set of outcomes. I put it to the test by enforcing a 7.5 hour minimum and saw results almost instantly: I was able to focus better, I felt that my understanding of subjects improved, and I saw a slight uptick in my willingness to jump into complex problems.</p>
<h2 id="the-corrective-measures">The corrective measures</h2>
<p>A big part of maintaining this 7.5 hour minimum was integrating complementary lifestyle changes into my routine. I built a framework which would coerce me into sleeping 7.5 hours or more. The following are three habits that helped me adhere to my new sleep schedule:</p>
<ul>
<li>
<p><strong>Exercising closer to bedtime</strong>: I started working out from 9 to 10 PM this semester. This makes me feel very tired by around 1 AM and I knock out almost instantly: when I climb into my bed (sans laptop) I am consistently asleep within 3 or 4 minutes. My quality of sleep has also increased as a result. Exercising also makes it unsustainable to sleep less than 6 hours, since my body starts to shut down from exhaustion.</p>
</li>
<li>
<p><strong>Napping</strong>! Sometimes sacrificing an hour or two of sleep cannot be avoided. In such situations I find it easier to avoid falling into a tailspin, and I try to maintain my routine by napping in the afternoon. Over the course of the class we learnt that this sort of behavior is not entirely unprecedented, and that afternoon naps have many upsides for memory and learning.</p>
</li>
<li>
<p><strong>Stopped drinking caffeine</strong>: This is another important thing that we learnt in the class - caffeine gives us a very artificial sense of energy by masking the sleep-pressure signal of adenosine, and it also decreases deep sleep. I realized that it is better in the long term to forgo the coffee and plan my day more effectively. This semester I have cut out my coffee consumption in its entirety.</p>
</li>
</ul>
<p><img src="http://media.giphy.com/media/13dp24aR1KLitG/giphy.gif" alt="Cheers to not consuming any more coffee" /></p>
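<p>The habits above do the heavy lifting, but the scheduling side of preserving a 7.5 hour minimum is simple arithmetic: work backwards from when I need to wake up. Here is a toy sketch of that calculation - the function and its 15 minute fall-asleep buffer are my own invention, not anything prescribed in the course:</p>

```python
from datetime import datetime, timedelta

def latest_bedtime(wake_time: str, minimum_hours: float = 7.5, buffer_min: int = 15) -> str:
    """Latest time to be in bed, given a wake-up time ('HH:MM', 24-hour clock),
    a sleep minimum, and a small buffer for actually falling asleep."""
    wake = datetime.strptime(wake_time, "%H:%M")
    bed = wake - timedelta(hours=minimum_hours, minutes=buffer_min)
    return bed.strftime("%H:%M")  # datetime handles the wrap past midnight

print(latest_bedtime("09:00"))  # prints 01:15
```

<p>For a 9 AM wake-up this yields a 1:15 AM bedtime - consistent with my knocking out by around 1 AM.</p>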
<p>Instituting this structure on my sleep schedule has given me the impetus to initiate changes in other areas of my life which I had been neglecting: eating healthier and exercising more (as mentioned above). I have also started to develop a general preference for structure and organization over ad hoc plans (this probably stems from my efforts to preserve my 7.5 hours of sleep every night). Overall it has been one of the more impactful lifestyle changes that I have adopted, and for that reason I wanted to share the context surrounding my decision with others.</p>
<h2 id="what-next">What next?</h2>
<p>This post was a part of my final project for the class; one aspect of evaluation was outreach - sharing lessons from the class with others. The responses to drafts of this post have been fairly consistent: where can I learn more about sleep? To that end I would recommend the book Why We Sleep by Dr. Matthew Walker. It is a great book that runs almost in parallel with the ideas we discussed in Psychology 133 this semester, and a great resource to raise your level of awareness about one of the most ubiquitous and fundamental biological needs of every human: <strong>sleep</strong>.</p>
Sat, 24 Nov 2018 00:00:00 +0000
https://importdikshit.github.io/2018/11/sleep.html