{"id":441,"date":"2020-10-21T11:07:07","date_gmt":"2020-10-21T09:07:07","guid":{"rendered":"http:\/\/benediktehinger.de\/blog\/science\/?p=441"},"modified":"2020-10-21T11:16:05","modified_gmt":"2020-10-21T09:16:05","slug":"why-robust-statistics","status":"publish","type":"post","link":"https:\/\/benediktehinger.de\/blog\/science\/why-robust-statistics\/","title":{"rendered":"Why Robust Statistics?"},"content":{"rendered":"\n<p>For my new EEG course in Stuttgart I spent some time making this GIF, because I couldn&#8217;t find a version online. It shows a simple fact: the mean has a breakdown point of 0%. That is, every datapoint counts, whether it is an outlier or not, so a single sufficiently extreme value can move the mean arbitrarily far.<br>Trimmed and winsorized means instead rely on the inner X% of the data. A 20% trimmed mean, for example, removes the top and bottom 10% of datapoints and averages the inner 80%. A 20% winsorized mean does not remove those extreme values but replaces them with the nearest remaining values (the new limits). Both therefore have a breakdown point above 0%, equal to the proportion handled in each tail (10% in this example), which makes them robust to outliers.<\/p>\n\n\n\n<p>Fun fact: a 100% trimmed mean is just the median!<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img decoding=\"async\" src=\"https:\/\/benediktehinger.de\/blog\/upload\/outlier_animation.gif\" alt=\"\"\/><\/figure><\/div>\n\n\n\n<p>As you can see, moving the outliers further away from the rest of the data has a clear influence on <span class=\"has-inline-color has-luminous-vivid-orange-color\">the mean<\/span> but not on <span class=\"has-inline-color has-vivid-green-cyan-color\">the 20% trimmed mean<\/span>.<\/p>\n\n\n\n<p>One important point: while outlier removal and robust statistics are sometimes very important, and arguably a better default than the mean, you should always try to understand where the outliers you remove actually come from.<\/p>\n\n\n\n<p>The source code to generate the animation is here:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" 
data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">using Plots\nusing Random\nusing Statistics # provides mean\nusing StatsBase  # provides trim and winsor\n\n# sweep the outlier offset from 3 up to 20 and back down again\nanim = @animate for i \u2208 [range(3,20,step=0.5)...,range(20,3,step=-0.5)...]\n    Random.seed!(1) # same random draw in every frame\n\n    x = randn(50)             # 50 regular datapoints\n    append!(x,randn(5) .+ i)  # 5 outliers, shifted by the offset i\n\n    histogram(x,bins=range(-3,20,step=0.25),ylims=(0,9.),legend=false)\n\n    vline!([mean(x)],linewidth=4.)                # plain mean\n    vline!([mean(trim(x,prop=0.2))],linewidth=4.) # trimmed mean; StatsBase applies prop to each tail\n    #vline!([mean(winsor(x,prop=0.2))]) # winsorized mean, same as the trimmed mean in this example\n\nend\ngif(anim, \"outlier_animation.gif\", fps = 4)<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>For my new EEG course in Stuttgart I spent some time making this GIF, because I couldn&#8217;t find a version online. It shows a simple fact: the mean has a breakdown point of 0%. That is, every datapoint counts, whether it is an outlier or not. Trimmed or winsorized means instead calculate the mean based on the inner X % (e.g. 
inner 80% for a 20% trimmed mean, removing the top and bottom 10% of datapoints), or, in the case of winsorizing, with the 20% most extreme values not removed but changed to the new remaining&#8230;<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5],"tags":[],"class_list":["post-441","post","type-post","status-publish","format-standard","hentry","category-blog"],"_links":{"self":[{"href":"https:\/\/benediktehinger.de\/blog\/science\/wp-json\/wp\/v2\/posts\/441","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/benediktehinger.de\/blog\/science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/benediktehinger.de\/blog\/science\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/benediktehinger.de\/blog\/science\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/benediktehinger.de\/blog\/science\/wp-json\/wp\/v2\/comments?post=441"}],"version-history":[{"count":0,"href":"https:\/\/benediktehinger.de\/blog\/science\/wp-json\/wp\/v2\/posts\/441\/revisions"}],"wp:attachment":[{"href":"https:\/\/benediktehinger.de\/blog\/science\/wp-json\/wp\/v2\/media?parent=441"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/benediktehinger.de\/blog\/science\/wp-json\/wp\/v2\/categories?post=441"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/benediktehinger.de\/blog\/science\/wp-json\/wp\/v2\/tags?post=441"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}