In supervised models, a dependent variable y is predicted from a number of features x_1, x_2, ..., x_n. Usually n is fixed and small. An example for n=2 would be a house price y, where x_1 is the size of the property and x_2 the number of rooms.<br />
<br />
However, what are the classical modelling approaches when n is not fixed?<br />
<br />
An example would be the probability y that a customer subscribes to a new service, where x_1, ..., x_n is the list of all previous purchases and the length n of the list can vary greatly between individuals.<br />
This is also a common situation in natural language processing (NLP), where a text consists of a varying number n of words.<br />
https://mathsgee.com/qna/21644/what-are-the-rules-of-d-separation-in-a-causal-dag
https://mathsgee.com/qna/21613/calculate-the-probability-that-neither-accident-is-severe-and-at-most-one-is-moderate
https://mathsgee.com/qna/21611/determine-the-probability-that-a-visit-to-a-pcps-office-results-in-both-lab-work-and-referral-to-a-specialist
https://mathsgee.com/qna/21605/given-that-a-policyholder-has-died-what-is-the-probability-that-the-policyholder-was-a-smoker
https://mathsgee.com/qna/21571/if-the-probability-of-death-from-a-risky-activity-is-1-if-you-carry-out-that-activity-200-times-what-is-the-probability-of-death?show=21572#a21572
<p>It is tempting to treat this as a Bernoulli experiment repeated 200 times, resulting in a binomial distribution, as given below:</p>
<p>Answer:</p>
<p>$$1 - (0.99)^{200} \approx 86.6\%$$</p>
<p>You are almost guaranteed of DEATH. Be careful about the risky things we do in life, including driving at high speed.</p>
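<p>The arithmetic behind the quoted binomial answer can be checked with a short sketch. It assumes 200 independent repetitions, each with a 1% chance of death; the function name <code>prob_at_least_one</code> is only illustrative.</p>

```python
def prob_at_least_one(p: float, n: int) -> float:
    """Probability of at least one occurrence in n independent trials,
    each with per-trial probability p."""
    return 1.0 - (1.0 - p) ** n


# Values taken from the question: 1% risk per activity, 200 repetitions.
risk = prob_at_least_one(0.01, 200)
print(f"{risk:.3f}")  # roughly 0.866
```

<p>Under these assumptions the result is roughly 0.866, that is, close to 87%.</p>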
<p>I disagree with this answer by Prof Arthur Mutambara on his <a rel="nofollow" href="https://www.linkedin.com/posts/arthurmutambara_if-the-probability-of-death-from-a-risky-activity-6733632618421616640-Lppy"><strong>LinkedIn.</strong></a> Here is my thinking.</p>
<p>1. The question requires the use of conditional probability, given the way it is worded. Without loss of meaning, the same question can be written as follows:</p>
<p><strong>Given that it is a risky activity, the probability of death is $1\%$. If you carry out that activity 200 times, what is the probability of death?</strong></p>
<p>Using Bayes Theorem:</p>
<p>$$P(D|R) = \dfrac{P(R|D) \times P(D)}{P(R)}$$</p>
<p>Where</p>
<p>$P(D)$ is the probability of death each time (what we are trying to establish)</p>
<p>$P(R)$ is the risk rate (e.g. associated risk of being on the road)</p>
<p>$P(D|R)$ is the probability of death given that it was a risky activity</p>
<p>$P(R|D)$ is the risk given death</p>
<p>From the question $P(D|R) = 0.01$</p>
<p>From the <strong><a rel="nofollow" href="https://www.uneca.org/stories/un-rolls-out-zimbabwe-road-safety-performance-review-bid-reduce-carnage">UNECA website</a></strong> $P(R|D) =\dfrac{26.6}{100000}$ in Africa.</p>
<p>From intuition, $P(R)=0.99$ implying that the risk on the roads is high.</p>
<p>Substituting in Bayes Theorem:</p>
<p>$$0.01 = \dfrac{\dfrac{26.6}{100000} \times P(D)}{0.99}$$</p>
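<p>Solving for $P(D)$ gives $P(D) = \dfrac{0.01 \times 0.99}{26.6/100000} \approx 37.22$ every time someone engages in the risky activity of travelling on some African road. As a plain number this is about $3722\%$ rather than $37.22\%$, so it lies outside the range $[0, 1]$ and the assumed inputs cannot all hold at once.</p>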
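<p>As a numeric check of the substitution above, the following sketch rearranges Bayes' theorem to solve for $P(D)$ using the three values quoted in the text. The helper name <code>solve_prior</code> is hypothetical, and the validity flag is an addition of this sketch, not part of the original argument.</p>

```python
def solve_prior(posterior: float, likelihood: float, evidence: float) -> float:
    """Rearrange P(D|R) = P(R|D) * P(D) / P(R) to solve for P(D)."""
    return posterior * evidence / likelihood


p_d_given_r = 0.01           # P(D|R), taken from the question
p_r_given_d = 26.6 / 100_000  # P(R|D), the figure quoted from UNECA
p_r = 0.99                   # P(R), the intuitive value assumed in the text

p_d = solve_prior(p_d_given_r, p_r_given_d, p_r)

# A probability must lie in [0, 1]; flag the result if it does not.
is_valid_probability = 0.0 <= p_d <= 1.0
print(round(p_d, 2), is_valid_probability)
```

<p>With these inputs the quotient comes out near 37.22 as a plain number, so the flag is false: the three assumed values are not mutually consistent as probabilities.</p>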
<p>The 200 events are independent, thus the only metric that matters is the probability of each individual outing.</p>
https://mathsgee.com/qna/21562/what-are-robustness-checks-when-doing-causal-inference?show=21563#a21563
<p>Robustness means clearly stating the assumptions your methods and data rely on, and gradually relaxing each of them to see if your results still hold. It acts as an efficient coherence check if you realize your findings can change dramatically due to a single variable, especially if that variable is subject to noise, measurement error, etc.</p>
<p>Directed Acyclic Graphs (DAGs) are a great tool for checking robustness. They help you clearly spell out assumptions and hypotheses in the context of causal inference. </p>
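<p>One kind of check a DAG implies is that certain conditional correlations should vanish. The sketch below illustrates this on simulated data, assuming a simple chain X → M → Y with linear effects and Gaussian noise. The variable names, coefficients and the regression-residual approach to partial correlation are choices made for this illustration only.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated chain X -> M -> Y (coefficients are arbitrary assumptions).
x = rng.normal(size=n)
m = 2.0 * x + rng.normal(size=n)
y = -1.5 * m + rng.normal(size=n)


def residuals(target: np.ndarray, control: np.ndarray) -> np.ndarray:
    """Residuals of target after a least-squares fit on control (with intercept)."""
    design = np.column_stack([np.ones_like(control), control])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef


# X and Y are strongly correlated marginally...
raw_corr = np.corrcoef(x, y)[0, 1]
# ...but the chain implies they are uncorrelated once M is controlled for.
partial_corr = np.corrcoef(residuals(x, m), residuals(y, m))[0, 1]

print(f"raw: {raw_corr:.3f}, given M: {partial_corr:.3f}")
```

<p>If the partial correlation were clearly non-zero on real data, that would be evidence against the assumed graph.</p>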
<p><a rel="nofollow" href="http://www.dagitty.net/" target="_blank">Dagitty</a> is a handy browser-based tool. In a nutshell, when you draw an assumed chain of causal events in Dagitty, it provides you with robustness checks on your data, like certain conditional correlations that should vanish. </p>
Data Science
Mon, 16 Nov 2020 05:20:43 +0000
Answered: When is it appropriate to make a counterfactual causal estimation?
https://mathsgee.com/qna/21560/when-is-it-appropriate-to-make-a-counterfactual-causal-estimation?show=21561#a21561
<p>A classic example in tech is estimating the effect of a new feature that was released to the entire user base at once: no A/B test was done and there’s absolutely no one who could serve as the control group. In this case, you can try making a counterfactual estimation.</p>
<p>The idea behind counterfactual estimation is to create a model that allows you to compute a <em>counterfactual</em> control group. In other words, you estimate what would happen had this feature not existed. It isn’t always simple to compute an estimate. However, if you have a model of your users that you’re confident about, then you have enough material to start doing counterfactual causal analyses!</p>
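<p>A minimal sketch of this idea on synthetic data: fit a model on the pre-launch period only, extrapolate it past the launch as the counterfactual, and compare it with what was observed. The linear trend model, the launch day and the size of the lift are all assumptions made for illustration; a real analysis would need a model of user behaviour you actually trust.</p>

```python
import numpy as np

rng = np.random.default_rng(1)
days = np.arange(60)
launch_day = 40   # hypothetical release date of the feature
true_lift = 5.0   # effect baked into the synthetic data

# Synthetic daily metric: linear trend plus noise, with a jump after launch.
metric = 100 + 0.5 * days + rng.normal(scale=0.5, size=days.size)
metric[launch_day:] += true_lift

# Fit the trend on pre-launch data only.
pre = days < launch_day
slope, intercept = np.polyfit(days[pre], metric[pre], deg=1)

# Counterfactual: what the trend predicts had the feature not existed.
counterfactual = intercept + slope * days

# Estimated effect: average gap between observed and counterfactual after launch.
estimated_lift = float(np.mean(metric[~pre] - counterfactual[~pre]))
print(f"estimated lift: {estimated_lift:.2f}")
```

<p>The estimate is only as good as the extrapolated model, which is why confidence in the underlying user model matters so much here.</p>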
<p>Below is an <em>example of time series counterfactual vs. observed data by Shopify</em></p>
<p><img alt="" src="https://mathsgee.com/qna/?qa=blob&qa_blobid=9581918691447709996" style="height:380px; width:600px"></p>
Data Science
Mon, 16 Nov 2020 05:15:46 +0000
Answered: How does one use difference in difference method for causal inference?
https://mathsgee.com/qna/21558/how-does-one-use-difference-in-difference-method-for-causal-inference?show=21559#a21559
<p><img alt="" src="https://mathsgee.com/qna/?qa=blob&qa_blobid=5850232941313224390" style="height:326px; width:600px"></p>
<p> </p>
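<p>The difference-in-differences estimate can be sketched as a small calculation: the change in the treatment group minus the change in the control group over the same period. The function name and the four group averages below are made up for illustration, and the result is only meaningful if the two groups would have followed parallel trends without the treatment.</p>

```python
def diff_in_diff(treat_pre: float, treat_post: float,
                 control_pre: float, control_post: float) -> float:
    """Difference-in-differences: treatment-group change minus control-group change."""
    return (treat_post - treat_pre) - (control_post - control_pre)


# Hypothetical group averages of the metric before and after the treatment.
effect = diff_in_diff(treat_pre=10.0, treat_post=16.0,
                      control_pre=8.0, control_post=11.0)
print(effect)  # (16 - 10) - (11 - 8) = 3.0
```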
<p>For this method to be applicable, you have to find a control group that shows a trend that’s parallel to your treatment group for the metric of interest, prior to any treatment being applied. Then, after treatment happens, you assume the break in the parallel trend is only due to the treatment itself. This is summed up in the above diagram.</p>
Data Science
Mon, 16 Nov 2020 05:10:04 +0000
Answered: Can linear regression with fixed effects be used as a good estimator of causal effects?
https://mathsgee.com/qna/21556/can-linear-regression-with-fixed-effects-be-used-as-a-good-estimator-of-causal-effects?show=21557#a21557
<p>Sometimes it’s just not possible to set up an experiment. Here are a few reasons why A/B tests won’t work in every situation:</p>
<ul>
<li>Lack of tooling. For example, if your code can’t be modified in certain parts of the product.</li>
<li>Lack of time to implement the experiment.</li>
<li>Ethical concerns (for example, at an e-commerce company like Shopify, randomly leaving some merchants out of a new feature that could help them with their business is sometimes not an option).</li>
<li>Just plain oversight (for example, a request to study the data from a launch that happened in the past).</li>
</ul>
<p>Setting up an A/B test for products is a lot of work. If you’re starting from scratch, you’ll need:</p>
<ul>
<li>A way to randomly assign units to the right group as they use your product.</li>
<li>A tracking mechanism to collect the data for all relevant metrics.</li>
<li>An analysis of these metrics and their associated statistics to compute effect sizes and validate the causal effects you suspect.</li>
</ul>
Data Science
https://mathsgee.com/qna/21552/what-does-one-need-in-order-to-start-a-b-testing?show=21553#a21553
Mon, 16 Nov 2020 04:50:50 +0000
Which case studies show how causal insights were used to validate/invalidate entire business strategies?
