Interpreting data questions and answers

Recent questions in Reading and interpreting data
tinydancer27br 2022-05-22 Answered

Boy Born on a Tuesday - is it just a language trick?
The following probability question appeared in an earlier thread:
I have two children. One is a boy born on a Tuesday. What is the probability I have two boys?
The claim was that it is not actually a mathematical problem and it is only a language problem.
If one wanted to restate this problem formally the obvious way would be like so:
Definition: Sex is defined as an element of the set {boy, girl}.
Definition: Birthday is defined as an element of the set {Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday}.
Definition: A Child is defined to be an ordered pair (sex, birthday), i.e. an element of Sex × Birthday.
Let (x,y) be a pair of children,
Define an auxiliary predicate $H((s, b)) :\iff s = \text{boy} \text{ and } b = \text{Tuesday}$.
Calculate $P(x \text{ is a boy and } y \text{ is a boy} \mid H(x) \text{ or } H(y))$.
I don't see any other sensible way to formalize this question.
To actually solve this problem now requires no thought (in fact it is thinking which leads us to guess incorrect answers); we just compute
$$
\begin{aligned}
&P(x \text{ is a boy and } y \text{ is a boy} \mid H(x) \text{ or } H(y)) \\
&= \frac{P(x \text{ is a boy and } y \text{ is a boy and } (H(x) \text{ or } H(y)))}{P(H(x) \text{ or } H(y))} \\
&= \frac{P((x \text{ is a boy and } y \text{ is a boy and } H(x)) \text{ or } (x \text{ is a boy and } y \text{ is a boy and } H(y)))}{P(H(x)) + P(H(y)) - P(H(x))\,P(H(y))} \\
&= \frac{P(x \text{ is a boy and } y \text{ is a boy and } x \text{ born on Tuesday}) + P(x \text{ is a boy and } y \text{ is a boy and } y \text{ born on Tuesday}) - P(x \text{ is a boy and } y \text{ is a boy and } x \text{ born on Tuesday and } y \text{ born on Tuesday})}{P(H(x)) + P(H(y)) - P(H(x))\,P(H(y))} \\
&= \frac{\frac12 \cdot \frac12 \cdot \frac17 + \frac12 \cdot \frac12 \cdot \frac17 - \frac12 \cdot \frac12 \cdot \frac17 \cdot \frac17}{\frac12 \cdot \frac17 + \frac12 \cdot \frac17 - \frac12 \cdot \frac17 \cdot \frac12 \cdot \frac17} = \frac{13}{27}
\end{aligned}
$$
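The same number can be checked by brute force. The sketch below is only an illustration of the formalization above: it assumes, as the calculation does, that all 14 (sex, birthday) combinations are equally likely and that the two children are independent. The function names are made up for this example.

```python
from fractions import Fraction
from itertools import product

SEXES = ("boy", "girl")
DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def is_tuesday_boy(child):
    """The predicate H: the child is a boy born on a Tuesday."""
    sex, day = child
    return sex == "boy" and day == "Tuesday"


def two_boys_given_tuesday_boy():
    children = list(product(SEXES, DAYS))      # 14 equally likely children
    pairs = list(product(children, repeat=2))  # 196 equally likely ordered pairs (x, y)
    # Condition on H(x) or H(y).
    conditioned = [(x, y) for x, y in pairs if is_tuesday_boy(x) or is_tuesday_boy(y)]
    # Among those, count the pairs where both children are boys.
    both_boys = [(x, y) for x, y in conditioned if x[0] == "boy" and y[0] == "boy"]
    return Fraction(len(both_boys), len(conditioned))


print(two_boys_given_tuesday_boy())  # 13/27
```

The enumeration finds 27 pairs satisfying the condition, 13 of which are two boys, matching the numerator and denominator of the last fraction above once both are multiplied by 196.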
Now what I am wondering is, does this refute the claim that this puzzle is just a language problem or add to it? Was there a lot of room for misinterpreting the questions which I just missed?

Eve Dunn 2022-05-16 Answered

Relating factors of a polynomial over GF(q) and $GF(q^m)$
I am reading Richard Blahut's book "Algebraic Codes for Data Transmission" and have come to an impasse in my understanding in section 5.3. In this section, the author wants to relate the factors of a polynomial over GF(q) and the factors of that same polynomial over a GF-extension field $GF(q^m)$.
There is a good example in the book that I will use here to ask my question. Specifically the GF fields in this example use q=2 and m=4, so there is the prime field GF(2) and the extension field $GF(2^4)$. The polynomial to be factored over both of these fields is one with particularly special properties, of the form $x^{q^m - 1} - 1$. So, the polynomial for this example is $x^{15} - 1$.
The author first factors $x^{15} - 1$ over GF(2). The prime factors are:
$$x^{15} - 1 = (x + 1)(x^2 + x + 1)(x^4 + x + 1)(x^4 + x^3 + 1)(x^4 + x^3 + x^2 + x + 1)$$
This factorization over GF(2) makes sense to me and I can easily verify it in MATLAB.
The author then chooses two of the above prime factors and forms a new polynomial:
$$g(x) = (x^4 + x^3 + 1)(x^4 + x^3 + x^2 + x + 1) = x^8 + x^4 + x^2 + x + 1$$
The above multiplication over GF(2) also makes sense to me and I can easily verify it in MATLAB.
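For readers without MATLAB, both GF(2) computations can be reproduced in a few lines. This is a minimal sketch, assuming polynomials are stored as integer bitmasks (bit i holds the coefficient of x^i); the helper names `gf2_mul` and `poly` are invented for this example. Note that over GF(2) subtraction equals addition, so x^15 − 1 is the same polynomial as x^15 + 1.

```python
def gf2_mul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials stored as bitmasks (carry-less multiplication)."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # adding coefficients mod 2 is XOR
        a <<= 1          # multiply a by x
        b >>= 1
    return result


def poly(*exponents: int) -> int:
    """Build the bitmask of a polynomial from the exponents of its nonzero terms."""
    mask = 0
    for e in exponents:
        mask ^= 1 << e
    return mask


f1 = poly(1, 0)           # x + 1
f2 = poly(2, 1, 0)        # x^2 + x + 1
f3 = poly(4, 1, 0)        # x^4 + x + 1
f4 = poly(4, 3, 0)        # x^4 + x^3 + 1
f5 = poly(4, 3, 2, 1, 0)  # x^4 + x^3 + x^2 + x + 1

g = gf2_mul(f4, f5)
full = 1
for factor in (f1, f2, f3, f4, f5):
    full = gf2_mul(full, factor)

print(g == poly(8, 4, 2, 1, 0))  # True: g(x) = x^8 + x^4 + x^2 + x + 1
print(full == poly(15, 0))       # True: the five factors multiply to x^15 + 1
```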
However, the author then changes to analyzing the same polynomial $g(x) = x^8 + x^4 + x^2 + x + 1$ in the extension field $GF(2^4)$. It seems to be implied that because $GF(2^4)$ is an extension field of GF(2), and because the polynomial in question is composed of prime factors (over GF(2)) of the special composite polynomial $x^{15} - 1$, this switch to the extension field $GF(2^4)$ is justified.
To analyze g ( x ) in the extension field, we agree to represent the 16 elements of the extension field as the usual powers of the primitive field element alpha:
$$0, 1, \alpha, \alpha^2, \alpha^3, \alpha^4, \alpha^5, \alpha^6, \alpha^7, \alpha^8, \alpha^9, \alpha^{10}, \alpha^{11}, \alpha^{12}, \alpha^{13}, \alpha^{14}$$
The author claims that the aforementioned polynomial $g(x) = x^8 + x^4 + x^2 + x + 1$ has roots/zeroes over $GF(2^4)$ at $\alpha$ and $\alpha^3$, that is,
$$g(\alpha) = 0, \qquad g(\alpha^3) = 0$$
I am completely unable to make sense of this claim. It raises numerous questions in my mind, and the most basic of these questions is, how can I verify that $g(\alpha) = 0$ and that $g(\alpha^3) = 0$?
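Here is how I've tried to verify that $g(\alpha) = 0$ for $g(x) = x^8 + x^4 + x^2 + x + 1$. The author uses the following detailed representation of the elements of the extension field $GF(2^4)$. Of course, to describe the elements of any GF extension field, we must choose an irreducible polynomial for the reducing modulus, and throughout the book the author uses the modulus $\alpha^4 + \alpha + 1$, yielding the 16 field elements:
$$
\begin{aligned}
0 &= \alpha^{-\infty} \\
1 &= \alpha^0 \\
\alpha^1 &= \alpha \\
\alpha^2 &= \alpha^2 \\
\alpha^3 &= \alpha^3 \\
\alpha^4 &= \alpha + 1 \\
\alpha^5 &= \alpha^2 + \alpha \\
\alpha^6 &= \alpha^3 + \alpha^2 \\
\alpha^7 &= \alpha^3 + \alpha + 1 \\
\alpha^8 &= \alpha^2 + 1 \\
\alpha^9 &= \alpha^3 + \alpha \\
\alpha^{10} &= \alpha^2 + \alpha + 1 \\
\alpha^{11} &= \alpha^3 + \alpha^2 + \alpha \\
\alpha^{12} &= \alpha^3 + \alpha^2 + \alpha + 1 \\
\alpha^{13} &= \alpha^3 + \alpha^2 + 1 \\
\alpha^{14} &= \alpha^3 + 1
\end{aligned}
$$
Given the above element representation of $GF(2^4)$, we return to the enigmatic claim that for $g(x) = x^8 + x^4 + x^2 + x + 1$, there is the root $g(\alpha) = 0$ over $GF(2^4)$. We have:
$$g(x) = x^8 + x^4 + x^2 + x + 1$$
$$g(\alpha) = \alpha^8 + \alpha^4 + \alpha^2 + \alpha + 1 = (\alpha^2 + 1) + (\alpha + 1) + \alpha^2 + \alpha + 1 = 1$$
So it seems that for the polynomial $g(x) = x^8 + x^4 + x^2 + x + 1$, $g(\alpha) \neq 0$.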
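The evaluation can also be done mechanically. The sketch below is an illustration only: it assumes the modulus $\alpha^4 + \alpha + 1$ quoted above, stores each field element as a 4-bit integer (bit i is the coefficient of α^i), and uses helper names invented for this example. It evaluates g at every power of α and lists the exponents that give zero.

```python
MODULUS = 0b10011  # alpha^4 + alpha + 1, the reducing polynomial quoted above
ALPHA = 0b0010     # the element alpha itself


def gf16_mul(a: int, b: int) -> int:
    """Multiply two GF(16) elements, reducing modulo MODULUS after every shift."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:   # degree reached 4: subtract (XOR) the modulus
            a ^= MODULUS
    return result


def gf16_pow(a: int, n: int) -> int:
    """Raise a to the n-th power by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf16_mul(result, a)
    return result


def g(x: int) -> int:
    """Evaluate g(x) = x^8 + x^4 + x^2 + x + 1 in GF(16); addition is XOR."""
    return gf16_pow(x, 8) ^ gf16_pow(x, 4) ^ gf16_pow(x, 2) ^ x ^ 1


root_exponents = [k for k in range(15) if g(gf16_pow(ALPHA, k)) == 0]

print(g(ALPHA))               # 1
print(g(gf16_pow(ALPHA, 3)))  # 0
print(root_exponents)         # [3, 6, 7, 9, 11, 12, 13, 14]
```

Under this particular modulus the sketch agrees with the hand calculation: g(α) comes out as 1 rather than 0, while α^3 is a root. The eight roots it finds split into the sets {α^3, α^6, α^9, α^12} and {α^7, α^11, α^13, α^14}, one set of four for each degree-4 factor of g(x).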
Can anyone show me how to demonstrate that α (as a GF(16) element) is a root of a polynomial that was computed over GF(2)?
If it would be helpful, I can scan the 3 pages of the book and you can see the author's own words for the above ideas. He throws this information at you very quickly, so it's fairly difficult for me (as a non-mathematician) to follow.

Noelle Wright 2022-05-16 Answered

How to learn Mathematics for Machine Learning?
Is the following path (with the following books) the correct way to learn Mathematics for Machine Learning?
0. Pre-algebra (Pre-Algebra Essentials For Dummies by Mark Zegarelli, Krista Fanning)
1. College Algebra (Schaum’s Outline of College Algebra by Robert E. Moyer, Murray R. Spiegel)
2. Pre-calculus (Schaum’s Outline of Precalculus by Fred Safier)
3. Calculus (Schaum’s Outline of Calculus by Frank Ayres, Elliott Mendelson)
4. Differential Equations (Schaum’s Outline of Differential Equations by Richard Bronson, Gabriel B. Costa)
5. Discrete Mathematical Structures (Schaum’s Outline of Discrete Mathematics by Seymour Lipschutz, Marc Lipson)
6. Linear Algebra (Introduction to Linear Algebra by Gilbert Strang)
7. Statistics and Probability (Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python by Peter Bruce, Andrew Bruce, Peter Gedeck)
Here's the lengthier version of my query. I was never bad at studies; however, I got bad grades and only just passed college because of some personal issues. I have been working since then in jobs which didn't require much mathematics.
I have switched careers now and I am working as a Machine Learning Engineer. I did a course in Machine Learning but skipped the mathematics part. I want to study mathematics now, not just because it is required at work, but because I was always fascinated by mathematics. The issue is, I put in 60 hours a week at the office, so I make time to study after work.
So I want to know if these books would actually help me understand the mathematics behind Machine Learning, or whether they will not cover it. Thank you for reading through this; any suggestion/help is highly appreciated.

dumnealorjavgj 2022-05-15 Answered

I am reading the book "Elements of Information Theory" by Cover and Thomas and I am having trouble understanding conceptually the various ideas.
For example, I know that H(X) can be interpreted as the average encoding length. But what does $H(Y \mid X)$ intuitively mean?
And what is mutual information? I read things like "It is the reduction in the uncertainty of one random variable due to the knowledge of the other". This doesn't mean anything to me as it doesn't help me explain in words why $I(X;Y) = H(Y) - H(Y \mid X)$. Or explain the chain rule for mutual information.
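One way to make these quantities less abstract is to compute them on a tiny example. The sketch below is only an illustration: the joint distribution `JOINT` is made up for this example, and the helper names are not from Cover and Thomas. It computes H(Y|X) as the entropy of Y inside each "X = x" slice averaged over p(x), computes I(X;Y) from its definition as a sum over the joint distribution, and checks numerically that the two sides of the identity agree.

```python
from math import isclose, log2

# Hypothetical joint distribution p(x, y) over two binary variables.
JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}


def entropy(probabilities):
    """Shannon entropy in bits of a collection of probabilities."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


def marginal(joint, index):
    """Marginal distribution of coordinate `index` (0 for X, 1 for Y)."""
    totals = {}
    for outcome, p in joint.items():
        totals[outcome[index]] = totals.get(outcome[index], 0.0) + p
    return totals


def conditional_entropy_y_given_x(joint):
    """H(Y|X): entropy of Y within each X = x slice, averaged over p(x)."""
    total = 0.0
    for x, px in marginal(joint, 0).items():
        slice_probs = [p / px for (xi, _), p in joint.items() if xi == x]
        total += px * entropy(slice_probs)
    return total


def mutual_information(joint):
    """I(X;Y) from the definition: sum of p(x,y) * log2(p(x,y) / (p(x) p(y)))."""
    p_x, p_y = marginal(joint, 0), marginal(joint, 1)
    return sum(p * log2(p / (p_x[x] * p_y[y])) for (x, y), p in joint.items() if p > 0)


h_y = entropy(marginal(JOINT, 1).values())
h_y_given_x = conditional_entropy_y_given_x(JOINT)
i_xy = mutual_information(JOINT)

print(h_y, h_y_given_x, i_xy)
assert isclose(i_xy, h_y - h_y_given_x)
```

In this example Y on its own is a fair coin (1 bit), but once X is known Y is an 80/20 coin, so fewer bits are needed on average to describe it. The difference between the two is the mutual information.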
I also encountered the data processing inequality, explained as something that can be used to show that no clever manipulation of the data can improve the inferences that can be made from the data. If $X \to Y \to Z$ then $I(X;Y) \geq I(X;Z)$. If I had to explain this result to someone in words and explain why it should be intuitively true I would have absolutely no idea what to say. Even explaining how "data processing" is related to Markov chains and mutual information would baffle me.
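A small numerical experiment can at least show the inequality in action. The sketch below makes several assumptions for illustration: X is a fair bit, Y is X passed through a noisy channel that flips it with probability 0.1, and Z is Y passed through a second channel that flips it with probability 0.2, so Z depends on X only through Y. The names and flip probabilities are invented for this example.

```python
from math import log2


def mutual_information(joint):
    """I(A;B) in bits for a joint distribution given as {(a, b): probability}."""
    p_a, p_b = {}, {}
    for (a, b), p in joint.items():
        p_a[a] = p_a.get(a, 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
    return sum(p * log2(p / (p_a[a] * p_b[b])) for (a, b), p in joint.items() if p > 0)


def noisy_copy(flip):
    """Transition table p(out | in) of a binary channel that flips its input with probability `flip`."""
    return {(i, o): (flip if i != o else 1 - flip) for i in (0, 1) for o in (0, 1)}


P_X = {0: 0.5, 1: 0.5}
X_TO_Y = noisy_copy(0.1)  # Y is a noisy copy of X
Y_TO_Z = noisy_copy(0.2)  # Z is "processed" Y: it never looks at X directly

joint_xy = {(x, y): P_X[x] * X_TO_Y[(x, y)] for x in (0, 1) for y in (0, 1)}
joint_xz = {
    (x, z): sum(P_X[x] * X_TO_Y[(x, y)] * Y_TO_Z[(y, z)] for y in (0, 1))
    for x in (0, 1)
    for z in (0, 1)
}

i_xy = mutual_information(joint_xy)
i_xz = mutual_information(joint_xz)

print(i_xy, i_xz)
assert i_xy >= i_xz  # data processing inequality for the chain X -> Y -> Z
```

Here "data processing" is the second channel: any procedure, random or deterministic, that produces Z from Y alone. Replacing `Y_TO_Z` with another table leaves the assertion true, which is the content of the inequality.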
I can imagine explaining a result in algebraic topology to someone since there is usually an intuitive geometric picture that can be drawn. But with information theory if I had to explain a result to someone at comparable level to a picture I would not be able to.
When I do problems it's just abstract symbolic manipulation and trial and error. I am looking for an explanation (not these "blah gives information about blah" explanations) of the various terms that will make the solutions to problems appear in a meaningful way.
Right now I feel like someone trying to do algebraic topology purely symbolically without thinking about geometric pictures.
Is there a book that will help my curse?

fetsBedscurce4why1 2022-05-13 Answered

When we solve an equation, do we suppose that it is true and then work backward?
A couple of days ago I was reading Calculus by James Stewart and I read this:
Sometimes it is useful to imagine that your problem is solved and work backward, step by step, until you arrive at the given data. Then you might be able to reverse your steps and thereby construct a solution to the original problem. This procedure is commonly used in solving equations. For instance, in solving the equation 3x−5=7, we suppose that x is a number that satisfies 3x−5=7 and work backward. We add 5 to each side of the equation and then divide each side by 3 to get x=4. Since each of these steps can be reversed, we have solved the problem.
This sounded strange to me! I have always thought that when we solve an equation we don't suppose that the equation is already satisfied. I have always thought that when we solve an equation we use algebraic properties of numbers to obtain a simpler equation that is equivalent to the starting equation. And since the equations are equivalent we don't need to suppose that the initial equation is true, because when the last one is true, the first one is also true.
In other words: if I have to solve 3x−5=7 I don't need to suppose that x is a number that satisfies 3x−5=7. I simply add 5 to both sides to obtain the equivalent equation 3x=12, then I divide both sides by 3 to obtain the equivalent equation x=4. When the last one is true, the first one is also true and vice versa; the last one is true when x is replaced by 4, so 4 is the solution.
And so this is my question: is it true that when we solve an equation we implicitly suppose that x satisfies the equation (and we need to do that) in order to apply the algebraic properties that give us the equivalent and simpler equation? Do we need this logical assumption in solving equations?
Thanks.
EDIT
Let me try to explain in a better way why I don't understand the need to suppose that x satisfies the equation (i.e. that x makes the equality true). Excuse me for the length of this edit.
Let's say I want to solve in R the equation 3x−5=7.
I know that
∀a,b,c∈R, a+c=b+c ↔ a=b    (P1)
and I know that
∀a,b,c∈R, (c≠0 → (ac=bc ↔ a=b))    (P2)
Property P1 says to me that:
if a+c=b+c is true, then a=b is true;
if a=b is true, then a+c=b+c is true;
if a+c=b+c is false, then a=b is false;
if a=b is false, then a+c=b+c is false;
and property P2 says to me that, if c≠0, then
if ac=bc is true, then a=b is true;
if a=b is true, then ac=bc is true;
if ac=bc is false, then a=b is false;
if a=b is false, then ac=bc is false;
If I know all of this, then starting from 3x−5=7 I don't need to suppose that x makes true the equality, because:
I can say that 3x−5=7 is equivalent to 3x=12 because of P1 without supposing that 3x−5=7 is true, they have the same truth value for the same value of x;
from 3x=12, I can say that it is equivalent to x=4 because of P2 without supposing that 3x=12 is true, they have the same truth value for the same value of x;
now I can say that 3x−5=7 is equivalent to x=4 without supposing that 3x−5=7 is true, they have the same truth value for the same value of x;
in the end I have the solution, because when x=4 is true, 3x−5=7 is also true, so 4 is the solution.
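The "same truth value for the same value of x" reading in the steps above can be spot-checked numerically. The sketch below is an illustration, not a proof: it only tests a finite, arbitrarily chosen set of rational candidates, and the function names are invented for this example.

```python
from fractions import Fraction


def original(x):
    return 3 * x - 5 == 7   # the starting equation


def after_p1(x):
    return 3 * x == 12      # add 5 to both sides (property P1)


def after_p2(x):
    return x == 4           # divide both sides by 3 (property P2, c = 3 != 0)


# Arbitrary sample of rationals: -10, -39/4, ..., 39/4, 10.
candidates = [Fraction(n, 4) for n in range(-40, 41)]

# For every candidate, true or false, the three statements agree.
assert all(original(x) == after_p1(x) == after_p2(x) for x in candidates)

solutions = [x for x in candidates if original(x)]
print(solutions)  # [Fraction(4, 1)]
```

Nothing in the check assumes that a candidate satisfies the equation: the three statements are compared on values that make them false as well as on the one value that makes them true.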
What am I doing wrong? Why do I need to suppose that x satisfies the equation?

Alisa Durham 2022-05-12 Answered

I have been reading a paper: Wang X, Golbandi N, Bendersky M, Metzler D, Najork M. Position bias estimation for unbiased learning to rank in personal search.
I am wondering how they derive the M-step in an EM algorithm that has been applied to the problem in their paper.
This is the data log-likelihood function:
$$\log P(\mathcal{L}) = \sum_{(c,q,d,k) \in \mathcal{L}} c \log \theta_k \gamma_{q,d} + (1 - c) \log\left(1 - \theta_k \gamma_{q,d}\right).$$
where observed data are (c:click, q:query, d:document, k:result position) rows, and $\theta_k$ is the probability a search result at rank k is examined by the user (assumed to be independent of query-document pair), and $\gamma_{q,d}$ is the probability a search result, i.e., a query-document pair is actually relevant (assumed to be independent of result position).
Here are some hidden variable probability expressions (E means examination and R means relevant, they assume that a click event implies a search result is both examined and relevant):
$$
\begin{aligned}
P(E=1, R=1 \mid C=1, q, d, k) &= 1 \\
P(E=1, R=0 \mid C=0, q, d, k) &= \frac{\theta_k^{(t)} \left(1 - \gamma_{q,d}^{(t)}\right)}{1 - \theta_k^{(t)} \gamma_{q,d}^{(t)}} \\
P(E=0, R=1 \mid C=0, q, d, k) &= \frac{\left(1 - \theta_k^{(t)}\right) \gamma_{q,d}^{(t)}}{1 - \theta_k^{(t)} \gamma_{q,d}^{(t)}} \\
P(E=0, R=0 \mid C=0, q, d, k) &= \frac{\left(1 - \theta_k^{(t)}\right) \left(1 - \gamma_{q,d}^{(t)}\right)}{1 - \theta_k^{(t)} \gamma_{q,d}^{(t)}}
\end{aligned}
$$
The paper has derived the M-step update formulas:
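$$
\begin{aligned}
\theta_k^{(t+1)} &= \frac{\sum_{c,q,d,k'} \mathbb{I}_{k'=k} \left(c + (1 - c)\, P(E=1 \mid c, q, d, k')\right)}{\sum_{c,q,d,k'} \mathbb{I}_{k'=k}} \\
\gamma_{q,d}^{(t+1)} &= \frac{\sum_{c,q',d',k} \mathbb{I}_{q'=q,\, d'=d} \left(c + (1 - c)\, P(R=1 \mid c, q', d', k)\right)}{\sum_{c,q',d',k} \mathbb{I}_{q'=q,\, d'=d}}
\end{aligned}
$$
However, how does one get these two formulas?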
Note: M-step is usually derived from this formula:
$$Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{Z \mid X, \theta^{(t)}}\left[\log L(\theta; X, Z)\right]$$
$$\theta^{(t+1)} = \underset{\theta}{\arg\max}\; Q(\theta \mid \theta^{(t)})$$
where θ are parameters (i.e., $\theta_k$ and $\gamma_{q,d}$ in this case), X are data (c,q,d,k in this case), and Z are hidden variables (E and R?).
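To see what the quoted update formulas do operationally, here is a minimal sketch of the iteration they describe. It is not the paper's implementation: the function name `em_position_bias`, the 0.5 initial values, the fixed iteration count, and the small clamp on the denominator are all assumptions made for this example. It uses the fact that, with C=0, the case E=1, R=1 is excluded, so P(E=1 | C=0) equals the quoted P(E=1, R=0 | C=0) and likewise P(R=1 | C=0) equals P(E=0, R=1 | C=0).

```python
from collections import defaultdict


def em_position_bias(log, iterations=50):
    """Run the quoted E-step/M-step updates on rows (c, q, d, k) with c in {0, 1}.

    Returns (theta, gamma): theta[k] estimates examination at position k,
    gamma[(q, d)] estimates relevance of the query-document pair.
    """
    theta = defaultdict(lambda: 0.5)  # assumed initial value
    gamma = defaultdict(lambda: 0.5)  # assumed initial value
    for _ in range(iterations):
        theta_num, theta_den = defaultdict(float), defaultdict(float)
        gamma_num, gamma_den = defaultdict(float), defaultdict(float)
        for c, q, d, k in log:
            t, g = theta[k], gamma[(q, d)]
            denom = max(1.0 - t * g, 1e-12)  # clamp is an assumption, avoids 0/0
            p_e = t * (1.0 - g) / denom      # P(E=1 | C=0, q, d, k)
            p_r = (1.0 - t) * g / denom      # P(R=1 | C=0, q, d, k)
            # A click (c = 1) implies E = 1 and R = 1, hence the "c + (1 - c) * ..." form.
            theta_num[k] += c + (1 - c) * p_e
            theta_den[k] += 1.0
            gamma_num[(q, d)] += c + (1 - c) * p_r
            gamma_den[(q, d)] += 1.0
        # M-step: each parameter becomes the average posterior over its matching rows.
        theta = {k: theta_num[k] / theta_den[k] for k in theta_den}
        gamma = {qd: gamma_num[qd] / gamma_den[qd] for qd in gamma_den}
    return dict(theta), dict(gamma)


# Hypothetical click log: (click, query, document, position).
example_log = [
    (1, "q1", "d1", 1), (0, "q1", "d1", 2),
    (1, "q1", "d2", 1), (0, "q1", "d2", 2),
    (0, "q2", "d3", 1), (1, "q2", "d3", 2),
]
theta_hat, gamma_hat = em_position_bias(example_log)
print(theta_hat)
print(gamma_hat)
```

Read this way, each update is a sample mean: the indicator in the formulas selects the rows that share the position k (or the pair q, d), the numerator sums the expected value of the hidden variable for those rows, and the denominator counts them.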