Roadmap for learning Topological Data Analysis?

aligass2004yi 2022-07-01 Answered
I'm a math major who has recently graduated and I will be starting full-time work in 'data analysis'. Having finished with decent marks and still being incredibly interested in mathematics, I was thinking of pursuing graduate study/research at some point in the future. I was reading up about possible areas of study for this when I came across topological data analysis, which (as I understand it) is an application of algebraic topology to data analysis.
Given my situation, I was intrigued by the concept and I would like to do some self-study so I can have a working understanding of the subject. I have only done basic undergraduate abstract algebra, analysis and point-set topology, and I am currently reading Munkres' Topology (Chapter 9 onwards). How do I get from where I am now to understanding the theory behind TDA and being able to apply it?
My knowledge of further mathematics is far from extensive and I would appreciate any advice on links/texts which I could use to learn the relevant material.

Answers (2)

Anika Stevenson
Answered 2022-07-02 Author has 19 answers
Before answering your question I would like to discuss some points:
  • Topological data analysis is roughly, as you write, (algebraic) topology applied to the study of data. While you will certainly need to learn some topology, the type of topology that you should learn really depends on the type of applications you are interested in. For this reason I will not give you a roadmap, but a suggestion on how to draw your own roadmap.
  • You should also not forget the second part in the definition of topological data analysis, namely that you are studying data. For this it would be good to learn some general facts about data analysis, and in particular statistics (more about this below). For a statistician's viewpoint on topological data analysis, there is a nice series of columns by Robert Adler on what he calls TOPOS, available online.
  • You have to know your data. This might go without saying, but too often I have seen people throwing some method at data to see what comes out of it, without even asking themselves why they are using that specific method. While depending on your job conditions you might be given more or less time to work on a specific project, I think that you should really try to make sure that you understand the data and the context as well as possible before even starting to think about which method you want to use. While topology gives a wealth of different methods that can be applied to the study of data, these might not always be the best tools to use, and there might be other techniques which are better suited. The bottom line is: there is no method or set of methods that fits all problems.
And here comes my suggestion for how to draw your own roadmap:
Topology. Robert Ghrist's book Elementary Applied Topology gives a succinct overview of the main methods and ideas from topology that are used in applications. Every chapter covers a certain topic in topology and then gives examples of its applications. While there are other texts on applied topology that delve into more detail from the mathematical point of view, I would suggest using Ghrist's book to get an idea of the applications and set of ideas, and then drawing your own roadmap of topics that you would like to cover from there. Since the text is succinct, you might also need other texts to learn more about the mathematics covered in each chapter. For example, to learn more about (smooth) manifolds (Chapter 1) you might want to read Lee's Introduction to Smooth Manifolds, or to learn more about cohomology (Chapter 6) you might want to consult Hatcher's Algebraic Topology. Again, I don't think that there is a "one size fits all" answer to which texts you should use for this, but once you have a good grasp of what exactly you would like to understand better, you could again ask people with more experience for advice.
Statistics. A book that, analogously to Ghrist's book, could help you in designing your own roadmap is Larry Wasserman's All of Statistics. Also, note that the application of statistical methods to techniques from topological data analysis is an active area of research, and while there are some tools and libraries that can be used for applications, this area is still in its infancy. I list here the libraries and relevant references for statistical tools for topological data analysis that I know off the top of my head (these are all related to persistent homology):
  • Persistence Landscapes and the corresponding toolbox
  • The TDA package tutorial and the package
  • Persistence images and library
Data science. Finally, as for data science more broadly, I don't know any good text, but you might get an idea of some of the general themes from the book Mathematical Problems in Data Science.
Aside: to finish off, I give some additional references to books/papers and software packages.
References for topological data analysis and computational topology:
  • Topology and data, Carlsson
  • Computational Topology, Edelsbrunner and Harer
  • Topology for Computing, Zomorodian
  • Persistence Theory, Oudot (this might be too specific, but it would be useful if you want to learn more about the theory behind persistent homology)
  • Computational homology, Kaczynski, Mischaikow, Mrozek
Open source libraries that implement some of the methods from topological data analysis:
  • Mapper: Python Mapper
  • Persistent homology: a few of the most recent (and best performing) libraries are Ripser, GUDHI, and DIPHA. Note that there is also a paper giving an overview of the different libraries for persistent homology. (Disclaimer: I am one of the authors of this paper. Also, the version on the arXiv is outdated and will be replaced by an up-to-date version in the next weeks, so it might be better to look at this once it is updated.)
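To give a feel for what the persistent homology libraries listed above compute, here is a minimal pure-Python sketch of the simplest case: 0-dimensional persistence (connected components) of a Vietoris-Rips filtration. It is not how Ripser, GUDHI, or DIPHA are implemented, and the function name `h0_persistence`, the toy point cloud, and the convention that an edge appears at a scale equal to its length are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def h0_persistence(points):
    """Return (birth, death) pairs for the connected components of a
    Vietoris-Rips filtration. Every component is born at scale 0; a
    component dies at the length of the edge that merges it into another."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by the scale at which they appear.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j      # two components merge ...
            pairs.append((0.0, length))  # ... so one of them dies here
    pairs.append((0.0, float("inf")))    # the last component never dies
    return pairs


# Hypothetical toy data: a tight cluster of three points and a far-away pair.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 0.0), (10.0, 1.0)]
diagram = h0_persistence(points)
print(diagram)
```

On this toy data three components die at scale 1, one dies at scale 9, and one lives forever. The long bar ending at 9 is the signal that the data has two well-separated clusters, while the short bars are the kind of feature usually treated as noise.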
taghdh9
Answered 2022-07-03 Author has 6 answers
Geometric and Topological Inference is an excellent book for an introduction to persistent homology. If you have not taken an algebraic topology course, it should be easier than Edelsbrunner and Harer's book. I also found it more approachable since it has more exercises and gives more details on the construction of complexes.

You might be interested in

asked 2021-02-23
Interpreting z-scores: Complete the following statements using your knowledge about z-scores.
a. If the data is weight, the z-score for someone who is overweight would be
-positive
-negative
-zero
b. If the data is IQ test scores, an individual with a negative z-score would have a
-high IQ
-low IQ
-average IQ
c. If the data is time spent watching TV, an individual with a z-score of zero would
-watch very little TV
-watch a lot of TV
-watch the average amount of TV
d. If the data is annual salary in the U.S and the population is all legally employed people in the U.S., the z-scores of people who make minimum wage would be
-positive
-negative
-zero
asked 2020-12-21
Interpreting Power For the sample data in Example 1 “Adult Sleep” from this section, Minitab and StatCrunch show that the hypothesis test has power of 0.4943 of supporting the claim that μ<7 hours of sleep when the actual population mean is 6.0 hours of sleep. Interpret this value of the power, then identify the value of beta and interpret that value. (For the t test in this section, a “noncentrality parameter” makes calculations of power much more complicated than the process described in Section 8-1, so software is recommended for power calculations.)
asked 2022-07-08
Is there a mathematical basis for the idea that this interpretation of confidence intervals is incorrect, or is it just frequentist philosophy?
Suppose the mean time it takes all workers in a particular city to get to work is estimated as 21 minutes. A 95% confidence interval is calculated to be (18.3, 23.7). According to one website, the following statement is incorrect:
There is a 95% chance that the mean time it takes all workers in this city to get to work is between 18.3 and 23.7 minutes.
Indeed, a lot of websites echo a similar sentiment. Another one, for example, says:
It is not quite correct to ask about the probability that the interval contains the population mean. It either does or it doesn't.
The meta-concept at work seems to be the idea that population parameters cannot be random, only the data we obtain about them can be random. This doesn't sit right with me, because I tend to think of probability as being fundamentally about our certainty that the world is a certain way. Also, if I understand correctly, there's really no mathematical basis for the notion that probabilities only apply to data and not parameters; in particular, this seems to be a manifestation of the frequentist/Bayesian debate.
Question. If the above comments are correct, then it would seem that the kinds of statements made on the aforementioned websites shouldn't be taken too seriously. To make a stronger claim: my impression is that if an exam grader were to mark a student down for the aforementioned "incorrect" interpretation of confidence intervals, this would be inappropriate (this hasn't happened to me; it's a hypothetical).
In any event, based on the underlying mathematics, are these fair comments I'm making, or is there something I'm missing?
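For reference, the frequentist reading that the quoted websites rely on can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is taken to be normal with a known standard deviation, and the values `sigma=6.0`, `n=20`, `trials=2000` and the function name `coverage` are hypothetical choices, not taken from the question. It shows that the 95% describes how often the interval-building procedure captures a fixed true mean over repeated samples, and it takes no side on the Bayesian reading.

```python
import random
import statistics


def coverage(true_mean=21.0, sigma=6.0, n=20, trials=2000, seed=0):
    """Fraction of simulated samples whose 95% z-interval contains the
    fixed true mean (assumes a normal population with known sigma)."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / n ** 0.5  # 95% z-interval half-width
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        # The true mean never changes; only the interval moves between samples.
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / trials


print(coverage())
```

The printed fraction should be close to 0.95. Any single interval, such as (18.3, 23.7), either contains the true mean or does not, which is the point the quoted statements are making.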
asked 2022-07-07
I recently saw a proof that the real number field is not interpretable in the complex number field. But this required the axiom of choice, namely the existence of wild automorphisms of the complex numbers. Is there a way to prove it in ZF alone?
asked 2022-07-01
How to show the interpretability of NMF by a small qualitative example on a toy data?
In some papers, such as Nonnegative Matrix Factorization: A Comprehensive Review, I see nonnegative matrix factorization (NMF) described as interpretable. However, I don't know what this means. How can the interpretability of NMF be shown with a small qualitative example on toy data?
In addition, what is interpretability, and how should it be understood in the context of NMF in particular?
asked 2022-05-29
Sheaf of rings on a discrete set.
I was reading through some notes for an exam and one exercise asks me to prove the following:
There is a unique sheaf of rings making a set X with the discrete topology a ringed space.
I tried doing it but I feel I'm missing something; using the definition of presheaf and then of sheaf doesn't seem to get me anywhere. How can I solve such a problem? I leave you my definitions of presheaf and sheaf.
A presheaf $F$ (of rings) on a topological space $X$ consists of the data:
for every open set $U \subset X$ a ring $F(U)$ (think of this as the ring of functions on $U$),
for every inclusion $U \subset V$ of open sets in $X$ a ring homomorphism $\rho_{V,U} : F(V) \to F(U)$ called the restriction map (think of this as the usual restriction of functions to a subset), such that
$F(\emptyset) = 0$,
$\rho_{U,U}$ is the identity map of $F(U)$ for all $U$,
for any inclusion $U \subset V \subset W$ of open sets in $X$ we have $\rho_{V,U} \circ \rho_{W,V} = \rho_{W,U}$.
The elements of $F(U)$ are usually called the sections of $F$ over $U$, and the restriction maps $\rho_{V,U}$ are written as $\varphi \mapsto \varphi|_U$.
A presheaf $F$ is called a sheaf of rings if it satisfies the following gluing property:
if $U \subset X$ is an open set, $\{U_i : i \in I\}$ an arbitrary open cover of $U$ and $\varphi_i \in F(U_i)$ sections for all $i$ such that $\varphi_i|_{U_i \cap U_j} = \varphi_j|_{U_i \cap U_j}$ for all $i, j \in I$, then there is a unique $\varphi \in F(U)$ such that $\varphi|_{U_i} = \varphi_i$ for all $i$.
EDIT: This is what is given as the definition of a K-ringed space:
A ringed space equipped with a sheaf of rings such that the elements of $\mathcal{O}_X(U)$ are actual functions from $U$ to a fixed ring $K$;
EDIT: It turns out that the actual definition is
A ringed space equipped with a sheaf of rings such that the elements of $\mathcal{O}_X(U)$ are actual functions from $U$ to a fixed ring $K$, and $\mathcal{O}_X(U)$ is not only a subring of the ring of functions $U \to K$ but a sub-$K$-algebra of it;
What does this change?
asked 2022-07-01
I am studying bioinformatics and am trying to solve a problem. We have a gene whose initial value at time t = 0 is x_0 = 1. A perturbation of factor −0.9789812 is applied to it, such that, at time t = 10, its value is 0.0210359. The gene is measured at time points t = [0, 1, 2, ..., 10]. How can I calculate the values at times t = 1, 2, ..., 10, given that the only information I have is the value x_0 and the perturbation applied? I am giving the data here:
x = [1.0000000,0.3482754,0.1304151,0.0575881,0.0332433,0.0251052,0.0223848,0.0214755,0.0211715,0.0210698,0.0210359]
But I want to know how would I generate this data. I tried the exponential decay function and the values do not correspond to the ones I have in my data set.
So if the gene has an initial value of 1 and a perturbation of −0.9789812 is applied to it, the final reading of the gene at time t = 10 should be 1 − 0.9789812 = 0.0210188. The decay is not linear; it seems to follow an exponential curve, but I don't know how to fit it.
Note that the only information I have is the initial value of the gene, the perturbation applied. I want to be able to calculate the values of gene at any time given this information.
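One thing that can be checked directly from the listed values: after subtracting the level 1 − 0.9789812 = 0.0210188, each value is very nearly a constant multiple of the previous one, which is the signature of exponential decay toward a nonzero level rather than toward zero. The sketch below checks this. The model form x(t) = floor + (x_0 − floor)·rate^t and the names `floor`, `rate`, and `model` are assumptions made for illustration, and the rate is estimated from the data, since the initial value and the perturbation alone do not determine it.

```python
x = [1.0000000, 0.3482754, 0.1304151, 0.0575881, 0.0332433, 0.0251052,
     0.0223848, 0.0214755, 0.0211715, 0.0210698, 0.0210359]
perturbation = -0.9789812

# Assumed long-run level: initial value plus the perturbation.
floor = x[0] + perturbation

# Distance of each measurement from that level.
excess = [value - floor for value in x]

# Ratio of successive distances; roughly constant if the decay is exponential.
# Only the early steps are used, where rounding in the data matters least.
ratios = [excess[t + 1] / excess[t] for t in range(5)]
rate = ratios[0]

# Assumed model: exponential decay toward `floor` with per-step factor `rate`.
model = [floor + (x[0] - floor) * rate ** t for t in range(len(x))]
max_error = max(abs(m - value) for m, value in zip(model, x))

print(ratios)
print(rate, max_error)
```

The early ratios all come out close to 0.3343, and the assumed model reproduces the listed values to within the rounding of the data. A plain exponential decaying to zero would not fit, because the values level off near 0.021.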
