First of all, notation is often not standardized in scientific papers. Notation changes from one field to another, from one author to another within the same field and, even worse, sometimes from one paper to another by the same author. The only time things get standardized is when people build courses and textbooks, so that readers are not confused. Students, in particular, tend to be easily confused by even a slight change of notation.

I have not seen any paper discussing these differences in notation, and I do not think there will be any. Why? Because most people do not care much about such details. As said before, notation for variables is non-standard and, in the end, conveys little meaning. I personally tend to choose notation that is as self-explanatory as I can make it, following my own conventions, which I try to keep consistent yet update when I find a problem.

Most people are, however, very interested in the origin of the naming of objects and tools, as there is often a nice story behind it. For instance, why did Bellman name his theory "Dynamic Programming"? Where does the term "martingale" come from? You will likely find some resources addressing those questions.

In any case, the main reason for the different languages and notations is not that "optimal control is Russian" (which is not completely true anyway, as it is also American), but that optimal control is studied in control theory (which emerged from control engineering, which in turn emerged from electrical engineering), whereas reinforcement learning arose from computer science. The first is, at least originally, essentially continuous, whereas the second is essentially discrete.

In the early works of Pontryagin (Pontryagin's maximum principle) and Bellman (Dynamic Programming), both the state space and the input space are continuous (in fact, Bellman's first works were in discrete time rather than continuous time, but that was fixed soon after). The cost is, therefore, also continuous. Why? Because control theory was (and still is) addressing the control of continuous processes such as mechanical and electrical ones, all described by differential equations. This was later refined by considering many other classes of systems, such as discrete-time or hybrid systems and, perhaps more importantly, stochastic systems. This is still an active field of research, which I am currently working on.

From my reading of the early papers of Lev Pontryagin and Richard Bellman, the state was always called x and the input u. This is now standard all over the planet, at least in control. Sometimes u is replaced by v, but this is close enough. It is easy to speculate that this uniformity is simply due to education: people were taught that way and kept the notation. As for the origin of u, I would guess that it comes from the control of electrical motors, where the control input is a voltage, denoted u or v (or even U or V), but I have no source for such an origin. I have checked old control textbooks and could not find one where the input was not denoted by those letters. I found some old papers by Popov where the input is not denoted by those letters, but the input there is not really a control input.
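To make the notational contrast concrete, here is the continuous-time optimal control problem in the form found in most modern control textbooks (a representative modern statement, not a quotation from Pontryagin or Bellman), using the x/u convention discussed above:

```latex
\min_{u(\cdot)} \; J = \int_0^T \ell\big(x(t), u(t)\big)\,\mathrm{d}t + \phi\big(x(T)\big)
\quad \text{subject to} \quad \dot{x}(t) = f\big(x(t), u(t)\big), \quad x(0) = x_0,
```

where x(t) is the (continuous) state, u(t) the control input, and everything, including time, is continuous.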

Reinforcement learning addressed problems in the discrete-time domain with discrete state and output spaces. Why? Because in computer science everything is discrete. The probabilistic setting is also typical of computer science, and especially of communication networks (e.g. queueing processes). Many of those constraints have since been relaxed to make the theory broader but, in the end, reinforcement learning is a set of tools and methods for solving stochastic optimal control problems. In fact, the value function used in reinforcement learning comes directly from Bellman's theory of Dynamic Programming, which serves as the basis for solving optimal control problems. The main difference is that RL is applied to the control of systems which are not necessarily physical processes (e.g. playing video games), which were not interesting to control engineers. Also, it took quite some time before computers were able to run those algorithms realistically. Another difference is that RL addresses problems for which you only have a very loose model, which justifies the probabilistic point of view, while in control we rely heavily on models (at least until very recently, as there is now a lot of research on model-free control and data-driven methods, actually fueled by the recent successes of ML/DL/RL).
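The shared lineage is easiest to see in code. Below is a minimal value-iteration sketch on a made-up two-state, two-action MDP (the numbers are purely illustrative): the update rule is exactly Bellman's dynamic-programming recursion, written in the discrete, stochastic notation that RL inherited.

```python
import numpy as np

# Value iteration on a hypothetical 2-state, 2-action MDP.
# The update  V(s) <- max_a [ r(s,a) + gamma * sum_{s'} P(s'|s,a) V(s') ]
# is Bellman's dynamic-programming recursion in discrete, stochastic form.

P = np.array([              # P[a, s, s'] = transition probability
    [[0.9, 0.1],
     [0.2, 0.8]],           # action 0
    [[0.5, 0.5],
     [0.0, 1.0]],           # action 1
])
R = np.array([              # R[s, a] = immediate reward
    [1.0, 0.0],
    [0.0, 2.0],
])
gamma = 0.9                 # discount factor

V = np.zeros(2)
for _ in range(1000):
    # Q[s, a] = r(s, a) + gamma * E[V(s')]; (P @ V) has shape (actions, states)
    Q = R + gamma * (P @ V).T
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-12:
        V = V_new
        break
    V = V_new

print(V)  # V now satisfies the Bellman optimality equation
```

A control theorist would call V the optimal cost-to-go (with costs minimized instead of rewards maximized); an RL researcher calls it the optimal value function. Same object, different vocabulary.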

In the end: different notations, different tools, different fields, but the same ideas and goals. It is unfortunate that those fields do not have more connections, as that would save energy and time for everybody. There is no need to reinvent the wheel, but RL is a hot topic at the moment and optimal control is not.

In this talk by Michael Jordan, he answers a question about optimal control vs. reinforcement learning. The question is at 45:30, and the answer sums up pretty much everything.