Let's imagine I have a function $f(x)$ whose evaluation cost (monetary or in time) is not constant: for instance, evaluating $f(x)$ costs $c(x)\in \mathbb{R}$ for a known function $c$. I am not interested in minimizing $f$ in few iterations, but rather in minimizing it using "cheap" evaluations. (E.g., a few cheap evaluations may be more "informative" for locating the minima than fewer but costlier evaluations closer to the minima.)
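One way to formalize what I mean (my own phrasing, in case it helps): choose evaluation points $x_1, \dots, x_N$ sequentially so as to

$$\text{minimize } \min_{1\le i\le N} f(x_i) \quad \text{subject to} \quad \sum_{i=1}^{N} c(x_i) \le B,$$

where $B$ is a total evaluation budget and the number of evaluations $N$ is not fixed in advance.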

My questions are: Has this been studied or formalized? If so, can you point me to some references or the name under which this kind of optimization is known?

Just to clarify, I am not interested in either of these:

Standard optimization, since its "convergence rate" is measured in the total number of function evaluations rather than in their accumulated cost.

Constrained optimization, since my constraint is not on the subspace allowed for $x$, but on the accumulated cost of all the evaluations performed while minimizing $f$.

Goal optimization, since using few resources to minimize $f$ cannot (I think) be expressed as a goal.

Edit

Surrogate-based optimization has been proposed as a solution. While it addresses the fact that the function is expensive to evaluate, it does not address (straightforwardly) the fact that different inputs imply different evaluation costs. I do think that surrogate-based optimization could be used to solve the problem. Along this line, is there a study of how to select points $x_i$ to estimate the "optimization gain"? In surrogate-based terms: what surrogate model and what kind of Design of Experiments (DoE) would let me estimate the expected improvement?
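To make the question concrete, here is a minimal sketch of the kind of thing I have in mind (my own illustration, not an established method; the objective `f`, the cost `c`, and all values are made up): fit a Gaussian-process surrogate and pick the next point by expected improvement divided by the known cost $c(x)$, so cheap-but-informative points win over expensive ones.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def f(x):                 # hypothetical expensive objective (toy stand-in)
    return (x - 0.7) ** 2

def c(x):                 # hypothetical known evaluation cost
    return 1.0 + 9.0 * x  # evaluations get ~10x dearer as x -> 1

# Initial design: a few points in the low-cost region.
X = np.array([[0.05], [0.2], [0.4]])
y = f(X).ravel()

grid = np.linspace(0.0, 1.0, 200).reshape(-1, 1)

for _ in range(5):
    gp = GaussianProcessRegressor(kernel=RBF(0.2), alpha=1e-6).fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    best = y.min()
    z = (best - mu) / np.maximum(sigma, 1e-12)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    acq = ei / c(grid).ravel()                            # ...per unit of cost
    x_next = grid[np.argmax(acq)]                         # cost-aware choice
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next))

print(X[np.argmin(y)], y.min())
```

The "EI per unit cost" division is the simplest cost-aware acquisition I could think of; whether there is a principled or studied alternative is exactly what I am asking.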

Also, some remarks to give context. Originally, I was thinking of tuning the hyperparameters of machine-learning algorithms. E.g., training with lots of data and lots of epochs is costly, but different architectures may be ranked based on cheaper training runs and then fine-tuned. Therefore:

Let us assume that $c(x)$ can be estimated at zero cost.

There is some structure in $c(x)$: some large regions are expensive and others are cheap, the cost is smooth, and the minima may lie in either kind of region.

Let us assume that $f(x)$ is smooth. Also, if possible, let us assume it has some stochastic error that we can model (such as Gaussian error with zero mean and known variance).
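The last assumption maps directly onto a GP surrogate, as far as I can tell: a known noise variance can be added to the kernel diagonal so the surrogate models the noisy observations rather than interpolating them. A sketch (scikit-learn's `alpha` parameter plays that role; the noisy toy function and all values are illustrative):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)
noise_var = 0.01  # known variance of the zero-mean Gaussian evaluation noise

X = rng.uniform(0, 1, size=(20, 1))
y = np.sin(3 * X).ravel() + rng.normal(0, np.sqrt(noise_var), 20)  # noisy smooth f

# alpha adds the known noise variance to the kernel diagonal,
# so predictions smooth over the noise instead of fitting it exactly.
gp = GaussianProcessRegressor(kernel=RBF(0.3), alpha=noise_var).fit(X, y)
mu, sd = gp.predict(np.array([[0.5]]), return_std=True)
```

Whether this is the right surrogate for the cost-aware setting above is part of what I am asking.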
