
Malik Turner 2022-09-11 Answered
Let's imagine I have a function f(x) whose evaluation cost (monetary or in time) is not constant: for instance, evaluating f(x) costs $c(x) \in \mathbb{R}$ for a known function c. I am not interested in minimizing it in a few iterations, but rather in minimizing it using "cheap" evaluations. (E.g., a few cheap evaluations may be more "informative" for locating the minimum than fewer, costlier evaluations taken closer to the minimum.)
My questions are: Has this been studied or formalized? If so, can you point me to some references or the name under which this kind of optimization is known?
Just to clarify, I am not interested in either of these:
Standard optimization, since its "convergence rate" is linked to the total number of function evaluations.
Constrained optimization, since my constraint is not on the feasible set allowed for x, but on the accumulated cost of all the evaluations performed while minimizing f.
Goal optimization, since using few resources to minimize f cannot (I think) be expressed as a goal.
Edit
Surrogate-based optimization has been proposed as a solution. While it addresses the fact that the function is expensive to compute, it does not straightforwardly address the fact that different inputs imply different evaluation costs. I absolutely think that surrogate-based optimization could be used to solve the problem. Along this line, is there a study of how to select points $x_i$ so as to estimate the "optimization gain"? In terms of surrogate-based optimization: what surrogate model and what kind of Design of Experiments (DoE) would let me estimate the expected improvement?
Also, some remarks to contextualize. Originally, I was thinking of tuning the hyperparameters of machine learning algorithms. E.g., training with lots of data and lots of epochs is costly, but different architectures may be ranked based on cheaper trainings and then fine-tuned. Therefore (a toy sketch of this setup follows the list below):
Let us assume that c(x) can be estimated at zero cost.
There is some structure in c(x): some large regions have higher cost, other large regions are cheaper, the cost is smooth, and the minimum may lie in either kind of region.
Let us assume that f(x) is smooth. Also, if possible, let us assume it has some stochastic error that we can model (such as Gaussian error with zero mean and known variance).
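To make these assumptions concrete, here is a minimal toy sketch of the setup. Every name and constant in it (the specific f, c, the noise level, and the budget) is an illustrative assumption, not part of the original question:

```python
import numpy as np

rng = np.random.default_rng(0)

# x in (0, 1] is the fraction of the full training budget (data * epochs).

def c(x):
    # Known, smooth cost model that can be queried at zero cost:
    # larger x costs more to evaluate.
    return 0.1 + 0.9 * x

def f_noisy(x, noise_sd=0.05):
    # Unknown smooth objective (e.g. validation error), observed with
    # zero-mean Gaussian noise of known variance.
    true_f = (x - 0.7) ** 2 + 0.1 * np.sin(8.0 * x)
    return true_f + rng.normal(0.0, noise_sd)

# The optimizer is limited by a total evaluation budget, not by a number
# of iterations: each query at x consumes c(x) from the budget.
budget, spent, history = 2.0, 0.0, []
while spent < budget:
    x = rng.uniform(0.0, 1.0)   # placeholder for a smarter, cost-aware DoE
    spent += c(x)
    history.append((x, f_noisy(x)))
```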
Answers (2)

Blaine Day
Answered 2022-09-12 Author has 14 answers
Yes, this has been studied before.
The replacement of a (computationally) heavy cost function with a cheaper but less accurate version is known as using a surrogate cost/fitness function. You might look up surrogate-based modeling and optimization techniques; quite a few methods are available under that name.
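One concrete way to fold the per-input evaluation cost into a surrogate-based loop, sketched below under assumed names, is to fit a Gaussian-process surrogate to the observations and divide the expected improvement by the known cost c(x), i.e. "expected improvement per unit cost". The helper names, the Matérn kernel choice, the candidate grid, and the noise level are illustrative assumptions, not a prescribed method:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(mu, sigma, best):
    # Standard expected improvement for minimization.
    sigma = np.maximum(sigma, 1e-9)
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def suggest_next(X, y, candidates, c, noise_sd=0.05):
    # X: (n, d) evaluated points, y: (n,) noisy objective values,
    # candidates: (m, d) points to score,
    # c: known cost function mapping a single point to a positive scalar.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                  alpha=noise_sd ** 2,
                                  normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    ei = expected_improvement(mu, sigma, y.min())
    costs = np.array([c(x) for x in candidates])
    # Cost-aware acquisition: prefer cheap points with high expected gain.
    return candidates[np.argmax(ei / costs)]
```

For the DoE part of the question, one simple scheme is to seed X, y with an initial space-filling design (e.g. a Latin hypercube), optionally biased toward cheap regions, before running the loop. Snoek, Larochelle and Adams ("Practical Bayesian Optimization of Machine Learning Algorithms", 2012) use essentially this acquisition with wall-clock time as the cost ("expected improvement per second"), which matches the hyperparameter-tuning motivation above.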
Skye Vazquez
Answered 2022-09-13 Author has 4 answers
You could use a penalty function to add the cost of evaluation onto the objective itself. This would help an optimizer choose between a costly region and a cheap region when their objective values are roughly equal. Rather than increasing the penalty coefficient each iteration, as in the classical penalty method described on Wikipedia, to force the optimizer to converge to low-penalty regions, we can instead decrease the penalty coefficient each iteration so that the optimizer may gradually explore further into high-penalty regions if it deems that necessary.
Formally, we minimize the objective function given by:
$$\min_{x} \; f(x) + \sigma_k \, c(x)$$
where $\sigma_k \ge 0$ is decreased at each iteration $k$.
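A minimal sketch of this decreasing-penalty idea follows. The helper name, the Nelder–Mead inner solver, and the geometric schedule for $\sigma_k$ are all illustrative assumptions; f and c are assumed to take a 1-D NumPy array and return a scalar:

```python
import numpy as np
from scipy.optimize import minimize

def penalized_minimize(f, c, x0, sigma0=1.0, shrink=0.5, outer_iters=8):
    # f: objective, c: known evaluation-cost function, x0: starting point.
    x, sigma = np.asarray(x0, dtype=float), sigma0
    for _ in range(outer_iters):
        # Inner problem: min_x f(x) + sigma_k * c(x). A large sigma_k keeps
        # the search in cheap regions; as sigma_k shrinks, the optimizer is
        # free to move into expensive regions if the objective gain warrants.
        res = minimize(lambda z: f(z) + sigma * c(z), x, method="Nelder-Mead")
        x = res.x
        sigma *= shrink  # decrease the penalty coefficient each outer iteration
    return x
```

Note that each inner solve still evaluates f directly, so in practice this pairs naturally with a cheap surrogate of f (as in the other answer) rather than the expensive function itself.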