Imagine we are plotting some data points $\left(x_i,f(x_i)\right)$ that we obtained experimentally, and that we want to know what $f(x)$ is. The usual way to do this is to use some software to fit the data to a guessed function. For example, if the data points look like an exponential decay, we choose an exponentially decaying trial function, and so on.
My question is about the following situation: sometimes the data points are fitted very well by the exponentially decaying trial function only in a certain region, while the rest of the points deviate from it.
The theoretical decay model should be derived before you fit, and the probability of the fit is then given by the Bayesian update of your prior distribution for the parameters of the model. In this, I agree with the other answers.
But there are decay models which are sufficiently general that they can be used to fit large classes of experimental data, without fitting everything. If your data falls off exponentially for a while, then crosses over to falling like a power law with a slowly decreasing power, then like a reciprocal logarithm, then like log-log, and finally like log-log-log, you aren't going to get a good fit from any of these blind methods. But this is extraordinarily rare when studying physical systems: such a complex decay only occurs in contrived mathematical situations, or when a system is actively moved from one phase to another according to a complicated plan.
The generic way to fit arbitrary data that you feel should be approximated by a smooth curve is to run a best-fit polynomial. The polynomials are dense in the continuous functions on a closed bounded interval, so you can always approximate anything, but you must use the lowest-order polynomial which fits the structure you believe is there. Best-fit lines are most common.
The standard measure of quality of fit is the sum of the squares of the deviations of the data from the fit. Minimizing this gives the least-squares line, and it is also easy to find least-squares polynomials of arbitrary order, so long as the order is less than the number of data points.
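To make the least-squares idea concrete, here is a minimal sketch in plain Python that solves the normal equations for a polynomial of a given order. The function names `polyfit_least_squares` and `sum_squared_residuals` and the sample data are made up for illustration; in practice a library routine such as `numpy.polyfit` would be the usual choice, and the normal equations become ill-conditioned for high orders.

```python
def polyfit_least_squares(xs, ys, order):
    """Return coefficients c[0..order] minimizing sum_i (y_i - sum_k c[k] * x_i**k)**2."""
    n = order + 1
    if n > len(xs):
        raise ValueError("the order must be less than the number of data points")
    # Normal equations A c = b, with A[j][k] = sum_i x_i**(j+k) and b[j] = sum_i y_i * x_i**j.
    A = [[sum(x ** (j + k) for x in xs) for k in range(n)] for j in range(n)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    c = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(A[row][k] * c[k] for k in range(row + 1, n))
        c[row] = (b[row] - s) / A[row][row]
    return c


def sum_squared_residuals(xs, ys, coeffs):
    """Quality-of-fit measure: sum of squared deviations of the data from the polynomial."""
    return sum(
        (y - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2
        for x, y in zip(xs, ys)
    )


# Illustrative data lying exactly on y = 1 + 2x, so the best-fit line recovers it.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
line = polyfit_least_squares(xs, ys, 1)  # [intercept, slope]
ssr = sum_squared_residuals(xs, ys, line)
```

Comparing `sum_squared_residuals` for successive orders is one simple way to see when raising the order stops buying a meaningfully better fit.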
But this type of fit is wrong for decaying data, since any nonconstant polynomial grows without bound at large $x$. In this case you have options:
There is no general absolute method, because each asymptotic limit is different, but once you know even just a little bit, you can extract the leading behavior and fit the rest using a complete expansion.
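As one possible illustration of extracting the leading behavior, suppose you already believe the leading behavior is an exponential decay $A e^{-kx}$. A rough sketch is then to fit a line to $\ln f(x_i)$, read off $k$ and $A$, and divide the leading form out of the data; whatever is left is what a further expansion would have to describe. The names `fit_line` and `extract_exponential` and the synthetic data are assumptions made for this example, and the approach only makes sense for strictly positive data.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def extract_exponential(xs, ys):
    """Fit ln(y) by a line to estimate y ~ A * exp(-k x), then divide it out.

    Assumes every y is strictly positive. Returns (A, k, remainder), where
    remainder[i] = y_i / (A * exp(-k * x_i)) stays close to 1 wherever the
    leading exponential describes the data well.
    """
    intercept, slope = fit_line(xs, [math.log(y) for y in ys])
    amplitude = math.exp(intercept)
    rate = -slope
    remainder = [y / (amplitude * math.exp(-rate * x)) for x, y in zip(xs, ys)]
    return amplitude, rate, remainder


# Synthetic, noise-free data following 2 * exp(-0.5 x), purely for illustration.
xs = [0.5 * i for i in range(10)]
ys = [2.0 * math.exp(-0.5 * x) for x in xs]
amplitude, rate, remainder = extract_exponential(xs, ys)
```

With real data, the region where `remainder` drifts away from 1 marks where the exponential stops being the whole story; it is usually better to estimate the rate only from the region where the leading form is trusted and then fit the remainder there with a low-order expansion.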