# Fitting curves to your data using least squares

## Introduction

If you're an engineer (like I used to be in a previous life), you have probably done your bit of experimenting. Usually, you then need a way to fit your measurement results with a curve. If you're a proper engineer, you also have some idea what type of equation should theoretically fit your data.

Perhaps you did some measurements with results like this:

Fitting data with an equation.

A well-known way to fit data to an equation is the least squares method (LS). I won't repeat the theory behind the method here; you can read up on the matter on Wikipedia.

## Fitting simple linear equations

Excel provides us with a couple of tools to perform least squares calculations, but they are all centered around the simpler functions: linear functions, or functions that can be made linear, of the shape

y=a.x+b, y=a.exp(b.x), y=a.x^b, etcetera. With some tricks you can also perform LS on polynomials using Excel.
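To make the idea concrete outside Excel, here is a minimal Python sketch of a least squares fit for the straight line y=a.x+b, plus the common trick of fitting y=a.exp(b.x) by fitting a line to ln(y). The function names `fit_line` and `fit_exponential` and the sample numbers are my own assumptions for illustration; this is not a description of how Excel's tools work internally.

```python
import math

def fit_line(xs, ys):
    """Return (a, b) that minimise sum((y - (a*x + b))**2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                 # slope
    b = mean_y - a * mean_x       # intercept
    return a, b

def fit_exponential(xs, ys):
    """Fit y = a*exp(b*x) via a straight-line fit to ln(y).

    Requires all y > 0. Note that this minimises the squared error
    in ln(y), which is not the same as the squared error in y itself.
    """
    slope, intercept = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(intercept), slope

# Made-up example data lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
line_a, line_b = fit_line(xs, ys)
print(line_a, line_b)

# Made-up example data lying exactly on y = 2*exp(0.5*x)
exp_ys = [2.0 * math.exp(0.5 * x) for x in xs]
exp_a, exp_b = fit_exponential(xs, exp_ys)
print(exp_a, exp_b)
```

The power function y=a.x^b can be handled the same way by taking the logarithm of both x and y.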

### Regression tools in the Analysis Toolpak Add-in

Activate the Analysis Toolpak in your list of Add-ins (File button or Office button, Excel Options, Add-ins tab, click Go):

The add-ins list of Excel with the Analysis toolpak activated

This adds the "Data Analysis" button to your ribbon, on the Data tab, Analysis group (this is also the location where you can find the Solver button mentioned later on):

Ribbon with Data Analysis button

Click that button to explore which regression tools are available.

### Worksheet functions

There are a number of worksheet functions which you can also use to do regression analysis. To quickly access them, select an empty cell and press Shift+F3 to open the function wizard. In the search box, enter "Regression" (without the quotes of course). Excel will list the relevant functions:

Function wizard showing Regression functions

Pick one and click on the "Help on this function" link at the bottom of the function wizard to find out more about its use.

## Fitting more complex functions

What if you want to fit a more complex function, like y=exp(a.x).sin(x) + b ? How can that be done using Excel?

I devised a way to do this which involves the following steps:

- Create a table with x and y values
- Add a column with the model function formula, which points to your x values and to some cells for the constant(s)
- Have a column that calculates the Sum of Squares
- Use Solver to find the constants which yield the lowest Sum of Squares.
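For readers who want to see the same four steps outside Excel, here is a minimal Python sketch. It uses the example model y=exp(a.x).sin(x)+b and a very simple "compass search" minimiser that I wrote for illustration; it is an assumption on my part and not the algorithm Solver uses. The data are synthetic, generated from known constants, so you can check that the fit recovers them.

```python
import math

def model(x, a, b):
    """The example model: y = exp(a*x) * sin(x) + b."""
    return math.exp(a * x) * math.sin(x) + b

def sum_of_squares(params, xs, ys):
    """Sum of squared differences between measured and modelled y."""
    a, b = params
    return sum((y - model(x, a, b)) ** 2 for x, y in zip(xs, ys))

def compass_search(objective, start, step=0.5, tol=1e-10, max_iter=100_000):
    """Simple derivative-free minimiser (illustration only, not Solver).

    Tries a step up and a step down along each parameter, keeps any
    improvement, and halves the step when nothing improves.
    """
    best = list(start)
    best_value = objective(best)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                value = objective(trial)
                if value < best_value:
                    best, best_value = trial, value
                    improved = True
        if not improved:
            step /= 2
    return best, best_value

# Step 1: table with x and y values (synthetic, noise-free data)
true_a, true_b = -0.2, 1.5
xs = [0.5 * i for i in range(21)]            # x = 0, 0.5, ..., 10
ys = [model(x, true_a, true_b) for x in xs]

# Steps 2 to 4: model column, sum of squares, minimise over the constants
fitted, sse = compass_search(lambda p: sum_of_squares(p, xs, ys), start=[0.0, 0.0])
print(fitted, sse)
```

With real, noisy measurements the sum of squares will not drop to zero, and a poor starting guess may lead to a poor result, just as with Solver.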

## Explanation of the Example file

I created an example file you can put to use directly. Below you will find a link to the file and an explanation on how the file is put together.

### Download

Download this file:

Non linear least squares example

### How the file works

#### Data

The calculations and the data are concentrated on Sheet1 of the file. The most important area is the table starting in cell A1:

Data table in LS file

Column A holds your x-values and column B holds the y-values. The third column holds the formula that calculates the result of the fitted equation using the constants and the x-values. The sample file has this formula in column C:

=EXP(Const_a*xValues)*SIN(xValues)+Const_b

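The fourth column of the table calculates the squared difference between the measured and the fitted y-value for each row; these values are summed later to get the sum of squares. Formula: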

=(B2-C2)^2

As you probably noted already, I used a couple of range names. I explain those below.

#### Range names

To ease working with the file I created some range names. Instead of using the table references that Excel 2007, 2010 and 2013 offer, I included some dynamic range names that point to the data. This means the workbook also works in Excel 2003 and before.

| Range name | Refers To | Description |
| --- | --- | --- |
| Const_a | =Sheet1!$G$2 | Model constant |
| Const_b | =Sheet1!$G$3 | Model constant |
| Const_c | =Sheet1!$G$4 | Model constant |
| Const_d | =Sheet1!$G$5 | Model constant |
| Const_e | =Sheet1!$G$6 | Model constant |
| Const_f | =Sheet1!$G$7 | Model constant |
| Const_g | =Sheet1!$G$8 | Model constant |
| Const_h | =Sheet1!$G$9 | Model constant |
| Constants | =Sheet1!$G$2:$G$9 | Constants of equation |
| xValues | =OFFSET(Sheet1!$A$2,0,0,COUNT(Sheet1!$A$1:$A$65551),1) | Column with x values |
| yDelta | =OFFSET(xValues,0,3) | Column with squared differences |
| yhat | =OFFSET(xValues,0,2) | Column with model fit results |
| yValues | =OFFSET(xValues,0,1) | Column with y values |

#### Constants of the equation

The const range names point to a second table in the file:

Constants table

This table is where you enter your initial guesses for the constants and where the Solver add-in returns the results. As you can see, the residual Sum of Squares is shown below that table. Formula:

=SUM(yDelta)

It is this cell G11 that we try to minimize using the Solver add-in.
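To show how the columns and cell G11 relate to each other, here is a small Python sketch that mirrors the worksheet layout. The variable names follow the range names of the file; the constants and the four "measurements" are made-up values, purely for illustration.

```python
import math

# Hypothetical values playing the role of Const_a and Const_b
const_a, const_b = -0.2, 1.5

# Columns A and B: made-up x values and "measured" y values
x_values = [0.0, 1.0, 2.0, 3.0]
y_values = [1.5, 2.2, 2.1, 1.6]

# Column C (yhat): =EXP(Const_a*xValues)*SIN(xValues)+Const_b
y_hat = [math.exp(const_a * x) * math.sin(x) + const_b for x in x_values]

# Column D (yDelta): =(B2-C2)^2, one squared difference per row
y_delta = [(y - f) ** 2 for y, f in zip(y_values, y_hat)]

# Cell G11: =SUM(yDelta), the value Solver is asked to minimise
residual_ss = sum(y_delta)
print(residual_ss)
```

Changing `const_a` or `const_b` changes every value in `y_hat`, and therefore `residual_ss`; that is exactly the dependency Solver exploits.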

#### Using Solver

First of all, you need to install the Solver add-in. Use the Add-ins dialog I showed at the top of this article and check the box next to "Solver Add-in". This adds the Solver button in the same location on the ribbon as the "Data Analysis" button I showed before.

After you have ensured the model formula is correctly entered in column C and the calculations work, click the Solver button. The dialog below is shown:

The Solver dialog

Make sure the "Set Objective" box points to the cell that contains the sum of squares. Select "Min" next to "To".

The "By Changing Variable cells" box must ONLY point to the cells that are used by your model; otherwise the degrees of freedom calculation (on the ANOVA sheet) will be wrong. Also ensure that any unused constant cells are empty by selecting them and pressing the Delete key.

Note that depending on your model type you may have to change the Solver settings. A bit of experimenting may be needed for best results. You can save and load Solver settings using the appropriate button.

So be prudent and critical about whether or not you have actually reached a best fit. Solver may come up with non-optimal results, depending on your model equation, your initial guesses and the Solver settings.
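The effect of the starting guess can be illustrated with a small Python sketch. It fits the hypothetical model y=sin(a.x) to synthetic data (generated with a=2) using a simple local search from three different starting values, and then keeps the result with the lowest sum of squares. Both the model and the search routine are assumptions of mine chosen for illustration; they are not taken from the example file and this is not how Solver works internally.

```python
import math

def sse(a, xs, ys):
    """Sum of squares for the hypothetical model y = sin(a*x)."""
    return sum((y - math.sin(a * x)) ** 2 for x, y in zip(xs, ys))

def local_search(objective, start, step=0.1, tol=1e-9):
    """Simple one-parameter local minimiser (illustration only).

    Moves to a neighbouring value when that lowers the objective,
    otherwise halves the step. It can stall in a local minimum.
    """
    best, best_value = start, objective(start)
    while step >= tol:
        moved = False
        for trial in (best + step, best - step):
            value = objective(trial)
            if value < best_value:
                best, best_value = trial, value
                moved = True
        if not moved:
            step /= 2
    return best, best_value

# Synthetic, noise-free data generated with a = 2
xs = [0.1 * i for i in range(51)]          # x = 0, 0.1, ..., 5
ys = [math.sin(2.0 * x) for x in xs]

# Run the same search from several initial guesses
starts = (0.5, 2.3, 6.0)
results = [local_search(lambda a: sse(a, xs, ys), s) for s in starts]
for s, (a, value) in zip(starts, results):
    print(f"start {s}: a = {a:.6f}, sum of squares = {value:.3g}")

# Keep the run with the lowest sum of squares
best_a, best_sse = min(results, key=lambda r: r[1])
print(best_a, best_sse)
```

In Excel the equivalent is to run Solver a few times from different initial guesses in the constants table and compare the resulting sum of squares.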

If you're happy with the current Solver settings, click Solve. After some time the "Solver Results" dialog opens, giving you some options on how to continue. Note that it also enables you to ask for a couple of reports.

The example file shows the end result:

The end result

#### Analysis of Variance

On the ANOVA tab, you can find the ANalysis Of VAriance table, which looks like this:

The ANOVA table

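The most important cell here is cell F2, which holds the P-value of the F-test. If the value in that cell is less than 0.05, the fit is statistically significant at the 5% level: the model explains more of the variation in your data than you would expect from chance alone. It does not prove that the model is the correct one. So less is more for this cell; you want it to stay below 0.05. The cell will turn red for values over 0.05.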

Please check whether the value in cell B2 is exactly one less than the number of constants you used for the model. If not, go back to Sheet1 and empty the cells not used by your model. So if you used const_a and const_b, then the value of B2 (model degrees of freedom) should be 1.
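For those who want to see the arithmetic behind such a table, here is a Python sketch of the textbook regression ANOVA layout. It is an assumption that the ANOVA sheet follows this layout; I have not reproduced the workbook's actual formulas here. The function name `regression_anova` and the sample numbers are made up. The P-value is left out because Python's standard library has no F-distribution; in Excel you could use a function such as F.DIST.RT (or FDIST in older versions) for that step.

```python
def regression_anova(y_values, y_hat, n_constants):
    """Textbook ANOVA quantities for a fitted model (illustration only).

    Assumes the model contains a constant term, so that
    model df = n_constants - 1 and error df = n - n_constants.
    """
    n = len(y_values)
    mean_y = sum(y_values) / n
    ss_total = sum((y - mean_y) ** 2 for y in y_values)
    ss_error = sum((y - f) ** 2 for y, f in zip(y_values, y_hat))
    # Can become negative if the fit is worse than simply using the mean
    ss_model = ss_total - ss_error
    df_model = n_constants - 1
    df_error = n - n_constants
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return {
        "ss_model": ss_model, "ss_error": ss_error, "ss_total": ss_total,
        "df_model": df_model, "df_error": df_error,
        "F": ms_model / ms_error,
    }

# Made-up y values and fitted values for a model with two constants
y_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
table = regression_anova(y_values, y_hat, n_constants=2)
print(table)
```

With two constants the model degrees of freedom come out as 1, which matches the check on cell B2 described above.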

## Conclusion

As you've seen, fitting complex functions to your data isn't very hard to do. A combination of some relatively simple formulas and the Solver Add-in comes to the rescue here.

Some advice as one engineer to another: be critical, please. Don't believe everything Excel tells you! Carefully analyse the results it returns, as Solver may get things wrong and not give you the best possible result!

## Comments

All comments about this page:

Comment by: Hui... (4/14/2012 7:33:11 PM)

Thats an excellent technique!

Thanx for sharing with us.

Hui...

Comment by: Mohammad (9/13/2012 3:59:56 AM)

Comment by: Jan Karel Pieterse (9/13/2012 9:39:02 AM)

Thank you!

Your being able to help your daughter with her homework is exactly the reason why I write these articles: to help others.

Comment by: Alex (11/12/2012 7:08:24 PM)

Comment by: Martin (11/25/2012 6:55:53 PM)

Here is my equation:

f(x) = Const_f*exp(a*x) *sin(Const_g*x*(Const_e*exp(Const_d*x)+c) + b

Comment by: X.li (2/15/2013 12:59:41 PM)

Comment by: xyz (3/1/2013 5:36:26 AM)

helped me in my lab report

Comment by: luiz (3/18/2013 11:40:54 PM)

I used it to equation y=A+B*X+C*X^n and it worked very well. Is it possible adapt it to estimate prediction bounds with 95% confidence?

I used to do it using originlab.

Thanks

Comment by: Jan Karel Pieterse (3/19/2013 8:17:48 AM)

I expect it is possible, but I do not have the needed theoretical information to build the appropriate formulas.

Comment by: Thuto (4/8/2013 10:09:25 PM)

This seems to be very informative.

But, how do you link the two tables, Table3 and Table2 so that Table3 is able to read constants in Table2?

Comment by: Jan Karel Pieterse (4/9/2013 11:41:03 AM)

I am not sure I understand your question, can you please try to explain what you mean?

Comment by: Thuto (4/9/2013 1:45:32 PM)

Since you have two data tables. One with x, y,yhat,(y-yhat)^2, and the other with constants.How do you allow the data table to read const_a and const_b in the constant table in order to solve f(x)=exp(const_a*xValues)*sin(xValues)+const_b in the 3rd column of the data table?

Comment by: Jan Karel Pieterse (4/9/2013 2:32:54 PM)

Ah, I get it.

The method I use is as follows.

- You paste your data in the first two columns of the first table

- The subsequent columns contain formulas. The formulas in column 3 are most important, you need to write the model formula in this column yourself. The constants are defined as range names: Const_a, Const_b, Const_c, ... The xvalues are defined by the range name xValues.

So if your model is:

f(x) = exp(a*x) *sin(x) + b

then your formula in column C becomes:

=EXP(Const_a*xValues)*SIN(xValues)+Const_b

Comment by: J (4/10/2013 6:53:05 PM)

Would it be possible to describe more on the ANOVA portion? Which were be your variables and how did you get those values eventually?

Comment by: Jan Karel Pieterse (4/12/2013 12:07:34 PM)

Perhaps this link gives you some pointers:

http://www.stat.yale.edu/Courses/1997-98/101/anovareg.htm

and this one:

http://chemistry.oregonstate.edu/courses/ch361-464/ch464/RegrssnFnl.pdf

Comment by: Julio Gomes (4/23/2013 2:35:49 AM)

I'm in fact a chemist, and reading your explanation raised two questions for me:

Which algorithm does Solver use? I have a calculation that I need to reproduce, but it uses the Levenberg-Marquardt algorithm... Would that give considerable differences?

In the worksheet you provided, can I use the model and just substitute the values for my system...

Gratefully

Comment by: Jan Karel Pieterse (4/23/2013 9:55:24 AM)

The solving methods available through Solver are listed in the dropdown, there are no other methods than listed there.

Which is the best one will depend on your model and data I'm afraid, it is hard to predict which is best for your situation.

I suggest you just try.

Comment by: Andrew (6/25/2013 11:22:54 AM)

I have measurement errors on my y-values. How could I incorporate these into the least squares fit? Any help would be very much appreciated!

Comment by: Jan Karel Pieterse (6/25/2013 5:29:03 PM)

To be honest, that is beyond my knowledge!

Comment by: Carlos Cevallos (11/2/2013 8:25:14 PM)

Thanks you

Comment by: Jan Karel Pieterse (11/2/2013 8:51:22 PM)

The algorithm solver uses is one of the things you can choose. (Solving method)

Comment by: Alex (11/25/2013 6:37:49 AM)

http://vizsage.com/other/leastsquaresexcel/LeastSquares.xls

If you have measurement errors on y-values, use LSFit function

if errors on both X and Y, use LSFitBoth function.

The VBA source code is provided in the spreadsheet

Comment by: GOPI YELURI (11/25/2013 8:36:27 AM)

Comment by: Hassam (12/11/2013 10:06:41 AM)

I followed your method. I have a similar problem I do everything correct and my initial guesses are 1 and 1. When i use solver values remain 1 solver does not seem to work. What should i do?

Regards

Comment by: Jan Karel Pieterse (12/11/2013 10:33:44 AM)

Hard to say.

Sometimes the formula simply isn't right for the type of data you have.

Have you tried manually inputting some guesses to see if you can get closer to the proper value?

Comment by: Srinivas Bikkina (2/16/2014 11:34:08 AM)

I have to use a power fit to explain the relation between X and Y (equation: Y = const-a* x^-constb).

when I put my data in excel using power fit, it is very well fitted with R2 = 0.97 or 0.98. However, I would like to check the residual on the fit and significance of the fit using your macro enabled excel sheet. After my initial guess values using solver I could always get SS (the cell to be minimized) is coming of the order of ~ 2E-9 or E-10. I assume that this is very very low. However, when I refer to ANOVA table, where I found that P-value cell is showing a number error and F-value is negative. Kindly help me in this regard.

Comment by: Jan Karel Pieterse (2/17/2014 6:19:08 AM)

Without knowing your data this is very hard to answer!

Please send your workbook to the email address below.

Comment by: Cui (2/25/2014 8:32:33 AM)

Comment by: Jan Karel Pieterse (2/25/2014 9:54:40 AM)

If you have access to a proper statistics software package then it probably already has such an option built-in.

Getting this done in Excel is certainly possible, but I don't really know the math behind it :-)

Comment by: Wezen (6/9/2014 2:40:47 PM)

I am facing some problems when using the solver; it always says that there is an error in the model. I have tried replacing my data with other sample data in the hopes that it would work but nothing has changed.

Comment by: Jan Karel Pieterse (6/10/2014 8:40:59 AM)

This is a bit hard to say without model or data :-)

Comment by: Wezen (6/10/2014 9:10:56 PM)

it seems that the solver isnt working for me even when I try and use all the data and the functions that are in the template file for the "non-linear least squares example". As soon as i change the constants and then try to solve to get the original answer back, it says that there is an error in the model; do i need to change anything in my excel setting for this file to work?

Comment by: Jan Karel Pieterse (6/11/2014 8:36:37 AM)

That is probably what needs to be done, yes. No idea in which way though, that requires some experimenting.

Comment by: Hammad Khan (7/14/2014 9:52:36 AM)

file how did you perform the ANOVA? I followed your procedure with your given data and faced no problem, but can you also guide me on the ANOVA: which column should be selected, and on which tab? In Excel, Data > Data Analysis gives 3 options for ANOVA. Which one should be selected?

Comment by: Jan Karel Pieterse (7/14/2014 9:53:46 AM)

Unfortunately I have no time to respond this week. I advise you to ask your question at www.eileenslounge.com.

Comment by: Sofya (12/4/2014 9:03:34 AM)

Actually, the method you describe in "Fitting more complex functions" is a well-known one, I studied it during my master's degree. The explanation is good, though, and it's nice that you want to share your finding, just I think that you shouldn't claim the authorship.

Comment by: Jan Karel Pieterse (12/4/2014 10:49:56 AM)

Well, I AM the author of this article and I did devise the spreadsheet on my own. That makes me the author, does it not? I never claimed to be the first person to write something about this and I was not expecting to either.

Comment by: Lasse Petersen (12/7/2014 8:45:27 PM)

I'm doing a project for school in Denmark about the SIR model. I don't know if you know it, but the model has some constants that I need to find. I have tried to do this by using this least squares method on the differences between the data I have from the recent Ebola outbreak in Liberia and the theoretical data I get from some formulas. The problem is that Excel seems to change the constants only very little, and it gives me a new number every time I use it. Do you know what I am doing wrong, and can I maybe send it to you so you can help me? It's a very important project, and it needs to be done in 10 days, so any help is appreciated.

-Lasse Petersen

Comment by: Jan Karel Pieterse (12/8/2014 8:39:17 AM)

Send a copy of the file by email, I will have a look.

Comment by: Simon Eaton (1/7/2015 8:35:53 PM)

-Simon

Comment by: Jan Karel Pieterse (1/8/2015 1:16:41 PM)

Can you perhaps email that file to me (see address at bottom of page)?

Comment by: Francisco Puerta (1/9/2015 10:55:07 PM)

Comment by: CHAYEH (1/22/2015 3:15:50 PM)

Comment by: Kevin Doggett (3/10/2015 1:34:48 PM)

Comment by: Jan Karel Pieterse (3/10/2015 1:52:05 PM)

I'm not entirely sure, my first response was that r-squared only applies to linear fits. But I'm not even sure on that :-)

Comment by: DC (3/26/2015 1:23:58 PM)

I've just launched a free online tool to do this type of curve-fitting in a much simpler way. It's free to use, does not require any download and exports results to Excel. See http://www.mycurvefit.com.

I welcome any feedback you can provide.

Comment by: DC (3/26/2015 1:44:46 PM)

http://www.mycurvefit.com/index.html?action=openshare&id=08cfa3e5-37a3-4187-b4ac-97108639acb2

I hope to hear from you!

All the best,

Darren

Comment by: Jan Karel Pieterse (3/26/2015 2:04:32 PM)

Interestingly, it seems your ANOVA returns slightly different results from mine?

Comment by: DC (3/26/2015 5:51:58 PM)

The earlier link included only the first 9 rounded values (from the screen image). The following includes the whole data set pasted in from your Excel document (from your downloads page):

http://www.mycurvefit.com/index.html?action=openshare&id=e9975460-3d24-46b9-a6b4-18f98d1c17e7

Here the results are much closer.

The results displayed on the web-page show values to 7 sig figs. In this case the calculated coefficients are almost equal to Solver's. I would expect a slight difference in the results based on the different implementations of the minimisation process. In both cases the results are very good.

Comment by: Jan Karel Pieterse (3/26/2015 6:24:08 PM)

I wasn't referring to the fitted constants but rather to the value of F. Yours shows 954, I seem to have 383. Not sure which is correct :-)

Comment by: Duncan (3/27/2015 3:01:16 PM)

Am a university student working on my final year project.

Using an empirical formula in the form of y=A*x^(B)*z^(C), I want to determine the constants A, B and C based on experimental results of y, x and z using the least squares fit method, but I have failed. Any help would be welcome.

Thank you

Comment by: Jan Karel Pieterse (3/30/2015 3:21:29 PM)

It may very well be that Solver is having difficulties solving your problem. The only advice I can give you is to experiment with the settings of solver.

If you have an idea in what range the A, B and C values should be, it will also help to set those as initial values prior to running Solver.

Comment by: Cap (10/29/2015 5:08:18 PM)

I'm neither a math nor statistics expert, so please humor me - once a curve is fitted to the data, is it possible for the equation of the curve to provide a value in order to compare different curves to each other?

Comment by: Jan Karel Pieterse (10/29/2015 5:22:54 PM)

Yes of course, that is what the curve fitting is all about :-) Basically, that is precisely what happens in column C of the demo file.

Comment by: Cap (10/29/2015 5:41:02 PM)

Comment by: Devika (3/23/2016 8:47:44 PM)

p = a + b*(1 - exp(-c*t))

I have my data set Y for different T time in hours.

Any simple way you can suggest I use solver to do that and to do the curve fitting as well as R square value?

thanks

Comment by: Fuso (3/30/2016 7:05:48 PM)

Comment by: Mathias (4/1/2016 11:18:33 AM)

I'm using your application to fit y = 0.5*(1+TANH(alfa*( x - beta ))) to a dataset, at first sight the fit looks good. However, when I go to the ANOVA tab I get a " #NUM! " error for the P-value in cell F2. The cause for this seems to be the value in E2 under 'F', which is negative. Also the 'Sum of squares' for the error (cell C3) is negative, which seems a bit odd. Any thoughts on what could be the cause?

Kind regards, Mathias

Comment by: Ali (4/3/2016 2:28:55 AM)

I am trying to solve an optimization problem in which I need to make non-linear least square fitting. the function is as follows:

y(t)=(x/R)+[y(0)-(x/R)]^t/F

so y is a function of time which I know, along with x, and y(0). I have several points for these. I need to find the values of R & F that would create a curve that fits these values. Is this doable using this technique?

Comment by: Jan Karel Pieterse (4/4/2016 9:13:56 AM)

I don't really know. Perhaps you can email your workbook to me? See address at bottom of page.

@Ali:

Have you tried to do it?

Comment by: GB (4/20/2016 5:51:03 AM)

Comment by: Boris (12/13/2016 7:10:57 PM)

One Example of the equation

MR = a*exp(-k*t)+c, where MR-MOISTURE RATIO, t- time;a, k, c-constants

Comment by: Jan Karel Pieterse (12/19/2016 10:18:40 AM)

Unfortunately, without proper guesses for the parameters, Excel is sometimes unable to solve the parameters.

Comment by: Boris (12/26/2016 3:44:42 PM)

Boris Huirem

Comment by: Jan Karel Pieterse (12/30/2016 3:12:26 PM)

I'm afraid that can be the hard part. Depending on the precise model and data starting from a good set of first guesses can be very important. I have no suggestions other than trial and error I'm afraid.

Comment by: Steven (1/4/2017 6:00:41 PM)

Comment by: Jan Karel Pieterse (1/5/2017 6:46:56 AM)

I've never had the chance to try to figure out the math behind determining the accuracy (reliability) of the fitted constants, sorry!

Comment by: Steven (1/6/2017 11:32:59 AM)

Comment by: Sascha (1/23/2017 10:59:29 AM)

thank you for your fine explanation, it works for me also with 2 independent variables. But, is there the possibility to solve multiple data sets?

E.g. I have x1=temperature, x2=degree of cure and y=speed of cure. With e.g. 3 different heating rates I get 3 x-y data sets.

Thanks

Comment by: Jan Karel Pieterse (1/23/2017 2:46:53 PM)

For problems like that dedicated stats packages are a lot more capable.

Comment by: sumaira ibrahim (1/31/2017 9:46:50 PM)

Comment by: Anthony Lucio (2/2/2017 9:00:43 PM)

I am attempting to use your spreadsheet to model my own data and curves. One question I have is how do you fit more than two parameters? The one discussed above uses two fit parameters but I would like to fit either three, six or nine parameters. Basically, can we extend the worksheet to optimize for Const_c through Const_i? I should mention I am using complex variable equations. I have gotten to the point where I can fit a curve manually by guess/check with MS Excel for three input parameters but I would like to extend it to six or nine, which is cumbersome to attempt manually and would be easier done with the Solver tool IMO. Thanks in advance for any help! Feel free to send me an email as well.

Cheers,

Anthony

Comment by: Jan Karel Pieterse (2/3/2017 11:49:21 AM)

You should be able to use the default 8 parameters the file already allows you to use (const_a to h). If you need more, you can extend the table containing the constants. You do need to make sure each constant has a range name pointing to its cell. You can do so by selecting the table (not its header) with the constants and choosing Formulas, Create From Selection and only checking the box "Left Column".

Comment by: Gosia (2/8/2017 7:13:25 PM)

I am trying to find 2 parameters in a function that contains exponential terms and two variables that change with time: m and p.

I've tried to insert columns to the right of the column with x values (time in my case). Unfortunately, solver doesn't work.

Is there any reason for that?

I've called my columns x, m, p containing values at a specified times. My modelled solution I put in the "yhat" column and real data of solution in y column.

I think that I'm not aware of some function of solver (I don't know, maybe there's a different way to mark those 3 columns as variables changing with time), I'll appreciate any hint.

Gosia

Comment by: Roman Pienzer (2/9/2017 12:43:22 PM)

a question regarding the ANOVA.

In the sheet that you provided, the degrees of freedom are calculated with reference to the amount of yvalues.

Conducting an ANOVA with Excel's built in Data Analysis Tool on the other hand, the degrees of freedom are calculated as K*(n-1), which means both the number of yvalues and yhats are being counted.

Obviously this leads to differing results, in my case with df(JKP)=36 vs df(Excel)=72.

How do I choose the appropriate method? What's the rationale behind only counting the yvalues?

Many thanks for the great sheet and your support in advance!

Roman

Comment by: Jan Karel Pieterse (2/9/2017 4:01:04 PM)

As far as I know, the number of degrees of freedom equals:

Number of y-values - Number of constants in model - 1

Which is what the formula calculates. But I might be wrong of course :-)

Comment by: Jan Karel Pieterse (2/9/2017 4:02:37 PM)

I'm not sure what the problem is I'm afraid :-)
