In this video post, I walk through a basic demo showing how to run the Fama-French regression using R.

This is my first attempt at doing a screencast, so please let me know if you have any comments or suggestions.  

The demo is easiest to follow when viewed in full screen HD.  In lower resolutions, the on-screen commands may be difficult to read.

Additional Info:

The slides used in this screencast can be accessed here, and the CSV data file is available here.

The R code used in the demo is shown below:

# Fama-French Regression example in R

# Load CSV file into R
ff_data <- read.table("ffdata.csv",header=TRUE,sep=",")

# Extract Fama-French Factors and Fund Returns
rmrf <- ff_data[,2]/100
smb <- ff_data[,3]/100
hml <- ff_data[,4]/100
rf <- ff_data[,5]/100
fund <- ff_data[,6]/100

# Calculate Excess Returns for Target fund
fund.xcess <- fund - rf

# Run Fama-French Regression
ffregression <- lm(fund.xcess ~ rmrf + smb + hml)

# Print summary of regression results
print(summary(ffregression))

23 Responses to “Screencast: Fama-French Regression Tutorial Using R”

  1. Hello, this is very helpful thank you very much for it! Could you please make the Fama French example for Eviews as my professors require using specifically this program. Also I have to measure the performance of 1700 M&S deals, which makes returns for 1700 stocks. Could you please show or give a hint how can we run the Fama French regression for so many stocks? Should we form portfolios and how?

    Thank you in advance!!

  2. Could you please show us how to import the data into the program? Can we do it manually, or with which commands?

  3. Dear Sir

    I tried the above code and got the results for the Fama and French model; your code made the calculation really easy and quick. However, I have a question concerning the mutual fund data: can we include returns for more than one mutual fund (for example, a sample of 400) and get results for every one of them at the same time? If that is possible, could you please show me what changes I need to make to the above code?

    Many thanks in advance
    Kind Regards

    • Hi Amir,
      I’m glad the code worked for you. Running on 400 funds could be done in a few different ways, but it would take some work to modify the code. You could have each fund as a separate column in the input spreadsheet and then just add a for loop to iterate through each column and run the regression on each fund. This would be very slow, but it should work. A more elegant way would be to run all the regressions simultaneously using matrix algebra (would probably work easiest in Matlab or Octave), but that would require a completely new program.

      -Chad
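The "all regressions at once" idea mentioned above does not strictly require leaving R: `lm()` accepts a matrix on the left-hand side and fits one regression per column. The following is a minimal sketch on simulated data. The fund names, sample size, and loadings are made up for illustration; with real data the matrix would be built from the fund columns of `ff_data`.

```r
# Simulated factor data; values are illustrative, not real returns
set.seed(1)
n    <- 120
rmrf <- rnorm(n, 0.005, 0.04)
smb  <- rnorm(n, 0.002, 0.03)
hml  <- rnorm(n, 0.003, 0.03)
rf   <- rep(0.003, n)

# Three hypothetical funds with known factor loadings plus noise
funds <- cbind(
  fundA = rf + 1.0 * rmrf + 0.2 * smb + rnorm(n, 0, 0.01),
  fundB = rf + 0.8 * rmrf - 0.3 * hml + rnorm(n, 0, 0.01),
  fundC = rf + 1.2 * rmrf + 0.5 * smb + 0.4 * hml + rnorm(n, 0, 0.01)
)

# Subtract the risk-free rate from every column
# (rf has one value per row, so it recycles down each column)
funds.xcess <- funds - rf

# A matrix response makes lm() fit one regression per column
ffmulti <- lm(funds.xcess ~ rmrf + smb + hml)

# Rows are (Intercept), rmrf, smb, hml; columns are the funds
coefs <- coef(ffmulti)
print(round(coefs, 4))
```

Calling `summary(ffmulti)` on such a fit prints a separate summary for each response column, so the usual t-statistics and R-squared values remain available per fund.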

      • Dear Chad
        Many thanks for your advice. I used the first recommendation and it works fine. I think this might be useful for future users, so a copy of the code is available below.
        I hope you don’t mind.

        Kind regard
        Amirali

        # Fama-French Regression example in R

        # Load CSV file into R
        ff_data <- read.table("ffdata.csv",header=TRUE,sep=",")

        # Extract Fama-French Factors and Fund Returns
        rmrf <- ff_data[,2]/100
        smb <- ff_data[,3]/100
        hml <- ff_data[,4]/100
        rf <- ff_data[,5]/100

        # Run Fama-French Regression
        for(i in 6:194){
        fund <- ff_data[,i]/100
        fund.xcess <- fund - rf
        ffregression <- lm(fund.xcess ~ rmrf + smb + hml)

        # Print summary of regression results
        print(summary(ffregression))

        }

        • Thanks for posting the modified code!

          -Chad

        • Thanks for the updated code, really helpful! I have a further question, however: How can you export all of the regression outputs and how do you see which regression summary corresponds to which fund?

          Cheers,
          M
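One possible way to handle this is to collect the estimates in a data frame keyed by each fund's column name, then write it out with `write.csv()`. The sketch below runs on simulated data, and the output file name `ff_results.csv` is hypothetical. With real data, `ff_data` would be read from the CSV as in the code above, and the column range would need to match the file.

```r
# Simulated stand-in for ff_data (returns in percent, like the original file)
set.seed(42)
n <- 60
ff_data <- data.frame(
  date  = 1:n,
  rmrf  = rnorm(n, 0.5, 4),
  smb   = rnorm(n, 0.2, 3),
  hml   = rnorm(n, 0.3, 3),
  rf    = rep(0.3, n),
  fund1 = rnorm(n, 0.8, 5),
  fund2 = rnorm(n, 0.6, 4)
)

rmrf <- ff_data[, 2] / 100
smb  <- ff_data[, 3] / 100
hml  <- ff_data[, 4] / 100
rf   <- ff_data[, 5] / 100

# Every column from the 6th onward is treated as a fund
fund_cols <- 6:ncol(ff_data)

# One row of estimates per fund, labelled with the fund's column name
results <- do.call(rbind, lapply(fund_cols, function(i) {
  fund.xcess <- ff_data[, i] / 100 - rf
  fit <- summary(lm(fund.xcess ~ rmrf + smb + hml))
  est <- fit$coefficients[, "Estimate"]
  data.frame(
    fund      = names(ff_data)[i],
    alpha     = est[["(Intercept)"]],
    beta_rmrf = est[["rmrf"]],
    beta_smb  = est[["smb"]],
    beta_hml  = est[["hml"]],
    alpha_t   = fit$coefficients["(Intercept)", "t value"],
    r_squared = fit$r.squared
  )
}))

print(results)

# Hypothetical output file, written to the working directory
write.csv(results, "ff_results.csv", row.names = FALSE)
```

Because the `fund` column carries the name from the CSV header, each row of the exported file can be matched back to its fund without relying on print order.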

  4. greatly appreciated, thanks mate.

  5. Hi Chad

    How come in the demo you don’t divide the returns by 100, but in the code you do (i.e., convert to %)?
    Also, how would you convert the excess return (alpha) to an annual rate if you were running it at the daily level?

    Thanks for a great tutorial!

    Best
    Mark
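On the annualization question, two common conventions are simple scaling and compounding. The following is a minimal sketch, assuming 252 trading days per year and a made-up daily alpha; neither number comes from the tutorial.

```r
# Hypothetical daily alpha in decimal form (0.0002 = 0.02% per day)
alpha_daily  <- 0.0002
trading_days <- 252  # assumed number of trading days per year

# Simple (arithmetic) annualization
alpha_annual_simple <- alpha_daily * trading_days

# Compounded (geometric) annualization
alpha_annual_compound <- (1 + alpha_daily)^trading_days - 1

print(c(simple = alpha_annual_simple, compound = alpha_annual_compound))
```

For monthly data the same two formulas apply with 12 in place of 252. For small alphas the two results are close, with the compounded figure slightly larger when alpha is positive.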

  6. Chad,

    Great tutorial.

    I think all of my code is correct (I copied from yours but added a Momentum factor), but the script cannot find my file.

    Any ideas on what I might be doing wrong?

    Thanks,
    Cory

    • The script is not finding the csv file? Have you verified that the file is in the working directory? You can change the working directory using the “File” pulldown menu. Also, if you do the command “list.files()” from the command line you should see the target file.

      -Chad
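The same check can be done from a script rather than the menu. This is a small sketch using base R functions only; the file name `ffdata.csv` is the one from the tutorial, and the fallback message wording is illustrative.

```r
# Show the directory R is currently reading from
print(getwd())

# List the CSV files visible in that directory
print(list.files(pattern = "\\.csv$"))

# Check for the data file before trying to read it
data_file <- "ffdata.csv"
if (file.exists(data_file)) {
  ff_data <- read.table(data_file, header = TRUE, sep = ",")
  print(head(ff_data))
} else {
  message("Cannot find ", data_file, " in ", getwd(),
          "; move the file there or call setwd() with the correct folder")
}
```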

  7. Thank you for posting this tutorial Chad,

    I copied your script and ran the analysis. However, all my results are 100 times smaller than what you got. The figures are exactly the same, just 100 times smaller. Have you any idea why I would be off by a factor of 100?

    Kind regards,
    Paul

    • Paul,

      The Fama-French factor returns are given as percents. For example, a factor return shown as 3 is actually 3%. So, in my example code I divide by 100, so it is 0.03.

      However, if the fund returns you are using are already decimal equivalent values (3% is shown as 0.03) then there is no need to divide the fund returns by 100.

      Try changing line 11 from:

      fund <- ff_data[,6]/100

      to this:

      fund <- ff_data[,6]

      • My results are 100 times smaller and I’m using your ffdata.csv file:


        Residuals:
        Min 1Q Median 3Q Max
        -0.043944 -0.008162 -0.000333 0.009172 0.042174

        Coefficients:
        Estimate Std. Error t value Pr(>|t|)
        (Intercept) -0.002922 0.001946 -1.502 0.13881
        rmrf 1.210644 0.041743 29.002 < 2e-16 ***
        smb 0.151090 0.086468 1.747 0.08606 .
        hml -0.298928 0.073488 -4.068 0.00015 ***

      • Note that the demo code posted above is not identical to the code used in the video. The code used in the video doesn’t divide any of the values by 100, so the Intercept is 100 times higher.

        • Yes, there is a discrepancy.

          The Ken French website gives a 5.4% return as 5.4. Other return sources often show a 5.4% return with the decimal equivalent value of 0.054.

          It doesn’t matter which format you use, but the factor returns and the target returns must be consistent. So, you may need to divide the factor returns by 100 or multiply the target returns by 100.
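The effect of the units can be checked directly. The sketch below uses simulated data (not the tutorial's `ffdata.csv`) and fits the same regression three ways: everything in decimals, everything in percent, and a mismatched version with a decimal fund return against percent factors. All series and coefficients are made up for illustration.

```r
set.seed(7)
n <- 100

# Simulated returns in decimal form (0.03 = 3%)
rmrf <- rnorm(n, 0.005, 0.04)
smb  <- rnorm(n, 0.002, 0.03)
hml  <- rnorm(n, 0.003, 0.03)
fund.xcess <- 0.001 + 1.1 * rmrf + 0.2 * smb - 0.3 * hml + rnorm(n, 0, 0.01)

# Percent versions of the same series (3 = 3%)
rmrf_pct <- rmrf * 100
smb_pct  <- smb * 100
hml_pct  <- hml * 100
fund_pct <- fund.xcess * 100

# Consistent units: all decimal, or all percent
fit_dec <- lm(fund.xcess ~ rmrf + smb + hml)
fit_pct <- lm(fund_pct ~ rmrf_pct + smb_pct + hml_pct)

# Mismatched units: decimal fund returns against percent factors
fit_mixed <- lm(fund.xcess ~ rmrf_pct + smb_pct + hml_pct)

print(rbind(
  decimal = coef(fit_dec),
  percent = unname(coef(fit_pct)),
  mixed   = unname(coef(fit_mixed))
))
```

With consistent units the factor loadings are identical and only the intercept changes scale (the percent intercept is 100 times the decimal one). With mismatched units the loadings themselves come out 100 times too small, which is the symptom to look for.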

          • Thanks for your comment on percentages. I am having difficulty reproducing the results because nowhere have I found whether the monthly FF data is percentages (which you confirm, thank you), but is it annualized percentages or percentages on the month, and if annualized, is it compounded or not. Very frustrating that an academic site does not define precisely what the data represents!

            Thanks, in anticipation!

  8. Hi,
    I am very happy with this tutorial 🙂
    For my research, it is essential to know how you get the returns of firms like FMAGX.

    Via my university I have access to CRSP, which offers a lot of possible variables. I was wondering which variable to use to get the monthly return of a firm. The variable called ‘holding period return’ seems the most comparable to your data. Could someone please give their opinion?

    Thanks in advance!

  9. Thank you very much! It worked! It is just that it didn’t read the file until I rewrote the first line:

    # Load CSV file into R
    df <- read.table("ffdata.csv",header=TRUE,sep=",")

    # Extract Fama-French Factors and Fund Returns
    rmrf <- df[,2]
    smb <- df[,3]
    hml <- df[,4]
    rf <- df[,5]
    fund <- df[,6]

  10. Here is another alternative. I’m running the same regression using GLS with ARMA(1,1) errors, estimated by maximum likelihood. I’m including the Frazzini BAB and QMJ return factors together with the Fama and French 5 factors. The parameters can be added or removed according to preference and the type of investment strategy used.


    # Load data into R
    setwd("/Users/alexbadoi/Desktop/College/Postgrad/Master/Seminar Pc/R")
    ff_data = read.csv("ff5_bab_qmj.csv", header = TRUE, sep=",")
    library(nlme)

    # Extract Fama-French Factors + Momentum + Frazzini Factors
    rmrf <- ff_data[,2]/100
    rf <- ff_data[,7]/100
    smb <- ff_data[,3]/100
    hml <- ff_data[,4]/100

    rmw <- ff_data[,5]/100
    cma <- ff_data[,6]/100

    umd <- ff_data[,8]
    bab <- ff_data[,9]/100
    qmj <- ff_data[,10]/100

    for(i in 11:18){ # Loop across the asset columns
    fund <- ff_data[,i]/100
    fund.xcess <- fund - rf
    ffgls <- gls(fund.xcess ~ rmrf + smb + hml + rmw + cma,
    correlation=corARMA(p=1, q=1), method='ML')
    print(summary(ffgls))
    }

  11. Hello, thanks for your helpful document. I’m Vietnamese. I would like to apply the Fama-French model to the Vietnamese market, specifically the HOSE market, with 298 companies over the study period from 2009 to 2013. There is something I don’t understand: what is “fund” in your code? Is it the average return of the stocks in each year? Also, would you please give me some R code for testing the estimated model? Thank you very much.

  12. I am testing a couple of CAPM-based models for my dissertation, and I have a healthy number of stocks to regress: 5000 according to my last calculations. They would have to be done 75 stocks at a time (in a portfolio). The number makes it an unrealistic task to accomplish manually. I have tried to build on the scripts I’ve found here, which were extremely useful; however, I lack the skills to make it actually work. I can post a copy of my modified script and, if possible, send you a copy of one of my Excel sheets to figure out whether that is the root of the problem.
    My script is below; it’s probably primitive and vulgar. Any help is greatly appreciated.
    Thank you in advance.
    Hameedalmaa@gmail.com

    # Fama-French Regression example in R

    # Load CSV file into R
    ff_data <- read.table("ffdata.csv",header=TRUE,sep=",")

    # Extract Fama-French Factors and Fund Returns
    rmrf <- ff_data[,1]
    smb <- ff_data[,2]
    hml <- ff_data[,3]
    rf <- ff_data[,4]
    fund1 <- ff_data[,5]
    fund2 <- ff_data[,6]
    fund3 <- ff_data[,7]
    fund4 <- ff_data[,8]
    fund5 <- ff_data[,9]
    fund6 <- ff_data[,10]
    fund7 <- ff_data[,11]
    fund8 <- ff_data[,12]
    fund9 <- ff_data[,13]
    fund10 <- ff_data[,14]
    fund11 <- ff_data[,15]
    fund12 <- ff_data[,16]
    fund13 <- ff_data[,17]
    fund14 <- ff_data[,18]
    fund15 <- ff_data[,19]
    fund16 <- ff_data[,20]
    fund17 <- ff_data[,21]
    fund18 <- ff_data[,22]
    fund19 <- ff_data[,23]
    fund20 <- ff_data[,24]
    fund21 <- ff_data[,25]
    fund22 <- ff_data[,26]
    fund23 <- ff_data[,27]
    fund24 <- ff_data[,28]
    fund25 <- ff_data[,29]
    fund26 <- ff_data[,30]
    fund27 <- ff_data[,31]
    fund28 <- ff_data[,32]
    fund29 <- ff_data[,33]
    fund30 <- ff_data[,34]
    fund31 <- ff_data[,35]
    fund32 <- ff_data[,36]
    fund33 <- ff_data[,37]
    fund34 <- ff_data[,38]
    fund35 <- ff_data[,39]
    fund36 <- ff_data[,40]
    fund37 <- ff_data[,41]
    fund38 <- ff_data[,42]
    fund39 <- ff_data[,43]
    fund40 <- ff_data[,44]
    fund41 <- ff_data[,45]
    fund42 <- ff_data[,46]
    fund43 <- ff_data[,47]
    fund44 <- ff_data[,48]
    fund45 <- ff_data[,49]
    fund46 <- ff_data[,50]
    fund47 <- ff_data[,51]
    fund48 <- ff_data[,52]
    fund49 <- ff_data[,53]
    fund50 <- ff_data[,54]
    fund51 <- ff_data[,55]
    fund52 <- ff_data[,56]
    fund53 <- ff_data[,57]
    fund54 <- ff_data[,58]
    fund55 <- ff_data[,59]
    fund56 <- ff_data[,60]
    fund57 <- ff_data[,61]
    fund58 <- ff_data[,62]
    fund59 <- ff_data[,63]
    fund60 <- ff_data[,64]
    fund61 <- ff_data[,65]
    fund62 <- ff_data[,66]
    fund63 <- ff_data[,67]
    fund64 <- ff_data[,68]
    fund65 <- ff_data[,69]
    fund66 <- ff_data[,70]
    fund67 <- ff_data[,71]
    fund68 <- ff_data[,72]
    fund69 <- ff_data[,73]
    fund70 <- ff_data[,74]
    fund71 <- ff_data[,75]
    fund72 <- ff_data[,76]
    fund73 <- ff_data[,77]

    ffregression <- lm(fund1 ~ rmrf + smb + hml)
    # Print summary of regression results
    print(summary(ffregression))

    ffregression <- lm(fund2 ~ rmrf + smb + hml)
    # Print summary of regression results
    print(summary(ffregression))

    ffregression <- lm(fund3 ~ rmrf + smb + hml)
    # Print summary of regression results
    print(summary(ffregression))

    ffregression <- lm(fund4 ~ rmrf + smb + hml)
    # Print summary of regression results
    print(summary(ffregression))

    ffregression <- lm(fund5 ~ rmrf + smb + hml)
    # Print summary of regression results
    print(summary(ffregression))

    ffregression <- lm(fund6 ~ rmrf + smb + hml)
    # Print summary of regression results
    print(summary(ffregression))

    ffregression <- lm(fund7 ~ rmrf + smb + hml)
    # Print summary of regression results
    print(summary(ffregression))

    ffregression <- lm(fund8 ~ rmrf + smb + hml)
    # Print summary of regression results
    print(summary(ffregression))

  13. Hi, sir

    Thanks for your helpful tutorial. We all download these factors from French’s data library. But how does he construct the 25 portfolios? Since I’m trying to construct the 3 factors using Chinese stock market data, I have to construct these portfolios myself. Could you give me some help on constructing FF’s 25 portfolios? Thanks in advance.

    Kind Regards,
    Shuhua
