Wednesday, March 13, 2013

Web Scraping with R

In this write-up I'll describe an R function that I use to fetch stock data from the web. The nice thing about this function is that it is written in pure R, and the data it returns is a data frame, which can in turn be analysed or charted for different metrics in any way you like. Here is the code.


#!/usr/bin/Rscript

getData = function(instrument,lookback=365){
  # instrument tells the stock ticker
  # default lookback is 365 days

  # Format the start date
  starty = format(Sys.time()-60*60*24*lookback,"%Y")
  startm = format(Sys.time()-60*60*24*lookback,"%m")
  startd = format(Sys.time()-60*60*24*lookback,"%d")

  # Format the stop date
  endy = format(Sys.time(),"%Y")
  endm = format(Sys.time(),"%m")
  endd = format(Sys.time(),"%d")

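  # Create a url. Using Y! Finance here
  # You can use any site which offers this data
  # Note: this endpoint expects zero-based months (January is 00),
  # so subtract 1 from the calendar month
  url = paste("http://ichart.finance.yahoo.com/table.csv?s=",
    instrument,"&a=",
    sprintf("%02d",as.integer(startm)-1),"&b=",
    startd,"&c=",
    starty,"&d=",
    sprintf("%02d",as.integer(endm)-1),"&e=",
    endd,"&f=",
    endy,"&g=d&ignore=.csv",
    sep="")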

  # The destination file to write out this data
  # Use the pid of the process
  # It's a simple way to get uniqueness
  # sep="" keeps a stray space out of the file name
  destfile = paste("out.txt.",Sys.getpid(),sep="")
  print(url)
  cat("Fetching data from Y! Finance for ",
      instrument," Ranging ",
      starty,startm," -> ",
      endy,endm,"\n")
  
  # Fetch that data
  status = download.file(url,destfile,method="auto",quiet=TRUE,cacheOK=FALSE)
  if(status != 0){
    # Some error. Stop!
    unlink(destfile)
    stop(paste("Download error, status ",status))
  }
  nlines = length(count.fields(destfile,sep="\n"))
  if(nlines == 1){
    # Site didn't return data
    unlink(destfile)
    stop(paste("No data available for",instrument))
  }
  # Read the data in as a table
  data = read.table(file=destfile,sep=",",header=T,as.is=T)
  # Delete the temporary file
  unlink(destfile)
  # Return the data
  data
}

argv = commandArgs(trailingOnly=T)
if(length(argv) != 1){
  cat("Usage: this-file.r <ticker>\n")
  q()
}

# Get 60 days look back data
x = getData(argv[1],60)

# Check to see if it's there
head(x)
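
To get a feel for the data frame that comes back without touching the network, here is a minimal sketch that parses a small inline CSV sample with the same options getData uses. The rows are made up, and the column layout (Date, Open, High, Low, Close, Volume, Adj Close) is an assumption about what the site returns.

```r
# Made-up sample rows; the column layout is an assumption
sample_csv = "Date,Open,High,Low,Close,Volume,Adj Close
2013-03-12,99.0,99.5,98.5,99.2,5000000,99.2
2013-03-11,98.0,99.1,97.9,98.9,6000000,98.9"

# Same parsing options as in getData
sample = read.table(text=sample_csv,sep=",",header=TRUE,as.is=TRUE)

# Dates arrive as character strings because of as.is=TRUE,
# so convert them before doing any date arithmetic
sample$Date = as.Date(sample$Date)

# Inspect the column names and types
str(sample)
```

Note that read.table rewrites the "Adj Close" header to Adj.Close so that it is a valid R name.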


Sunday, March 10, 2013

A Technical Chart for Run ups and downs in R

Two of the most important things to watch in technical analysis and trading are the percentage move up or down and the associated volume. A large percentage move upwards on large volume is a sign that lots of big fish are jumping into the stock. Likewise, a large percentage move downwards on large volume indicates traders are dumping the stock; it's a falling knife you want to stay away from. However, most day-to-day movements don't fall into either of these buckets. Worse, it is generally difficult to put a day's price and volume in perspective with respect to past price movements. In this write-up, I'll try to explain a charting technique that addresses this issue. A charting technique does not give out a number like a metric; instead it gives a visual perspective on where the stock stands as of a given day.


To begin, we will need two relatively simple metrics in place: a volume index and the percentage change in price.

Volume Index:
This is the ratio of the volume of shares traded on a given day to the average volume of the stock. The higher this number, the greater the momentum in a given direction. So if the average volume of shares traded in a day is \(v_{avg}\) and the volume on a given day is \(v_{t}\), the volume index \(v_{index}\) is
$$v_{index} = \frac{v_t}{v_{avg}}$$
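
As a quick illustration of the formula, here is a minimal sketch in R. The volumes are made-up numbers and the variable names are only for illustration; the average is taken over the whole sample, which is the simplest possible choice.

```r
# Hypothetical daily volumes, oldest first
volume = c(100, 110, 80, 70)

# Simplest choice of v_avg: the average over the whole sample
v.avg = mean(volume)

# Volume index: each day's volume relative to the average
v.index = volume / v.avg

print(round(v.index, 2))
```

A value above 1 means the day traded more than the average volume, and a value below 1 means it traded less.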

Percentage change:
The fractional change in price on a given day from the prior day. If price on day \(t\) is \(p_t\) and on day \(t+1\) is \(p_{t+1}\) then the percentage change is given as
$$\Delta_p = \frac{p_{t+1} - p_{t}}{p_{t}}$$
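
The same calculation can be sketched in R with diff. The prices are made up and assumed to be in chronological order (oldest first); if the series is stored newest first, reverse it with rev() before applying this.

```r
# Hypothetical closing prices, oldest first
price = c(2, 3, 4, 3.5)

# diff() gives p[t+1] - p[t]; divide by the earlier price p[t]
delta.p = diff(price) / head(price, -1)

print(round(delta.p, 3))
```

The result has one element fewer than the price series, because the first day has no prior day to compare against.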


The goal now is to club all the series of run-ups and run-downs together and place them on an X-Y chart. For example, if the price, volume and average volume were \([\{2,100,90\},\{3,110,90\},\{4,80,90\},\{3.5,70,90\}\ldots]\) then, pairing each day's price change with that same day's volume index (as the code does), the data gets mapped as \(\{\frac{3-2}{2}=0.5,\frac{110}{90}=1.22\},\{\frac{4-3}{3}=0.33,\frac{80}{90}=0.89\},\{\frac{3.5 - 4}{4}=-0.125,\frac{70}{90} = 0.78\},\ldots\). Next, we tag each series of consecutive positive or negative price changes with a number. This is mostly a preparatory step for plotting the data. R has a nice function, "rle" (short for "run length encoding"), for doing exactly this. Our goal is to make an overlaid line chart of all the positive and negative runs. This chart is useful for visualizing how any recent run-up or run-down in a stock stacks up against previous run-ups and run-downs of that same stock.
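
To make the tagging step concrete, here is a small sketch of how rle can number the runs. The first three price changes come from the example above; the last two are made up so that there is more than one run to look at. The variable names are illustrative only.

```r
# Daily price changes; the last two values are made up
delta = c(0.5, 0.33, -0.125, -0.02, 0.01)

# Label each day by the sign of its change
tag = ifelse(delta < 0, "Negative", "Positive")

# Run length encoding collapses consecutive equal labels
runs = rle(tag)

# Give every run a number and count the days inside each run
run.id = rep(seq_along(runs$lengths), runs$lengths)
day.in.run = sequence(runs$lengths)

print(data.frame(delta, tag, run.id, day.in.run))
```

Here the five days fall into three runs: two up days, two down days, and one up day. Plotting the change against day.in.run, grouped by run.id, overlays all runs on the same axes.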


Finally, we get historical price data for the stock we wish to analyse, say from Yahoo! Finance for McDonald's stock (MCD) for the past 6 months, and save it as "mcd.csv". In the following code I'll use Hadley Wickham's ggplot2 package to chart this. For \(v_{avg}\) I'll use the overall average volume for the trailing 6 months (5722305) for simplicity, but you can tweak this. The code charts the continuous run-up and run-down price changes of the MCD stock.
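
As one possible way to tweak \(v_{avg}\), here is a sketch that replaces the fixed average with a trailing moving average. This is not what the charting code in this post does; the volumes are made up, the window length k is an arbitrary choice, and the series is assumed to be in chronological order (oldest first).

```r
# Hypothetical daily volumes, oldest first
volume = c(100, 110, 80, 70, 90, 120)

# Window length in days (arbitrary choice for illustration)
k = 3

# Trailing mean over the current day and the k-1 days before it;
# the first k-1 entries are NA because the window is incomplete
vol.avg = as.numeric(stats::filter(volume, rep(1/k, k), sides=1))

# Volume index relative to the sliding average
vol.index = volume / vol.avg

print(round(vol.index, 2))
```

With a sliding window the index reacts to recent trading activity instead of comparing every day against one fixed number, at the cost of losing the first k-1 days.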


The chart generated is shown below

The code used is shown below.

 
#!/usr/bin/Rscript

library(ggplot2)
library(grid)
library(scales)

# Read in the file
x = read.csv("mcd.csv",sep=",",header=T)

# Pick the Volume and Price column
x = x[,c(6,7)]
colnames(x) = c('Volume','Price')

# 6 month average
# Change this to some sliding window
vol.avg = 5722305

# Compute the volume index and price change
x.volindex = x$Volume/vol.avg
x.volindex = x.volindex[1:(length(x.volindex) - 1)]
x.delta    = -diff(x$Price)/x$Price[2:length(x$Price)]


# Create a vector which has the same length as x.delta
# and is tagged as 'Negative' when x.delta[i] is less than 0 and
# as 'Positive' otherwise (an unchanged price counts as 'Positive').
x.tag = rep('Positive',length(x.delta))
x.tag[x.delta < 0] = 'Negative'
x.tmp = rle(x.tag)
# Number each run, and count the days within each run
x.t = rep(seq_along(x.tmp$lengths),x.tmp$lengths)
x.index = sequence(x.tmp$lengths)

x = data.frame(
  delta.price = x.delta,
  index = x.index,
  volindex = x.volindex,
  tag = x.t
  )

p1 = ggplot(x,aes(index,delta.price,group=tag)) +
  geom_line(alpha=0.3) +
  scale_y_continuous(labels=percent,
                     breaks=round(
                       seq(min(x$delta.price),
                           max(x$delta.price),
                           by=0.005),3)) +
  scale_x_continuous(breaks=seq(min(x$index),max(x$index),by=1)) +
  xlab(" Days in Run") + ylab(" Change in Price") +
  geom_hline(yintercept=0,linetype="longdash")

p2 = ggplot(x,aes(index,volindex,group=tag)) +
  geom_line(alpha=0.3) +
  scale_y_continuous(breaks=round(
                       seq(min(x$volindex),
                           max(x$volindex),
                           by=0.5),2)) +
  scale_x_continuous(breaks=seq(min(x$index),max(x$index),by=1)) +
  xlab(" Days in Run") + ylab(" Volume Index")


png(filename="t.png",width=800,height=800)
# Change p1 to p2 to get the volume index chart.
print(p1)
dev.off()