What is volatility? Some facts and some speculation.

Definition

Volatility is the annualized standard deviation of returns; it is often expressed in percent. A volatility of 20 means that there is about a one-third probability that an asset's price a year from now will have fallen or risen by more than 20% from its present value. In R the computation, given a series of daily prices, looks like:

    sqrt(252) * sd(diff(log(priceSeriesDaily))) * 100

Usually, as here, log returns are used (though it is unlikely to make much difference).

Historical estimation

What frequency of returns should be used when estimating volatility? There is folklore that it is better to use monthly data than daily data because daily data are noisier. However, this is finance, so things aren't that easy. Another complication arises when there are assets from around the globe.

Through time

Volatility would be more boring if finance were like other fields where standard deviations never change. But why?

Across assets

Implied volatility

Not risk
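As a quick worked sketch of the one-liner above, on simulated prices (the data and the name priceSeriesDaily are stand-ins; 252 is the usual trading-days-per-year convention):

```r
# Annualized volatility in percent from a simulated daily price path.
set.seed(42)
logret <- rnorm(252, mean = 0, sd = 0.0126)    # daily log returns, ~20% annual vol
priceSeriesDaily <- 100 * exp(cumsum(logret))  # simulated price series

# diff(log(.)) recovers the daily log returns; sqrt(252) annualizes.
vol <- sqrt(252) * sd(diff(log(priceSeriesDaily))) * 100
round(vol, 1)
```

With these settings the estimate comes out near 20, i.e. roughly the volatility the simulation was built with.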
Forecasting within limits

It is common to want forecasts to be positive, or to require them to lie within some specified range. Both of these situations are relatively easy to handle using transformations.

Positive forecasts

To impose a positivity constraint, simply work on the log scale: model and forecast log(x), then back-transform.

Forecasts constrained to an interval

To see how to handle data constrained to an interval, imagine that the egg prices were constrained to lie within a lower bound a and an upper bound b. Then we can transform the data to the whole real line using the scaled logit,

    y = log((x - a) / (b - x)),

where x is on the original scale and y is the transformed data. The prediction intervals from these transformations have the same coverage probability as on the transformed scale, because quantiles are preserved under monotonically increasing transformations.
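A minimal sketch of the scaled-logit round trip (the bounds 50 and 400 are illustrative values, and the inverse follows by solving the transform for x):

```r
# Scaled logit: maps the interval (a, b) onto the whole real line.
scaled_logit <- function(x, a, b) log((x - a) / (b - x))

# Inverse: maps a forecast on the transformed scale back into (a, b).
inv_scaled_logit <- function(y, a, b) (b - a) * exp(y) / (1 + exp(y)) + a

a <- 50; b <- 400                # illustrative bounds
x <- c(60, 200, 390)
y <- scaled_logit(x, a, b)       # forecast on this unconstrained scale...
inv_scaled_logit(y, a, b)        # ...then back-transform; recovers 60 200 390
```

Because the inverse maps every real y back into (a, b), any forecast produced on the transformed scale respects the limits automatically.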
D3.js Resources to Level Up | Engineering Blog

I have gotten a lot better at D3.js development over the past few years, and can trace most of my improvement to coming across a few key tutorials, blogs, books and other resources on the topic. They've been a huge help for me, and I've gathered a bunch of my favorites in this post to hopefully help others improve their D3 experience. Here it goes:

Assessing your level

First, let's define four general D3.js levels:

Complete Beginner: You have no previous experience with D3.js or any front-end technologies (HTML/CSS).
Basic: You have some HTML/CSS/JS skills and have played around with some D3.js examples, but don't completely understand the patterns and mechanics it uses.
Intermediate: You know how to customize D3.js graphs using examples found in search engines, but you struggle to reuse them and aren't quite happy with the quality of the code itself.
Proficient: You have built a lot of different graphs, tested them, and integrated them with different technologies or libraries.

Complete Beginner

Books
Time Series Analysis | R Statistics.Net

Any metric that is measured over time is a time series. Time series are of high importance because of their industrial relevance, especially with respect to forecasting (demand, sales, supply, etc.). A time series can be broken down into its components so as to forecast it systematically. This is a beginner's introduction to time series analysis, answering fundamental questions such as: what is a stationary time series, how to decompose it, how to de-trend and de-seasonalize a time series, what is autocorrelation, etc.

What Is A Time Series?

Any metric that is measured over regular time intervals makes a time series.

How To Create A Time Series In R?

Upon importing your data into R, use the ts() function as follows:

    ts(inputData, frequency = 4, start = c(1959, 2))  # frequency 4 => quarterly data
    ts(1:10, frequency = 12, start = 1990)            # frequency 12 => monthly data

Understanding Your Time Series

For an additive time series: Yt = St + Tt + et
For a multiplicative time series: Yt = St * Tt * et

What Is A Stationary Time Series?
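The additive decomposition Yt = St + Tt + et can be demonstrated directly with base R's decompose() on a built-in monthly series (co2, which ships with R and is roughly additive):

```r
# Decompose a monthly series into seasonal (St), trend (Tt) and
# remainder (et) components; co2 already has frequency = 12.
dec <- decompose(co2)

# For an additive decomposition, the components sum back to the series
# (NAs appear at the ends where the trend filter is undefined).
recon_error <- max(abs(co2 - (dec$seasonal + dec$trend + dec$random)),
                   na.rm = TRUE)
recon_error
```

The reconstruction error is numerically zero, confirming that decompose() has split the series exactly into the three additive components above.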
Interpreting noise

When watching the TV news, or reading newspaper commentary, I am frequently amazed at the attempts people make to interpret random noise. For example, the latest tiny fluctuation in the share price of a major company is attributed to the CEO being ill. When the exchange rate goes up, the TV finance commentator confidently announces that it is a reaction to Chinese building contracts. No one ever says "The unemployment rate has dropped by 0.1% for no apparent reason." What is going on here is that the commentators are assuming we live in a noise-free world.

The finance news

Every night on the nightly TV news bulletins, a supposed expert will go through the changes in share prices, stock price indexes, currency rates, and economic indicators, from the past 24 hours. A good rule-of-thumb would be that the change should not be interpreted unless it is at least 2σ in magnitude, where σ is the standard deviation of recent historical changes. Sadly, that's unlikely to happen.

Seasonally adjusted data
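The rule of thumb takes only a few lines to sketch (the daily changes and the new observation are simulated for illustration):

```r
# Interpret a new change only if it exceeds 2 standard deviations of
# recent historical changes -- anything smaller is plausibly just noise.
set.seed(1)
changes <- rnorm(250, mean = 0, sd = 0.4)  # recent daily changes (e.g. in %)
sigma   <- sd(changes)

new_change   <- 0.3                        # today's headline move
interpretable <- abs(new_change) > 2 * sigma
interpretable
```

Here the result is FALSE: a 0.3-point move sits well inside the 2σ noise band, so the commentator should say nothing about it.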
CSV To SQL Converter

Use this tool to convert CSV to SQL statements.

What can this tool do? INSERT, UPDATE, DELETE, MERGE, and SELECT statements can be created.

What are my options? You can specify which fields to include and specify the name of each field.

Step 1: Select your input (choose a CSV file, enter a URL, or paste into the text box below).
Step 2: Choose input options (optional).
Step 3: Choose output options.
Step 4: Generate the .sql output.
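The tool's core transformation — one INSERT statement per CSV row — can be sketched in a few lines of R (the table name items and the sample data are made up for illustration; this is not the tool's own code):

```r
# Turn CSV data (as read into a data frame) into INSERT statements.
csv_text <- "name,qty\nwidget,3\ngadget,7"
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Quote non-numeric values and escape embedded single quotes.
quote_sql <- function(v) ifelse(is.na(suppressWarnings(as.numeric(v))),
                                paste0("'", gsub("'", "''", v), "'"), v)

rows <- apply(df, 1, function(r) paste0(
  "INSERT INTO items (", paste(names(df), collapse = ", "), ") VALUES (",
  paste(quote_sql(r), collapse = ", "), ");"))
cat(rows, sep = "\n")
```

A real converter adds the UPDATE/DELETE/MERGE variants and field selection on top of this same row-by-row templating idea.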
R Video tutorial for Spatial Statistics: Introductory Time-Series analysis of US Environmental Protection Agency (EPA) pollution data

Download EPA air pollution data

The US Environmental Protection Agency (EPA) provides tons of free data about air pollution and other weather measurements through their website. The data are provided in hourly, daily and annual averages for the following parameters: Ozone, SO2, CO, NO2, PM2.5 FRM/FEM Mass, PM2.5 non FRM/FEM Mass, PM10, Wind, Temperature, Barometric Pressure, RH and Dewpoint, HAPs (Hazardous Air Pollutants), VOCs (Volatile Organic Compounds) and Lead.

The web links to download the zip files are very similar to each other: each starts from the same base URL, and the name of the file then has the following format: type_property_year.zip. The type can be: hourly, daily or annual.

    data <- download.EPA(year = 2013, property = "ozone", type = "daily")
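The download.EPA() helper itself is not shown in this excerpt; a minimal sketch of what it plausibly does — assembling the file name from the type_property_year.zip pattern — might look like this (the base URL is a placeholder assumption, and the real helper would go on to download and read the file):

```r
# Hypothetical sketch of download.EPA(): build the download URL from the
# type_property_year.zip naming pattern described above.
download.EPA <- function(year, property, type,
                         base_url = "https://example.invalid/epa/") {
  file <- paste0(type, "_", property, "_", year, ".zip")
  url  <- paste0(base_url, file)
  # A real implementation would call download.file(url, ...) here and
  # then unzip and read the contained CSV; we just return the URL.
  url
}

download.EPA(year = 2013, property = "ozone", type = "daily")
```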
Errors on percentage errors

The MAPE (mean absolute percentage error) is a popular measure for forecast accuracy and is defined as

    MAPE = mean(|p_t|),  where p_t = 100 (y_t − ŷ_t) / y_t,

y_t denotes an observation, ŷ_t denotes its forecast, and the mean is taken over t.

Armstrong (1985, p.348) was the first (to my knowledge) to point out the asymmetry of the MAPE, saying that "it has a bias favoring estimates that are below the actual values". Suppose y_t = 150 and ŷ_t = 100, so that the relative error is 50÷150 = 0.33, in contrast to the situation where y_t = 100 and ŷ_t = 150, when the relative error would be 50÷100 = 0.50. Thus, the MAPE puts a heavier penalty on negative errors (when y_t < ŷ_t) than on positive errors. Note that p_t > 0 only when ŷ_t < y_t, so positive errors arise only when the forecast is too small.

To avoid the asymmetry of the MAPE, Armstrong (1985, p.348) proposed the "adjusted MAPE", which he defined as

    adjusted MAPE = mean(200 |y_t − ŷ_t| / (y_t + ŷ_t)).

By that definition, the adjusted MAPE can be negative (if y_t + ŷ_t < 0), or infinite (if ŷ_t = −y_t). Of course, the true range of the adjusted MAPE is unbounded in both directions, as is easily seen by considering the two cases y_t + ŷ_t > 0 and y_t + ŷ_t < 0, and letting ŷ_t → −y_t. If y_t and ŷ_t are both positive, then the adjusted MAPE lies between 0 and 200.
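The asymmetry is easy to see numerically with the two worked cases from above (a short sketch, using the definitions as given):

```r
# MAPE and Armstrong's adjusted MAPE, applied to the two example cases.
mape  <- function(y, f) mean(abs(100 * (y - f) / y))
amape <- function(y, f) mean(200 * abs(y - f) / (y + f))

mape(150, 100)   # forecast below the actual: 100 * 50/150 = 33.3
mape(100, 150)   # same absolute error, heavier penalty: 100 * 50/100 = 50
amape(150, 100)  # adjusted MAPE treats both cases alike: 200 * 50/250 = 40
amape(100, 150)  # also 40
```

The adjusted MAPE removes the asymmetry for these positive values, at the cost of the sign and unboundedness problems described above.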
21 tools that will help your remote team work better together

Meldium

Securely sharing passwords with people in your team across the Internet is no easy feat. Getting your team on Meldium means you have control over who has access to what, and passwords are never exposed to team members. Meldium works with Internet Explorer, Firefox, Chrome, iOS and Android. ➤ Meldium

Time series outlier detection (a simple R function)

(By Andrea Venturini) Imagine you have a lot of time series – they may be short ones – related to a lot of different measures, and very little time to find outliers. You need something not too sophisticated to sort out the mess quickly. This is – very shortly speaking – the typical situation in which you can adopt the washer.AV() function in R:

    > dati
          phen        time zone value
    1     Temperature 1    a01  2.0
    2     Temperature 1    a02  20.0
    ...
    160   Rain        4    a20  8.5

The example of 20 meteorological stations measuring rainfall and temperature is useful to understand in which situation you can implement the washer() methodology.

    > out <- washer.AV(dati)
    [1] phenomenon: 1
    [1] phenomenon: 2
    > out[out[, "test.AV"] > 5, ]
       fen         t.2 series  y.1   y.2   y.3   test.AV  AV     n   median.AV  mad.AV  madindex.AV
    18 Rain        2   a18     5.5   6.3   17.0  5.43     -22.2  20  7.580      5.49    36.58
    38 Rain        3   a18     6.3   17.0  5.9   24.25    47.2   20  -4.978     2.15    14.34
    59 Temperature 2   a19     22.0  21.0  9.0   5.25     10.7   20  0.000      2.04    13.63
    79 Temperature 3   a19     21.0  9.0   18.0  14.92    -21.2  20  -0.917     1.36    9.07
Modelling seasonal data with GAMs

In previous posts I have looked at how generalized additive models (GAMs) can be used to model non-linear trends in time series data. At the time, a number of readers commented that they were interested in modelling data that had more than just a trend component: how do you model data collected throughout the year over many years with a GAM? In this post I will show one way that I have found particularly useful in my research.

First an equation. We want the model to capture:

1. any trend or long-term change in the level of the time series, and
2. any seasonal or within-year variation, and
3. any variation or interaction in the trend and seasonal features of the data.

I'm not going to cover point 3 in this post, but it is a relatively simple extension to what I will discuss here. The model is

    y = \beta_0 + f_{\mathrm{seasonal}}(x_1) + f_{\mathrm{trend}}(x_2) + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2\mathbf{\Lambda})

which is fitted with

    > mod <- gam(y ~ s(x1) + s(x2), data = foo)

Data preparation

…and we are good to go.

Load mgcv and fit the naive model
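A minimal reproducible version of the naive fit (the data here are simulated, since the post's dataset isn't in this excerpt; x1 is the seasonal covariate, e.g. month, and x2 the trend covariate, matching the equation above; mgcv ships with R):

```r
# Naive seasonal + trend GAM on simulated monthly data.
library(mgcv)
set.seed(1)
n   <- 240                                    # 20 years of monthly observations
foo <- data.frame(x1 = rep(1:12, 20),         # month of year (seasonal term)
                  x2 = seq_len(n))            # time index (trend term)
foo$y <- 5 + 2 * sin(2 * pi * foo$x1 / 12) +  # seasonal signal
         0.01 * foo$x2 +                      # slow linear trend
         rnorm(n, sd = 0.5)                   # noise

mod <- gam(y ~ s(x1) + s(x2), data = foo)
summary(mod)$r.sq                             # most variation is explained
```

One refinement worth knowing: for a covariate like month, a cyclic spline basis (s(x1, bs = "cc")) forces the seasonal smooth to join up smoothly between December and January.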
Programming for Data Science the Polyglot approach: Python + R + SQL

Guest blog post by Ajit Jaokar.

In this post, I discuss a possible new approach to teaching Programming for Data Science. Programming for Data Science is focused on the R vs. Python question. Everyone seems to have a view, including the venerable Nature journal (Programming – Pick up Python). Here, I argue that we should look beyond the Python vs. R debate. On first impressions, this Polyglot approach (the ability to master multiple languages) sounds complex. Why teach 3 languages together? Here is some background. Outside of data science, I also co-founded a social enterprise, Feynlabs, to teach computer science to kids. To learn programming for data science, it would thus help to build up from an existing foundation learners are already familiar with, and then relate new ideas to this foundation through other approaches. But first, we address what problem we are trying to solve and how that problem can be broken down.

Data Science – the problem we are trying to solve

Tools, IDE and Packages

Data management
Introducing practical and robust anomaly detection in a time series

Both last year and this year, we saw a spike in the number of photos uploaded to Twitter on Christmas Eve, Christmas Day and New Year's Eve (in other words, an anomaly occurred in the corresponding time series). Today, we're announcing AnomalyDetection, our open-source R package that automatically detects anomalies like these in big data in a practical and robust way.

[Figures: time series of photo uploads from Christmas Eve 2014 and Christmas Eve 2013]

Early detection of anomalies plays a key role in ensuring high-fidelity data is available to our own product teams and those of our data partners. This package helps us monitor spikes in user engagement on the platform surrounding holidays, major sporting events or during breaking news. Recently, we open-sourced BreakoutDetection, a complementary R package for automatic detection of one or more breakouts in time series.

Broadly, an anomaly can be characterized in the following ways:

How does the package work?

Acknowledgements
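A guarded usage sketch, following the package's published README (AnomalyDetection is installed from github.com/twitter/AnomalyDetection, so the example only runs when the package is present; raw_data is the example dataset it ships with):

```r
# Detect anomalies in a time series with Twitter's AnomalyDetection
# package, if it is installed.
if (requireNamespace("AnomalyDetection", quietly = TRUE)) {
  library(AnomalyDetection)
  data(raw_data)                     # example minute-level data in the package
  res <- AnomalyDetectionTs(raw_data,
                            max_anoms = 0.02,   # flag at most 2% of points
                            direction = "both", # spikes and dips
                            plot = TRUE)
  print(res$anoms)                   # timestamps and values flagged anomalous
  res$plot                           # series with detected anomalies marked
}
ran_example <- TRUE
```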