Author: Steve Miller
Last week I posted the first of a three-part series on basic data programming with Python. For that article, I resurrected scripts written 10 years ago that deployed core Python data structures and functions to assemble a Python list for analyzing stock market returns. While it was fun refreshing and modernizing that code, I’m now pretty spoiled working with advanced libraries like NumPy and Pandas that make data programming tasks much simpler.
This second post centers on a brief showcase of NumPy, a comprehensive library created in 2005 that extends the Python core to accommodate “large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.”
In addition to introducing a wealth of high-performance data structures and mathematical functions, NumPy changed the data programming metaphor in Python from procedural to specification: rather than spelling out how to loop over elements, the programmer states the operation to apply to whole arrays. In Part 1, I detailed loop-based code for building the final lists; in Part 2, I mostly just invoke array functions and structure subscripting to complete the tasks.
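To make the contrast concrete, here is a minimal sketch of the two styles applied to the same small task, computing period-over-period returns from a series of index levels. The numbers are invented for illustration and this is not the code from either part of the series.

```python
import numpy as np

# Hypothetical index levels, made up for illustration only.
levels = [100.0, 102.0, 101.0, 104.03]

# Procedural style: loop over adjacent pairs of a core Python list.
loop_returns = []
for prev, curr in zip(levels[:-1], levels[1:]):
    loop_returns.append(curr / prev - 1)

# Array style: state the whole computation with slices, no explicit loop.
arr = np.array(levels)
array_returns = arr[1:] / arr[:-1] - 1

# Both approaches should agree to floating-point tolerance.
print(np.allclose(loop_returns, array_returns))
```

The array version says what to compute (each level divided by its predecessor) and leaves the iteration to NumPy.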
Though I’m the first to acknowledge not being a NumPy expert, I had little trouble figuring out what to do with the help of Stack Overflow. Indeed, those familiar with the relatively recent Pandas library for data analysis will readily adapt to the foundational NumPy programming style. Core Python structures such as lists, dictionaries, comprehensions, and iterables serve primarily to feed the NumPy/Pandas beasts.
For the analysis that follows, I focus on the performance of the Russell 3000 index, a competitor to the S&P 500 and Wilshire 5000 for “measuring the market”. I first download two files, a year-to-date file and a history file, that together provide final daily Russell 3000 index levels starting in 2005. Attributes include index name, date, level without dividends reinvested, and level with dividends reinvested. I then wrangle the data using NumPy to get to the desired end state.
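As a rough sketch of what that kind of wrangling can look like, the snippet below stacks a "history" piece and a "year-to-date" piece, sorts by date, and rescales both level columns to growth of one unit. The arrays stand in for the downloaded files; every value and variable name is hypothetical, and the actual file layout and the steps in the full post may differ.

```python
import numpy as np

# Hypothetical stand-ins for the two downloaded files; all values are invented.
hist_dates = np.array(["2005-01-03", "2005-01-04"], dtype="datetime64[D]")
hist_levels = np.array([[1000.0, 1000.0], [990.0, 990.5]])
ytd_dates = np.array(["2005-01-05", "2005-01-06"], dtype="datetime64[D]")
ytd_levels = np.array([[995.0, 996.0], [1010.0, 1011.5]])
# Columns of the level arrays: without dividends, with dividends reinvested.

# Stack the history and year-to-date pieces into one series.
dates = np.concatenate([hist_dates, ytd_dates])
levels = np.vstack([hist_levels, ytd_levels])

# Sort rows by date in case the files arrive out of order.
order = np.argsort(dates)
dates, levels = dates[order], levels[order]

# Growth of one unit since the first date; broadcasting divides each
# row by the first row, column by column.
growth = levels / levels[0]

# Boolean subscripting: keep only rows on or after a chosen date.
recent = levels[dates >= np.datetime64("2005-01-05")]

print(growth[-1])
```

Keeping dates and levels in parallel arrays is only one option; a structured array or a Pandas DataFrame would carry the column names along with the data.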
The technology used for all three articles is JupyterLab 0.32.1, Anaconda Python 3.6.5, NumPy 1.14.3, and Pandas 0.23.0.