This semester, I've signed up for three courses:
- Dan Friedman's Programming Language Principles,
- Ryan Newton's Domain Specific Languages and Compilers, and
- Arun Chauhan's Parallel Architectures and Programming.
All of them have been great so far. The last two are graduate seminar classes, and both have a project. I'm happy with this setup since there is some overlap in what I'm doing: the DSL class focuses specifically on various forms of parallelism, so I get to study more of the same thing. Projects for both classes involve some amount of GPU computation, so again, more of the same thing.
Figure 1: Copperhead (the image is from nVidia's site)
For the DSL class, the project I've signed up for is a comparative study of Copperhead and Accelerate, which are GPGPU DSLs for Python and Haskell respectively. There is some literature available: mainly, the PPoPP paper, Compiling an Embedded Data Parallel Language, and Bryan Catanzaro's PhD thesis, Compilation Techniques for Embedded Data Parallel Languages, on Copperhead; and Accelerating Haskell Array Codes with Multicore GPUs on Accelerate.
I'd been having some trouble with building Copperhead – mainly because I haven't done any Python in a long time, and because I'm installing stuff into non-default directories, and because I've been lax in reading documentation. So I thought of writing down the exercise here.
The machine happens to run Gentoo 2.0.3; and I do not have admin
access to this machine. So some of the dependencies are installed in
my home directory, under
~/software. Therefore I also need to set
some environment variables:
$ export PYTHONPATH=\
    $HOME/software/lib64/python2.7/site-packages:\
    $HOME/software/lib/python2.7/site-packages:\
    $PYTHONPATH
$ export PATH=~/software/bin:$PATH
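(For the curious: the two site-packages entries are where distutils installs packages under a given --prefix; on a 64-bit Gentoo box, compiled extensions land under lib64 while pure-Python code goes under lib, hence both. You can ask the standard library's sysconfig module what path a prefix maps to — the /home/me/software prefix below is just a stand-in for the real one:)

```python
import sysconfig

# Where does "easy_install --prefix=PREFIX" put pure-Python packages?
# The prefix here is a stand-in; substitute your own install prefix.
prefix = "/home/me/software"
print(sysconfig.get_path("purelib", "posix_prefix", vars={"base": prefix}))
# prints something like /home/me/software/lib/pythonX.Y/site-packages
```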
The dependencies are:
- Python 2.7. We have Python 2.7.1 on this machine.
- CUDA 3.0. We have 4.0.
- numpy 1.3. We have 1.6.0.
- Boost 1.38, though only boost-python and boost-thread are really needed. We have boost 1.42.
- PyCUDA - this was a pain point. PyCUDA's setup doesn't seem to support installing with non-root permissions ("python setup.py" would say: "--prefix? We don't know what that is!"). Finally I actually read the Python distutils documentation, figured out this should have been really painless all along, and felt quite embarrassed:
$ git clone http://git.tiker.net/trees/pycuda.git
$ cd pycuda
$ easy_install --prefix=~/software .
This installs PyCUDA, including its dependencies (decorator, py, pytest, pytools, etc.).
(This doesn't quite work – see the January 3 update below.)
- CodePy, which is a C/C++ metaprogramming toolkit in Python! It's
"meta" in the sense that it supports:
- Generating C/C++ code;
- And then compiling the generated code and loading it dynamically into the Python runtime.
Installing CodePy turned out to be straightforward:
$ git clone http://git.tiker.net/trees/codepy.git
$ cd codepy
$ easy_install --prefix=~/software .
There's some noise about a missing
cgen dependency; I chose to
ignore it for now.
- Thrust, a C++ CUDA library that resembles STL. Since this is a template library, there's really nothing to "build" and install; furthermore, CUDA 4.0 ships with Thrust. Squee!
(As a side note, the last three projects appear to be very interesting on their own and deserve further investigation. This I will hopefully get to do before the end of the semester.)
- Finally, installed copperhead itself:
$ hg clone https://code.google.com/p/copperhead/
$ cd copperhead
$ easy_install --prefix=~/software .
No trouble there either. However:
$ python -c "import copperhead;"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "[...] copperhead/__init__.py", line 20, in <module>
    from prelude import *
  File "[...] copperhead/prelude.py", line 46, in <module>
    import copperhead.runtime.places as PL
  File "[...] copperhead/runtime/__init__.py", line 29, in <module>
    cuda.init()
pycuda._driver.RuntimeError: cuInit failed: no device
This probably means we're in trouble:
$ python -c "import pycuda.driver as cuda; cuda.init()"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
pycuda._driver.RuntimeError: cuInit failed: no device
Bummer. It looks like the machine doesn't have a CUDA driver installed yet:
$ /opt/cuda/sdk/C/bin/linux/release/deviceQuery
[deviceQuery] starting...
/opt/cuda/sdk/C/bin/linux/release/deviceQuery Starting...
 CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
[deviceQuery] test results... FAILED
Press ENTER to exit...
Turns out that the nVidia GeForce 7300 GT isn't listed as a CUDA device on nVidia's website. This is surprising, because two weeks back I wrote and executed a simple OpenCL program just fine with CL_DEVICE_TYPE_GPU using nVidia's SDK. The other option, CL_DEVICE_TYPE_CPU, did not work – my guess is that nVidia's OpenCL doesn't do CPUs yet, despite the "heterogeneous" moniker. The machine has an Intel CPU, but no Intel OpenCL SDK installed.
On the other hand, my Thinkpad, which is a recent Sandy Bridge machine, could do CL_DEVICE_TYPE_CPU with the Intel OpenCL SDK, but not CL_DEVICE_TYPE_GPU yet. But of course, no CUDA, since this is not an nVidia GPU.
(Yes, there are several vendors in this space: nVidia, which pushes both its own CUDA and OpenCL; and Apple, Intel, AMD, PowerVR, IBM et al., which work with the Khronos industry consortium and support OpenCL to varying degrees.)
Now to go look for a machine with an actual CUDA device. The perils of bleeding edge…
It turned out that I was wrong, which is really awesome, because: the GeForce 7300 GT is actually a CUDA device! The actual problem was that I wasn't in the video group, and once that was solved, I could run some CUDA demo programs (such as deviceQuery), and they agreed that there indeed is a CUDA device. (Which is: Device 0: Tesla…)
It didn't end here though.
The next trouble was with the Python package cgen, which was mentioned earlier. Cgen used to be a part of codepy, but these days it is a separate project with a separate Python package and everything. And this package turned out to be uninstallable via the usual means:
$ pip install cgen
Downloading/unpacking cgen
  Could not find any downloads that satisfy the requirement cgen
No distributions at all found for cgen
$ easy_install cgen
Checking existing site.py in $PYTHONPATH
Searching for cgen
Reading http://pypi.python.org/simple/cgen/
Reading http://mathema.tician.de/software/codepy
No local packages or download links found for cgen
error: Could not find suitable distribution for Requirement.parse('cgen')
(I suppose I should alert the maintainer. I've been busy.)
So I installed cgen like so:
$ git clone http://git.tiker.net/trees/cgen.git
$ cd cgen
$ easy_install --prefix=$HOME/software cgen
I should be almost ready to run some simple test programs, but wait! Now it's CUDA's turn to complain: gcc-4.5 is not supported, presumably because of gcc's move to DWARF. gcc-4.4 is supported and was available, so I created a gcc symlink to /usr/bin/gcc-4.4.5 in $HOME/bin, and placed $HOME/bin ahead in the PATH.
The next problem was a missing libboost_python-gcc4.3-mt. The system has a plain boost_python installed; however, codepy's default is to use the former. This was solved by having an aksetup configuration file name the system boost_python instead. This obscure piece of information came from here. I have no idea what aksetup means; nor could I find an explanation in codepy of where they look for the said file.
We have one more environment variable to set up:
$ export THRUST_PATH=/opt/cuda/include
At this point, one of the sample programs bundled with copperhead runs. Yay!
$ python simple_tests.py
---- Simple INTEGER tests ----
Procedure 'incr' ... PASSED copperhead : [1, 2, 3, 4, 5, 6, 7]
Procedure 'incrList' ... PASSED copperhead : [1, 2, 3, 4, 5, 6, 7]
Procedure 'as_ones' ... PASSED copperhead : [1, 1, 1, 1, 1, 1, 1]
Procedure 'idm' ... PASSED copperhead : [0, 1, 2, 3, 4, 5, 6]
Procedure 'idx' ... PASSED copperhead : [0, 1, 2, 3, 4, 5, 6]
Procedure 'saxpy' ... PASSED copperhead : [1, 3, 5, 7, 9, 11, 13]
Procedure 'saxpy2' ... PASSED copperhead : [1, 3, 5, 7, 9, 11, 13]
Procedure 'saxpy3' ... PASSED copperhead : [1, 3, 5, 7, 9, 11, 13]
Procedure 'sxpy' ... PASSED copperhead : [0, 2, 4, 6, 8, 10, 12]
---- Simple FLOAT tests ----
Procedure 'as_ones' ... PASSED copperhead : [1, 1, 1, 1, 1, 1, 1]
Procedure 'idm' ... PASSED copperhead : [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Procedure 'idx' ... PASSED copperhead : [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Procedure 'sxpy' ... PASSED copperhead : [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
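(For a flavor of what these tests exercise: saxpy is the classic a·x + y. In Copperhead you write such kernels as ordinary decorated Python over sequences, and the compiler turns the elementwise map into parallel CUDA code. Here is a plain-Python sketch of that shape — no Copperhead involved — with inputs of my own choosing that happen to reproduce the output above:)

```python
def saxpy(a, x, y):
    # elementwise a*x + y; Copperhead's decorated variant of this style
    # gets compiled and run as native GPU code rather than interpreted
    return [a * xi + yi for xi, yi in zip(x, y)]

# a=2, x=[0..6], y=all ones reproduces the [1, 3, 5, ...] line above
print(saxpy(2, range(7), [1] * 7))  # [1, 3, 5, 7, 9, 11, 13]
```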
Sadly, that is the only sample I have managed to run. None of the other sample code worked, despite trying to get their dependencies (scipy…) to work, and grinding my teeth harder. They all try hard, spew out some compiler messages and a Python backtrace, and finally fail with the same error:
codepy.CompileError: module compilation failed
This is, in fact, consistent with the disclaimer, so I guess I should not complain too much. :-)
Copperhead is currently under development. Many valid Copperhead programs do not yet compile, and the compiler does not produce helpful error messages. Code that does compile and run may execute inefficiently, compared to hand-coded CUDA. Join the mailing list and let us know of your experiences, but don't expect things to work right out of the box.
Alright, off to join the mailing list then.
(Update: eventually the rest of the tests also worked after rebuilding PyCUDA with the system-installed version of Boost – see the January 3 update below.)
Needless to say, overall I found this whole experience very… tedious. I documented the whole process nevertheless so that (hopefully!) I or my suffering partner in this project would not have to figure all this stuff out again if there is another time. I understand this is the natural price one pays for using software that has gone through very limited field testing, and that Copperhead has not had a chance to gather a community around it, at least yet, and that it's been a one-man project. Still, I'd have liked the setup part to be less work than this.
Further, sadly, little work has been done since Copperhead's author graduated from Berkeley. There are precisely three commit messages in the repository, all of them from September 2010, and the project hardly seems to have made any progress after that. There's some talk about a roadmap in the mailing list, though.
What is really interesting, however, is that Accelerate is nicely chugging along, in spite of Haskell's seemingly smaller community. (You'd agree that it's way smaller than the Python community, yes?) A freshly checked out Accelerate builds just fine on my Thinkpad T420 (with the alternate CPU backend, not CUDA) without any of the bothersome setup that Copperhead demanded; there are hundreds of commits in the "main" github repository, and ten other forks not counting mine. (Which is yet to see some activity. If only the semester would let me!)
Another round of updates
I was wrong in assuming that work on Copperhead has come to a stop after Bryan's graduation and move to nVidia. Bryan is still working on this project, and his updates are present on a cloned repository. Hopefully we'll get to see a release soon; and I sincerely hope that Copperhead, just like Thrust, will eventually become part of nVidia's official CUDA SDK release. That would seriously help in putting an end to the era of boilerplate-ridden, highly error-prone, highly frustrating process of writing GPGPU code.
Secondly, PyCUDA installed as above does not quite work, and after asking on the mailing list it turned out that this is because PyCUDA ships with its own copy of Boost. The solution is to use the system-installed Boost. These settings, from PyCUDA's siteconf.py, are what worked for me:
BOOST_INC_DIR = ["/usr/include/boost-1_42/"]
BOOST_LIB_DIR = ["/usr/lib"]
BOOST_COMPILER = 'gcc'
USE_SHIPPED_BOOST = False
BOOST_PYTHON_LIBNAME = ['boost_python']
BOOST_THREAD_LIBNAME = ['boost_thread']
That made the rest of the tests work as well.
My impressions of Copperhead, in spite of the rather painful installation procedure, are extremely positive: for code that is practically developed by one person, it works quite well; and in our benchmarks Copperhead code outperformed Accelerate. This is not entirely surprising if you consider that Copperhead-decorated Python code is in reality compiled, loaded, and run as native code, not interpreted. Admittedly, our benchmarks are by no means exhaustive or even complete; we merely benchmarked what we found to be the "common surface area" of code that we could get working. Which was, hopefully, Good Enough (TM) for a course project.