Building Copperhead

This semester, I've signed up for three courses:

All of them have been great so far. The last two are graduate seminar classes and both classes have a project. I'm happy about this setup since there is some overlap in what I'm doing - the DSL class specifically focuses on various forms of parallelism, so I get to study more of the same thing. Projects for both these classes involve some amount of GPU computation, so it's more of more of the same thing, or something.

Figure 1: Copperhead

(The image is from nVidia's site.)

For the DSL class, the project I've signed up for is a comparative study of Copperhead and Accelerate, which are GPGPU DSLs for Python and Haskell respectively. There is some literature available. On Copperhead, there is the PPoPP paper, Compiling an Embedded Data Parallel Language, and Bryan Catanzaro's PhD thesis, Compilation Techniques for Embedded Data Parallel Languages; on Accelerate, there is Accelerating Haskell Array Codes with Multicore GPUs.

I'd been having some trouble with building Copperhead – mainly because I haven't done any Python in a long time, and because I'm installing stuff into non-default directories, and because I've been lax in reading documentation. So I thought of writing down the exercise here.

The machine happens to run Gentoo 2.0.3; and I do not have admin access to this machine. So some of the dependencies are installed in my home directory, under ~/software. Therefore I also need to set some environment variables:

$ export PYTHONPATH=\
$HOME/software/lib64/python2.7/site-packages:\
$HOME/software/lib/python2.7/site-packages:\
$PYTHONPATH
$ export PATH=~/software/bin:$PATH
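
A quick way to confirm that the exports took effect is to ask Python itself. The following is only a sketch: the helper name missing_from_sys_path is made up for illustration, and the two directories are simply the ones from the exports above.

```python
import os
import sys


def missing_from_sys_path(dirs, search_path=None):
    """Return the entries of `dirs` that are absent from the search path."""
    if search_path is None:
        search_path = sys.path
    # Normalize so that a trailing slash does not cause a false mismatch.
    normalized = {os.path.normpath(p) for p in search_path if p}
    return [d for d in dirs if os.path.normpath(d) not in normalized]


if __name__ == "__main__":
    # Hypothetical prefix, matching the ~/software layout used in this post.
    prefix = os.path.expanduser("~/software")
    wanted = [
        os.path.join(prefix, "lib64", "python2.7", "site-packages"),
        os.path.join(prefix, "lib", "python2.7", "site-packages"),
    ]
    for d in missing_from_sys_path(wanted):
        print("not on sys.path: " + d)
```

If the script prints nothing, both directories are visible to the interpreter that ran it.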

The dependencies are:

$ git clone http://git.tiker.net/trees/pycuda.git
$ cd pycuda
$ easy_install --prefix=~/software .

This installs PyCUDA along with its dependencies (decorator, py, pytest, pytools, etc.).

(This doesn't quite work – see the January 3 update below.)

Installing CodePy turned out to be straightforward:

$ git clone http://git.tiker.net/trees/codepy.git
$ cd codepy
$ easy_install --prefix=~/software .

There's some noise about a missing cgen dependency; I chose to ignore it for now.

(As a side note, the last three projects appear to be very interesting on their own and deserve further investigation. This I will hopefully get to do before the end of the semester.)

$ hg clone https://code.google.com/p/copperhead/
$ cd copperhead
$ easy_install --prefix=~/software .

No trouble there either. However:

$ python -c "import copperhead;"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "[...] copperhead/__init__.py", line 20, in <module>
    from prelude import *
  File "[...] copperhead/prelude.py", line 46, in <module>
    import copperhead.runtime.places as PL
  File "[...] copperhead/runtime/__init__.py", line 29, in <module>
    cuda.init()
pycuda._driver.RuntimeError: cuInit failed: no device

This probably means we're in trouble:

$ python -c "import pycuda.driver as cuda; cuda.init()"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
pycuda._driver.RuntimeError: cuInit failed: no device

Bummer. It looks like the machine doesn't have a CUDA driver installed yet:

$ /opt/cuda/sdk/C/bin/linux/release/deviceQuery
[deviceQuery] starting...
/opt/cuda/sdk/C/bin/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
[deviceQuery] test results...
FAILED

Press ENTER to exit...

Double bummer!
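
The one-liner used earlier can be wrapped into a slightly friendlier probe. This is a minimal sketch, assuming PyCUDA is importable; probe_cuda is a made-up name, and the broad except clause is deliberate, since I have not checked which exception classes PyCUDA raises in every failure mode.

```python
def probe_cuda():
    """Return a one-line description of what the CUDA driver reports."""
    try:
        import pycuda.driver as cuda
    except ImportError as exc:
        return "PyCUDA is not importable: %s" % exc
    try:
        cuda.init()
        count = cuda.Device.count()
        names = [cuda.Device(i).name() for i in range(count)]
    except Exception as exc:  # for example "cuInit failed: no device"
        return "CUDA driver error: %s" % exc
    return "%d device(s): %s" % (count, ", ".join(names))


if __name__ == "__main__":
    print(probe_cuda())
```

Unlike importing copperhead directly, this reports the failure in one line instead of a traceback.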

Turns out that the nVidia GeForce 7300 GT isn't listed as a CUDA device on nVidia's website. This is surprising because two weeks back I wrote and executed a simple OpenCL program just fine with CL_DEVICE_TYPE_GPU, using nVidia's SDK. The other option, CL_DEVICE_TYPE_CPU, did not work – my guess is that nVidia's OpenCL doesn't do CPUs yet, despite the "heterogeneous" moniker. The machine has an Intel CPU, but no Intel OpenCL SDK installed.

On the other hand, my Thinkpad, which is a recent Sandy Bridge machine, could do CL_DEVICE_TYPE_CPU with the Intel OpenCL SDK, but that SDK doesn't do CL_DEVICE_TYPE_GPU yet. And of course, no CUDA, since this is Intel.

(Yes, there are several vendors in this space: nVidia, which pushes both its own CUDA and OpenCL; and Apple, Intel, AMD, PowerVR, IBM et al., which work with the Khronos industry consortium and support OpenCL to varying degrees.)
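
To see which device types each installed OpenCL platform actually exposes, something like the following could be used. It is only a sketch, and it assumes the PyOpenCL package, which is not among the dependencies listed in this post; opencl_devices_by_type is a made-up helper name.

```python
def opencl_devices_by_type():
    """Map (platform name, 'CPU' or 'GPU') to a list of device names."""
    try:
        import pyopencl as cl  # assumption: PyOpenCL is installed
    except ImportError:
        return {}
    try:
        platforms = cl.get_platforms()
    except Exception:  # for example, no OpenCL platform registered at all
        return {}
    result = {}
    for platform in platforms:
        kinds = (("CPU", cl.device_type.CPU), ("GPU", cl.device_type.GPU))
        for label, kind in kinds:
            try:
                devices = platform.get_devices(device_type=kind)
            except Exception:  # treat "no such device" as an empty list
                devices = []
            result[(platform.name, label)] = [d.name for d in devices]
    return result


if __name__ == "__main__":
    for key, names in sorted(opencl_devices_by_type().items()):
        print("%s / %s: %s" % (key[0], key[1], names or "none"))
```

An empty list for a given platform and type corresponds to the situation described above, where one vendor's SDK handles only CPUs and another's only GPUs.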

Now to go look for a machine with an actual CUDA device. The perils of bleeding edge…

Updates

(Updated on 22 Oct 2011)

It turned out that I was wrong, which is really awesome, because the machine does have a CUDA device after all! The actual problem was that I wasn't in the video group; once that was solved, I could run some CUDA demo programs (such as deviceQuery), and they agreed that there is indeed a CUDA device. (Which is: Device 0: Tesla C1060.)
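
The group problem can be checked from Python before blaming the hardware. This is a sketch under assumptions: the group name video comes from this machine's setup, and the /dev/nvidia* device nodes are where the nVidia driver usually puts its devices, which may differ elsewhere. The helper name is made up.

```python
import glob
import grp
import os


def current_group_names():
    """Names of the groups the current process belongs to."""
    names = set()
    for gid in set(os.getgroups()) | {os.getegid()}:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # The gid has no entry in the group database; keep the number.
            names.add(str(gid))
    return names


if __name__ == "__main__":
    print("in video group: %s" % ("video" in current_group_names()))
    # Assumed device node location; adjust if the driver uses another path.
    for node in sorted(glob.glob("/dev/nvidia*")):
        readable = os.access(node, os.R_OK | os.W_OK)
        print("%s read/write: %s" % (node, readable))
```

Note that a newly added group membership only shows up in processes started after logging in again.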

It didn't end here though.

The next trouble was with the Python package cgen, which was mentioned earlier. Cgen used to be a part of codepy, but these days it is a separate project with a separate Python package and everything. And this package turned out to be uninstallable via the usual means:

$ pip install cgen
Downloading/unpacking cgen
  Could not find any downloads that satisfy the requirement cgen
No distributions at all found for cgen

$ easy_install cgen 
Checking existing site.py in $PYTHONPATH
Searching for cgen
Reading http://pypi.python.org/simple/cgen/
Reading http://mathema.tician.de/software/codepy
No local packages or download links found for cgen
error: Could not find suitable distribution for Requirement.parse('cgen')

(I suppose I should alert the maintainer. I've been busy.)

So I installed cgen like so:

$ git clone http://git.tiker.net/trees/cgen.git
$ cd cgen
$ easy_install --prefix=$HOME/software .

I should be almost ready to run some simple test programs, but wait! Now it's CUDA's turn to complain: gcc-4.5 is not supported, presumably because of gcc's move to DWARF. gcc-4.4 is supported and was available, so I created a symlink ~/bin/gcc pointing to /usr/bin/gcc-4.4.5, and placed $HOME/bin at the front of $PATH.
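
To verify which gcc the shell now resolves to, a small check along these lines may help. It is a sketch: the (4, 4) limit is just the one observed with the CUDA release on this machine, not a general rule, and all function names are made up.

```python
import re
import subprocess


def parse_gcc_version(text):
    """Turn a version string such as '4.4.5' into a tuple of ints."""
    match = re.match(r"\s*(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("unrecognized gcc version: %r" % text)
    return tuple(int(part) for part in match.groups() if part is not None)


def accepted_by_cuda(version, newest_supported=(4, 4)):
    """Compare major.minor against the newest release CUDA accepts.

    The default limit is an assumption taken from this post.
    """
    return version[:2] <= newest_supported


def gcc_on_path_version():
    """Ask the first gcc on $PATH for its version."""
    out = subprocess.check_output(["gcc", "-dumpversion"])
    return parse_gcc_version(out.decode("ascii", "replace"))


if __name__ == "__main__":
    version = gcc_on_path_version()
    print("gcc %s accepted: %s" % (version, accepted_by_cuda(version)))
```

If the symlink and $PATH are set up as described, this should report the 4.4 compiler rather than the system default.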

The next problem was a missing libboost_python-gcc4.3-mt. The system has boost_python installed; however, codepy's default is to look for the former. This was solved by creating a ~/.aksetup-defaults.py with the following:

BOOST_PYTHON_LIBNAME=["boost_python"]
BOOST_THREAD_LIBNAME=["boost_thread"]

This obscure piece of information came from here. I have no idea what aksetup means, nor could I find the place in codepy where it looks for said file.
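
To find out which names are worth putting into those two settings, one can list the Boost libraries the system actually has. The following is a sketch: the search directories are guesses for a typical Linux layout, and both helper names are made up.

```python
import glob
import os


def libname_from_path(path):
    """'/usr/lib/libboost_python-mt.so.1.42.0' -> 'boost_python-mt'."""
    base = os.path.basename(path)
    if base.startswith("lib"):
        base = base[3:]
    # Drop a shared-library suffix, including any trailing version digits.
    stem = base.split(".so")[0]
    # Drop a static-library suffix.
    if stem.endswith(".a"):
        stem = stem[:-2]
    return stem


def boost_python_candidates(lib_dirs=("/usr/lib", "/usr/lib64")):
    """Library names usable with -l for every libboost_python* found."""
    found = set()
    for lib_dir in lib_dirs:
        pattern = os.path.join(lib_dir, "libboost_python*")
        for path in glob.glob(pattern):
            found.add(libname_from_path(path))
    return sorted(found)


if __name__ == "__main__":
    print(boost_python_candidates())
```

On this machine the useful result was the plain boost_python name, which is what went into the defaults file.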

We have one more environment variable to set up:

$ export THRUST_PATH=/opt/cuda/include
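
A sanity check for this variable could look like the sketch below. It assumes that THRUST_PATH should name the directory that contains the thrust/ header directory, and it uses thrust/version.h as the marker file; has_thrust_headers is a made-up name.

```python
import os


def has_thrust_headers(include_dir):
    """True if include_dir contains thrust/version.h."""
    marker = os.path.join(include_dir, "thrust", "version.h")
    return os.path.isfile(marker)


if __name__ == "__main__":
    path = os.environ.get("THRUST_PATH", "")
    print("THRUST_PATH=%r usable: %s" % (path, has_thrust_headers(path)))
```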

At this point one of the sample programs bundled with Copperhead runs. Yay!

$ python simple_tests.py 

---- Simple INTEGER tests ----
Procedure 'incr'                                   ... PASSED
   copperhead : [1, 2, 3, 4, 5, 6, 7]
Procedure 'incrList'                               ... PASSED
   copperhead : [1, 2, 3, 4, 5, 6, 7]
Procedure 'as_ones'                                ... PASSED
   copperhead : [1, 1, 1, 1, 1, 1, 1]
Procedure 'idm'                                    ... PASSED
   copperhead : [0, 1, 2, 3, 4, 5, 6]
Procedure 'idx'                                    ... PASSED
   copperhead : [0, 1, 2, 3, 4, 5, 6]
Procedure 'saxpy'                                  ... PASSED
   copperhead : [1, 3, 5, 7, 9, 11, 13]
Procedure 'saxpy2'                                 ... PASSED
   copperhead : [1, 3, 5, 7, 9, 11, 13]
Procedure 'saxpy3'                                 ... PASSED
   copperhead : [1, 3, 5, 7, 9, 11, 13]
Procedure 'sxpy'                                   ... PASSED
   copperhead : [0, 2, 4, 6, 8, 10, 12]

---- Simple FLOAT tests ----
Procedure 'as_ones'                                ... PASSED
   copperhead : [1, 1, 1, 1, 1, 1, 1]
Procedure 'idm'                                    ... PASSED
   copperhead : [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Procedure 'idx'                                    ... PASSED
   copperhead : [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Procedure 'sxpy'                                   ... PASSED
   copperhead : [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
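
For readers who have not seen these procedures, here is what a few of them compute, written as plain Python with no GPU involved. This is a sketch, not the code from simple_tests.py: the inputs are my guesses, chosen so that the results match the integer output above. As far as I understand, the Copperhead versions are ordinary Python functions like these, marked with the @cu decorator.

```python
def incr(x):
    """Add one to every element."""
    return [xi + 1 for xi in x]


def saxpy(a, x, y):
    """Scalar a times x plus y, elementwise."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def sxpy(x, y):
    """x plus y, elementwise."""
    return [xi + yi for xi, yi in zip(x, y)]


if __name__ == "__main__":
    x = list(range(7))           # guessed input: [0, 1, ..., 6]
    print(incr(x))               # [1, 2, 3, 4, 5, 6, 7]
    print(saxpy(2, x, [1] * 7))  # [1, 3, 5, 7, 9, 11, 13]
    print(sxpy(x, x))            # [0, 2, 4, 6, 8, 10, 12]
```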

Sadly, that is the only sample I have managed to run. None of the other sample code worked, despite my trying to get their dependencies (matplotlib, plac, scipy…) to work too, and grinding my teeth harder. They all try hard, spew out some compiler messages and a Python backtrace, and finally fail with the same error:

codepy.CompileError: module compilation failed

This is, in fact, consistent with the disclaimer, so I guess I should not complain too much. :-)

Copperhead is currently under development. Many valid Copperhead programs do not yet compile, and the compiler does not produce helpful error messages. Code that does compile and run may execute inefficiently, compared to hand-coded CUDA. Join the mailing list and let us know of your experiences, but don't expect things to work right out of the box.

Alright, off to join the mailing list then.

(Update: eventually the rest of the tests also worked, after rebuilding PyCUDA with the system-installed version of Boost – see the January 3 update below.)

Afterword

Needless to say, overall I found this whole experience very… tedious. I documented the whole process nevertheless, so that (hopefully!) neither I nor my suffering partner in this project will have to figure all this out again if there is a next time. I understand this is the natural price one pays for using software that has gone through very limited field testing, that Copperhead has not had a chance to gather a community around it, at least not yet, and that it's been a one-man project. Still, I'd have liked the setup part to be less work than this.

Further, sadly, little work has been done since Copperhead's author graduated from Berkeley. There are precisely three commit messages in the repository, all of them from September 2010, and the project hardly seems to have made any progress after that. There's some talk about a roadmap on the mailing list, though.

What is really interesting, however, is that Accelerate is nicely chugging along, in spite of Haskell's seemingly smaller community. (You'd agree that it's way smaller than the Python community, yes?) A freshly checked-out Accelerate builds just fine on my Thinkpad T420 (with the alternate CPU backend, not CUDA) without any of the bothersome setup that Copperhead demanded; there are hundreds of commits in the "main" github repository, and ten other forks, not counting mine. (Which is yet to see some activity. If only the semester would let me!)

Another round of updates

(Updated on 3 January 2012)

I was wrong in assuming that work on Copperhead had come to a stop after Bryan's graduation and move to nVidia. Bryan is still working on this project, and his updates are present in a cloned repository. Hopefully we'll get to see a release soon; and I sincerely hope that Copperhead, just like Thrust, will eventually become part of nVidia's official CUDA SDK release. That would seriously help in putting an end to the era of the boilerplate-ridden, highly error-prone, highly frustrating process of writing GPGPU code.

Secondly, PyCUDA as installed above does not quite work, and after asking on the mailing list it turned out that this is because PyCUDA ships with its own version of Boost. The solution is to use the system-installed Boost. These settings, from PyCUDA's siteconf.py, are what worked for me:

BOOST_INC_DIR = ["/usr/include/boost-1_42/"]
BOOST_LIB_DIR = ["/usr/lib"]
BOOST_COMPILER = "gcc"
USE_SHIPPED_BOOST = False
BOOST_PYTHON_LIBNAME = ["boost_python"]
BOOST_THREAD_LIBNAME = ["boost_thread"]

That made the rest of the tests work as well.

My impressions of Copperhead, in spite of the rather painful installation procedure, are extremely positive: for code that is practically developed by one person, it works quite well; and in our benchmarks Copperhead code outperformed Accelerate. This is not too surprising if you consider that Copperhead-decorated Python code is in reality loaded and run as native code, not interpreted. Admittedly our benchmarks are by no means exhaustive or even complete; we merely benchmarked the "common surface area" of code that we could get working. Which was, hopefully, Good Enough (TM) for a course project.