Building a Crossplatform Cuda Application

Following our Essential environment setup, We implement our first C++ and Cuda application modules. We describe on this post the steps needed to build and integrate the Cuda system with C++ and Python using CMake.

Writen by: Santiago Hurtado

Building The explanations come with working code, so we describe only the key sections. We encourage you to get the code an play around. If any questions arise please don’t hesitate to contact us by dm on Twitter.

Building C/C++ Applications with CMake is one of the most widely used approaches, we decide to build this prototype using it for better reproducibility. CMake Our system consists of three implementations: Serial, Multicore and Parallel, it also includes python interfaces and unit tests. The design at the moment supports the compilation of most of the modules separately, however makes the compilation take longer.

Building with CMake

Main CMake

On the main Cmake we use the include directory to have the common headers:

    include_directories(. include)

We also define the dependency on CUDA and C++ 14, since it is what Cuda supports at the moment.

    set(CMAKE_CXX_STANDARD 14)
    set(CMAKE_CUDA_STANDARD 14)

Don’t forget to enable testing to be able to use ctest

    enable_testing()

Serial CMake

The next step is to have our serial implementation working. For this the key line needed for Boost Python to work is:

    option(BUILD_SHARED_LIBS "Build libraries as shared as opposed to static" ON)

Multicore CMake

To have multithread support on the multicore implementation we use Boost Threads library.

    find_package(Boost 1.70.0 COMPONENTS thread REQUIRED)

And also we have the same line to build shared libs.

    option(BUILD_SHARED_LIBS "Build libraries as shared as opposed to static" ON)

Parallel CMake

For Cuda support using CMake we set the project as Cuda and allow the separated compilation.

    find_package(CUDA REQUIRED)
    project(parallel_math_lib CUDA)
    set_target_properties(${PROJECT_NAME} PROPERTIES CUDA_SEPARABLE_COMPILATION ON)

Also we present automatically what architecture for cuda is supported by our device.

  cuda_select_nvcc_arch_flags(ARCH_FLAGS "Auto")
  list(APPEND CUDA_NVCC_FLAGS ${ARCH_FLAGS})

Python CMake

Having the compilation of Boost Python with CMakecode working proved to be tricky. It requires to have libraries with the proper versions specified, found by the Boost Python build, and after by CMake. On the docker build Essential environment setup is presented how to build Boost with support for python 3.6.

    find_package(Boost 1.70.0 COMPONENTS thread python REQUIRED)
    find_package(PythonInterp 3.6 REQUIRED)
    find_package(PythonLibs 3.6 REQUIRED)
    include_directories(${BOOST_INCLUDE_DIRS})

The last piece is to have the lib build without the lib prefix:
```
  set_target_properties(serial_math PROPERTIES PREFIX "")
```

The lib name has to match the module on the C++ Code

  BOOST_PYTHON_MODULE (serial_math){ 
          };

The coolest part is that you can run python tests(or any executable), the key section is to tell python where to find the .o

  add_test(
      NAME
      python_serial_test
      COMMAND
      ${CMAKE_COMMAND} -E env MATH_MODULE_PATH=$<TARGET_FILE_DIR:serial_math>
      ${PYTHON_EXECUTABLE} ${CMAKE_CURRENT_SOURCE_DIR}/tests/serial_test.py
  )

For python you need to append the path, a easy option is inside the python code
```
  sys.path.append(os.getenv('MATH_MODULE_PATH'))
```

Unit Testing Build

Lastly to have the unit test running on CMake we need to have install gtest, there are recipes that can install the gtest as well, but we try to keep it simple.

We need support for C++(gcc) and Cuda(nvcc)

  project(tests CXX CUDA)
  set(CMAKE_CXX_STANDARD 14)
  find_package(GTest REQUIRED)

Include the path of the gtest libraries

  include_directories(${GTEST_INCLUDE_DIRS})

Add an executable by module

  add_executable(serial_${PROJECT_NAME} serial_tests.cpp)

Linking, dont forget that gtest needs pthread

  target_link_libraries(serial_${PROJECT_NAME} ${GTEST_LIBRARIES} pthread serial_math_lib)

Add the tests to ctest to be executed on the ‘make test’ command

  add_test(
      NAME serial_${PROJECT_NAME}
      COMMAND $<TARGET_FILE:serial_${PROJECT_NAME}>
  )

And repeat for each unit test file
Dont forget to enable the tests
```
  enable_testing()
```

Running the build

We build everything, cross our fingers and run the tests.

Create a build folder
```
  mkdir build
```
Verify all the dependencies
```
  cd build
  cmake ..
```
Build and test
```
  make
  make test
```

If everything works all six tests should pass. Tests

Troubleshooting

To debug the make phase use
```
  make VERBOSE=1
```
To show the complete tests use
```
  ctest -V
```

Conclusion

When working on having performant software it is important to understand the different trade-offs. You must always perform benchmarks and stress tests, find where the biggest bottlenecks and check the different opportunities to improve.

The code done is far from optimal. If you are looking for an optimized solution is recommended to use libraries such as Cublas or Thrust, Where the theoretic performance of the device can be achieved.

We assume the main applications are written in Std C++ that is why the unit tests are developed in C++ using GTest. Considering that Python is one of the most widely used multipurpose programming languages, it makes sense to provide an interface to be used easily by the end-user.

Happy building!

References

https://devblogs.nvidia.com/building-cuda-applications-cmake/