[Tensorflow] Installing TensorFlow 1.0 on Mac OS (with CUDA support)

TensorFlow, the deep learning toolkit, has reached its 1.0 release. This post explains how to install the GPU version on a Mac.

0. Environment:
Software:
Mac OSX 10.12
Xcode 8.1
Python 3.5
CUDA Toolkit 8.0
cuDNN 5.1
Homebrew

Hardware:
CPU: 3.5 GHz Intel Core i7
Memory: 16 GB 1600 MHz DDR3
GPU: NVIDIA GeForce GTX 775M 2048 MB

1. Install dependencies:
1) Install CUDA Driver 8.0.63:
Download and install the latest CUDA Driver for Mac from:
http://www.nvidia.com/object/mac-driver-archive.html

I installed version 8.0.63. You can also download it from my network drive:
https://pan.baidu.com/s/1slliX3J

If the driver is already installed, you can update it by clicking the update button under Apple -> System Preferences -> CUDA.

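2) Install CUDA Toolkit 8.0:
I recommend downloading the dmg installer for CUDA Toolkit 8.0 from the address below (after downloading, rename the file so that it ends in .dmg before installing):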
https://developer.nvidia.com/compute/cuda/8.0/Prod/local_installers/cuda_8.0.55_mac-dmg

Or download it from my network drive:
https://pan.baidu.com/s/1geBBDuj

After downloading, double-click the installer and accept the default options throughout. The installer shows a confirmation screen when the installation succeeds.

To configure the CUDA environment, edit ~/.bash_profile and append the following at the end:
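
A minimal sketch of the kind of lines to append, assuming the toolkit was installed to its default prefix /usr/local/cuda (the prefix and the exact variable set are assumptions, not taken from the text above):

```shell
# Assumed default install prefix of CUDA Toolkit 8.0 on macOS.
export CUDA_HOME=/usr/local/cuda
# macOS resolves dynamic libraries through DYLD_LIBRARY_PATH;
# keep any existing value instead of overwriting it.
export DYLD_LIBRARY_PATH="$CUDA_HOME/lib:$CUDA_HOME/extras/CUPTI/lib${DYLD_LIBRARY_PATH:+:$DYLD_LIBRARY_PATH}"
# Put nvcc and the other CUDA tools on the command search path.
export PATH="$CUDA_HOME/bin:$PATH"
```

If your toolkit lives elsewhere, only the CUDA_HOME line should need to change.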

Run the following command to reload bash_profile:
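
One way to do this, sketched under the assumption that the profile is at the usual location ~/.bash_profile, is to source the file in the current shell and then print one of the variables it is expected to define (CUDA_HOME is used here purely as an illustration):

```shell
# Re-read the profile in the current shell so new variables apply immediately.
if [ -f "$HOME/.bash_profile" ]; then
  . "$HOME/.bash_profile"
fi
# Print the value to confirm it is set (an empty value means it is not).
echo "CUDA_HOME=${CUDA_HOME:-}"
```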

The environment variables will then take effect in every new terminal window you open.

PS: Checking that CUDA works:
Once the CUDA Toolkit, Driver, and Samples are installed, you can build and run CUDA's deviceQuery sample:
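
A hedged sketch of that build, assuming the samples were installed to /usr/local/cuda/samples and that the sample Makefile places the binary under bin/x86_64/darwin/release (both paths are assumptions about a default CUDA 8.0 install on macOS):

```shell
# Assumed default location of the CUDA samples; override SAMPLES_DIR if yours differs.
SAMPLES_DIR="${SAMPLES_DIR:-/usr/local/cuda/samples}"
if [ -d "$SAMPLES_DIR/1_Utilities/deviceQuery" ]; then
  cd "$SAMPLES_DIR"
  # Building inside /usr/local usually needs root.
  sudo make -C 1_Utilities/deviceQuery
  # Run the freshly built binary; it lists every CUDA device it can see.
  ./bin/x86_64/darwin/release/deviceQuery
else
  echo "CUDA samples not found in $SAMPLES_DIR" >&2
fi
```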

If the output ends with:
Result = PASS

the installation succeeded.

3) Install cuDNN 5.1:
Download it from:
https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v5.1/prod_20161129/8.0/cudnn-8.0-osx-x64-v5.1-tgz

Or download it from my network drive:
https://pan.baidu.com/s/1nuZfOTV

Then extract the archive, change into the extracted directory, and run the following:
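
A minimal sketch of this step, assuming the extracted directory contains include/cudnn.h and lib/libcudnn* and that CUDA lives under /usr/local/cuda (the layout and target paths are assumptions; adjust them to your install):

```shell
# Assumed CUDA install prefix; override CUDA_HOME if yours differs.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"
if [ -f include/cudnn.h ]; then
  # Copy the cuDNN header and libraries next to the CUDA ones.
  sudo cp include/cudnn.h "$CUDA_HOME/include/"
  sudo cp lib/libcudnn* "$CUDA_HOME/lib/"
  # List the copied libraries as a quick check.
  ls -l "$CUDA_HOME"/lib/libcudnn*
else
  echo "include/cudnn.h not found; run this from the extracted cuDNN directory" >&2
fi
```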

4) Install pip:
We use Homebrew as the installation tool here.
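
A hedged sketch of getting pip through Homebrew, assuming Homebrew's Python 3 formula (which bundles pip3) is acceptable. Note that Homebrew installs whichever Python 3 release is current, which may not be 3.5:

```shell
if command -v pip3 >/dev/null 2>&1; then
  # pip3 is already available; just report its version.
  pip3 --version
elif command -v brew >/dev/null 2>&1; then
  # Homebrew's Python 3 formula ships pip3 alongside the interpreter.
  brew install python3
  pip3 --version
else
  echo "Neither pip3 nor Homebrew found; install Homebrew first" >&2
fi
```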

2. Install TensorFlow:
First set an environment variable. For the GPU version, use the following source:
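
As an illustration, the variable could be set as below. The variable name TF_BINARY_URL and the wheel URL are assumptions based on the TensorFlow 1.0.0 Mac GPU build for Python 3; check them against the official download list before relying on them:

```shell
# Assumed wheel URL for the TensorFlow 1.0.0 GPU build on macOS with Python 3.
export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/mac/gpu/tensorflow_gpu-1.0.0-py3-none-any.whl
# Echo it back so a typo is easy to spot.
echo "$TF_BINARY_URL"
```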

Then install TensorFlow with the following command:
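
A minimal sketch of the install step, assuming the wheel URL was exported in a variable named TF_BINARY_URL (a hypothetical name used here for illustration) and that pip3 belongs to the Python 3.5 you intend to use:

```shell
if [ -n "${TF_BINARY_URL:-}" ]; then
  # --upgrade replaces an older TensorFlow if one is already installed.
  pip3 install --upgrade "$TF_BINARY_URL"
else
  echo "TF_BINARY_URL is not set; export the wheel URL first" >&2
fi
```

Depending on where Python is installed, pip may need elevated privileges or the --user flag to write to site-packages.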

On a Mac, TensorFlow is installed at:
/usr/local/lib/python3.5/site-packages/tensorflow

3. Test the installation:
1) Run the following command to start Python 3.5:

2) Run the following test script:
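
A hedged sketch of such a test, using the classic TensorFlow 1.x "hello" program fed to python3 from the shell. The script body is an assumption, not necessarily the one originally shown here:

```shell
# Only run the check when a TensorFlow 1.x package is importable.
if python3 -c 'import tensorflow as tf; assert tf.__version__.startswith("1.")' >/dev/null 2>&1; then
python3 - <<'EOF'
import tensorflow as tf

# Build a constant op and evaluate it in a session (TensorFlow 1.x API).
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
sess.close()
EOF
else
  echo "TensorFlow 1.x is not importable from python3" >&2
fi
```

Under Python 3 the evaluated string comes back as bytes, so the last line printed should be b'Hello, TensorFlow!'. With the GPU build, log lines about opening the CUDA libraries normally appear before it.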

If the script's output is displayed without errors, the installation works.

4. Run a sample demo (MNIST):
Change to the demo source directory:

Run the MNIST example:
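
A minimal sketch, assuming the current directory contains the script at models/image/mnist/convolutional.py (the relative path that appears in the log output in this post):

```shell
# Relative path of the MNIST demo, as it appears in the log messages.
MNIST_SCRIPT="models/image/mnist/convolutional.py"
if [ -f "$MNIST_SCRIPT" ]; then
  python3 "$MNIST_SCRIPT"
else
  echo "$MNIST_SCRIPT not found; change to the demo source directory first" >&2
fi
```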

Normally it runs quickly and prints the following, which shows that CUDA is working:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 775M, pci bus id: 0000:01:00.0)
WARNING:tensorflow:From models/image/mnist/convolutional.py:289 in main.: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use tf.global_variables_initializer instead.
Initialized!

Training ends once the error has converged far enough. If nothing went wrong, you have successfully run the GPU version of TensorFlow on a Mac!

Upgrading:
If you already have an earlier GPU version installed and want to upgrade to the 1.0 release, run the following command:

Common problems:
1. Error: The directory or its parent directory is not owned by the current user
You may see the following error when installing Virtualenv:
The directory '/Users/valiantliu/Library/Caches/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The cause is that virtualenv was installed with sudo. Install Virtualenv with the following command instead:

2. Warning: You are using pip version 7.1.2, however version 8.1.2 is available.
You may see the following warning when installing pip:
You are using pip version 7.1.2, however version 8.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

Just upgrade pip as the message suggests; note that this may require root privileges:

3. Error: Library not loaded: @rpath/libcudart.7.5.dylib
Importing TensorFlow in the test step after installation may fail with the following error:
ImportError: dlopen(/Users/valiantliu/tensorflow/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so, 10): Library not loaded: @rpath/libcudart.7.5.dylib
Referenced from: /Users/valiantliu/tensorflow/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so
Reason: image not found

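This usually means the CUDA runtime that this TensorFlow build was linked against (CUDA 7.5 in the message above) is not installed or cannot be found. libcudart ships with the CUDA Toolkit, so make sure the TensorFlow build matches the toolkit you installed (CUDA 8.0 in this guide), and see steps 1) and 2) under "Install dependencies" for the driver and toolkit.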

4. Error: Segmentation fault: 11
Importing TensorFlow in the test step after installation may fail with the following error:
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.dylib locally
Segmentation fault: 11

See reference [2] for details on this error.

5. Error: CUDA driver version is insufficient for CUDA runtime version
Running deviceQuery as a check may produce the following error:
Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

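This usually means the system's CUDA driver is too old. Install the latest driver as described in step 1) under "Install dependencies".
Then run deviceQuery again. Output similar to the following means CUDA is supported (this sample was captured with CUDA 7.5, so its version numbers differ from the 8.0 setup described here):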
Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 775M"
CUDA Driver Version / Runtime Version 7.5 / 7.5
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147024896 bytes)
( 7) Multiprocessors, (192) CUDA Cores/MP: 1344 CUDA Cores
GPU Max Clock rate: 797 MHz (0.80 GHz)
Memory Clock rate: 2500 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = GeForce GTX 775M
Result = PASS

6. Error: failed call to cuInit: CUDA_ERROR_NO_DEVICE
The following error can appear when running the Python example, and is usually caused by an outdated CUDA driver:
E tensorflow/stream_executor/cuda/cuda_driver.cc:509] failed call to cuInit: CUDA_ERROR_NO_DEVICE
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: liuxiaos-iMac.local
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: liuxiaos-iMac.local
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: Invalid argument: expected %d.%d or %d.%d.%d form for driver version; got ""

The fix is to upgrade the CUDA Driver.

7. Error: AttributeError: 'GFile' object has no attribute 'Size'
Running the example models/image/mnist/convolutional.py may fail with the following error:
Traceback (most recent call last):
  File "models/image/mnist/convolutional.py", line 326, in <module>
    tf.app.run()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "models/image/mnist/convolutional.py", line 132, in main
    train_data_filename = maybe_download('train-images-idx3-ubyte.gz')
  File "models/image/mnist/convolutional.py", line 72, in maybe_download
    size = f.Size()
AttributeError: 'GFile' object has no attribute 'Size'

To fix it, edit the file models/image/mnist/convolutional.py.
Find:
size = f.Size()

Change it to:
size = f.size()
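
The same edit can be scripted. The sketch below defines a small helper, fix_gfile_size (a hypothetical name introduced here), that rewrites the f.Size() call from the traceback to the lowercase f.size() method in place, assuming that call is the only thing that needs to change:

```shell
# fix_gfile_size FILE: rename the f.Size() call to f.size() in FILE.
fix_gfile_size() {
  # -i.bak edits in place and keeps a backup copy as FILE.bak;
  # this form is accepted by both BSD (macOS) and GNU sed.
  sed -i.bak 's/f\.Size()/f.size()/' "$1"
}

# Example, run from the directory that contains models/:
#   fix_gfile_size models/image/mnist/convolutional.py
```

The backup file can be deleted once the example runs cleanly.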

8. Error: Couldn't open CUDA library libcuda.1.dylib
If you see the following error:
Couldn't open CUDA library libcuda.1.dylib

The library name that CUDA installs by default differs from the name TensorFlow tries to load. Creating a link with the following command resolves it:
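
A hedged sketch of that link, assuming the driver library is installed as /usr/local/cuda/lib/libcuda.dylib (the directory is an assumption about a default install; the target name comes from the error message above):

```shell
# Assumed directory holding the CUDA libraries; override CUDA_LIB if needed.
CUDA_LIB="${CUDA_LIB:-/usr/local/cuda/lib}"
if [ -e "$CUDA_LIB/libcuda.dylib" ] && [ ! -e "$CUDA_LIB/libcuda.1.dylib" ]; then
  # Expose the installed library under the name TensorFlow looks for.
  sudo ln -s "$CUDA_LIB/libcuda.dylib" "$CUDA_LIB/libcuda.1.dylib"
fi
# Show the result, or say so when nothing is there.
ls -l "$CUDA_LIB"/libcuda* 2>/dev/null || echo "no libcuda libraries in $CUDA_LIB"
```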

9. Error: PermissionError: [Errno 13] Permission denied: '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/external/__init__.py'

If you hit the following error during installation:
Exception:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pip/commands/install.py", line 342, in run
    prefix=options.prefix_path,
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pip/req/req_set.py", line 784, in install
    **kwargs
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pip/req/req_install.py", line 851, in install
    self.move_wheel_files(self.source_dir, root=root, prefix=prefix)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pip/req/req_install.py", line 1064, in move_wheel_files
    isolated=self.isolated,
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pip/wheel.py", line 377, in move_wheel_files
    clobber(source, dest, False, fixer=fixer, filter=filter)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pip/wheel.py", line 323, in clobber
    shutil.copyfile(srcfile, destfile)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/shutil.py", line 115, in copyfile
    with open(dst, 'wb') as fdst:
PermissionError: [Errno 13] Permission denied: '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/external/__init__.py'

In that case, consider installing with the following command:
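
One option, sketched here as an assumption rather than the command originally shown, is a per-user install with pip's --user flag, which writes to your home directory instead of the system site-packages. TF_BINARY_URL is again a hypothetical variable holding the wheel URL:

```shell
if [ -n "${TF_BINARY_URL:-}" ]; then
  # --user installs into the current user's site-packages, so no root access is needed.
  pip3 install --user --upgrade "$TF_BINARY_URL"
else
  echo "TF_BINARY_URL is not set; export the wheel URL first" >&2
fi
```

Running pip under sudo is the other common workaround, at the cost of root-owned files in site-packages.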

References:
[1] https://gist.github.com/myh1000/3fbb42928d94a083f6eaed28883ef659

[2] https://www.tensorflow.org/get_started/os_setup#mac_os_x_segmentation_fault_when_import_tensorflow
