From 1a9e58893232a5dcbec58d3dec986b9985d8d64e Mon Sep 17 00:00:00 2001
From: Ben Sander
Date: Mon, 28 Aug 2017 15:44:38 -0500
Subject: [PATCH] Update docs.

- Review unsupported comments and refresh.
- Review experimental tag in API.
---
 docs/markdown/hip_faq.md                 | 34 +++++++++++-------------
 docs/markdown/hip_kernel_language.md     |  4 +--
 include/hip/hcc_detail/hip_runtime_api.h | 26 ++++++++----------
 3 files changed, 28 insertions(+), 36 deletions(-)

diff --git a/docs/markdown/hip_faq.md b/docs/markdown/hip_faq.md
index b92ae512c5..c311aae320 100644
--- a/docs/markdown/hip_faq.md
+++ b/docs/markdown/hip_faq.md
@@ -48,12 +48,11 @@ The HIP API documentation describes each API and its limitations, if any, compar
 ### What is not supported?
 #### Runtime/Driver API features
-At a high-level, the following features are not supported:
-- Textures
+At a high-level, the following features are not supported:
+- Textures (partial support available)
 - Dynamic parallelism (CUDA 5.0)
 - Managed memory (CUDA 6.5)
 - Graphics interoperability with OpenGL or Direct3D
-- CUDA Driver API
 - CUDA IPC Functions (Under Development)
 - CUDA array, mipmappedArray and pitched memory
 - Queue priority controls
@@ -61,17 +60,12 @@ At a high-level, the following features are not supported:
 See the [API Support Table](CUDA_Runtime_API_functions_supported_by_HIP.md) for more detailed information.

 #### Kernel language features
-- Device-side dynamic memory allocations (malloc, free, new, delete) (CUDA 4.0)
+- C++-style device-side dynamic memory allocations (free, new, delete) (CUDA 4.0)
 - Virtual functions, indirect functions and try/catch (CUDA 4.0)
 - `__prof_trigger`
 - PTX assembly (CUDA 4.0). HCC supports inline GCN assembly.
 - Several kernel features are under development. See the [HIP Kernel Language](hip_kernel_language.md) for more information. These include:
   - printf
-  - assert
-  - `__restrict__`
-  - `__threadfence*_`, `__syncthreads*`
-  - Unbounded loop unroll
-

 ### Is HIP a drop-in replacement for CUDA?
@@ -100,18 +94,20 @@ However, we can provide a rough summary of the features included in each CUDA SD
   - Per-thread-streams (under development)
   - C++11 (HCC supports all of C++11, all of C++14 and some C++17 features)
 - CUDA 7.5
-  - float16
+  - float16 (supported)
 - CUDA 8.0
-  - TBD.
+  - Page Migration including cudaMemAdvise, cudaMemPrefetch, and other cudaMem* APIs (not supported)

-### What libraries does HIP support?
-HIP includes growing support for the 4 key math libraries using hcBlas, hcFft, hcrng and hcsparse.
-These offer pointer-based memory interfaces (as opposed to opaque buffers) and can be easily interfaced with other HCC applications. Developers should use conditional compilation if portability to nvcc systems is desired - using calls to cu* routines on one path and hc* routines on the other.
-- [hcblas](https://bitbucket.org/multicoreware/hcblas)
-- [hcfft](https://bitbucket.org/multicoreware/hcfft)
-- [hcsparse](https://bitbucket.org/multicoreware/hcsparse)
-- [hcrng](https://bitbucket.org/multicoreware/hcrng)
+### What libraries does HIP support?
+HIP includes growing support for the 4 key math libraries using hcBlas, hcFft, hcrng and hcsparse, as well as MIOpen for machine intelligence applications.
+These offer pointer-based memory interfaces (as opposed to opaque buffers) and can be easily interfaced with other HIP applications.
+The HIP interfaces support both ROCm and CUDA paths, with familiar library interfaces.
+
+- [hipBLAS](https://github.com/ROCmSoftwarePlatform/hipBLAS), which utilizes [rocBLAS](https://github.com/ROCmSoftwarePlatform/rocBLAS).
+- [hipfft](https://github.com/ROCmSoftwarePlatform/hcFFT)
+- [hipsparse](https://github.com/ROCmSoftwarePlatform/hcSPARSE)
+- [hiprng](https://github.com/ROCmSoftwarePlatform/hcrng)

 Additionally, some of the cublas routines are automatically converted to hipblas equivalents by the hipify-clang tool.
 These APIs use cublas or hcblas depending on the platform, and replace the need to use conditional compilation.
@@ -218,7 +214,7 @@ If platform portability is important, use #ifdef __HIP_PLATFORM_NVCC__ to guard
 ### On HCC, can I use HC functionality with HIP?
 Yes. The code can include hc.hpp and use HC functions inside the kernel. A typical use-case is to use AMD-specific hardware features such as the permute, swizzle, or DPP operations.
-The "-stdlib=libc++" must be passed to hipcc in order to compile hc.hpp. See the 'bit_extract' sample for an example.
+See the 'bit_extract' sample for an example.

 Also these functions can be used to extract HCC accelerator and accelerator_view structures from the HIP deviceId and hipStream_t:
 hipHccGetAccelerator(int deviceId, hc::accelerator *acc);
diff --git a/docs/markdown/hip_kernel_language.md b/docs/markdown/hip_kernel_language.md
index cfa5d0f871..a4f4d5d47f 100644
--- a/docs/markdown/hip_kernel_language.md
+++ b/docs/markdown/hip_kernel_language.md
@@ -167,7 +167,7 @@ The `__shared__` keyword is supported.
 Managed memory, including the `__managed__` keyword, are not supported in HIP.

 ### `__restrict__`
-The `__restrict__` keyword tells the compiler that the associated memory pointer will not alias with any other pointer in the kernel or function. This feature can help the compiler generate better code. In most cases, all pointer arguments must use this keyword to realize the benefit. hcc support for the `__restrict__` qualifier on kernel arguments is under development.
+The `__restrict__` keyword tells the compiler that the associated memory pointer will not alias with any other pointer in the kernel or function. This feature can help the compiler generate better code. In most cases, all pointer arguments must use this keyword to realize the benefit.

 ## Built-In Variables
@@ -603,6 +603,7 @@ The Cuda `__prof_trigger()` instruction is not supported.

 ## Assert
 The assert function is under development.
+HIP does support an "abort" call which will terminate process execution from inside the kernel.

 ## Printf
@@ -690,7 +691,6 @@ for (int i=0; i<16; i++) ...
 ```

-Unbounded loop unroll is under development on HCC compiler.
 ```
 #pragma unroll /* hint to compiler to completely unroll next loop. */
 for (int i=0; i<16; i++) ...
diff --git a/include/hip/hcc_detail/hip_runtime_api.h b/include/hip/hcc_detail/hip_runtime_api.h
index e28dbab82c..03be587b0d 100644
--- a/include/hip/hcc_detail/hip_runtime_api.h
+++ b/include/hip/hcc_detail/hip_runtime_api.h
@@ -171,7 +171,7 @@ typedef enum hipJitOption {

 /**
- * @warning On AMD devices and recent Nvidia devices, these hints and controls are ignored.
+ * @warning On AMD devices and some Nvidia devices, these hints and controls are ignored.
 */
 typedef enum hipFuncCache_t {
     hipFuncCachePreferNone, ///< no preference for shared memory or L1 (default)
@@ -182,7 +182,7 @@ typedef enum hipFuncCache_t {

 /**
- * @warning On AMD devices and recent Nvidia devices, these hints and controls are ignored.
+ * @warning On AMD devices and some Nvidia devices, these hints and controls are ignored.
 */
 typedef enum hipSharedMemConfig {
     hipSharedMemBankSizeDefault, ///< The compiler selects a device-specific value for the banking.
@@ -364,7 +364,7 @@ hipError_t hipGetDeviceProperties(hipDeviceProp_t* prop, int deviceId);
 * @param [in] cacheConfig
 *
 * @returns #hipSuccess, #hipErrorInitializationError
- * Note: AMD devices and recent Nvidia GPUS do not support reconfigurable cache. This hint is ignored on those architectures.
+ * Note: AMD devices and some Nvidia GPUS do not support reconfigurable cache. This hint is ignored on those architectures.
 *
 */
 hipError_t hipDeviceSetCacheConfig ( hipFuncCache_t cacheConfig );
@@ -376,7 +376,7 @@ hipError_t hipDeviceSetCacheConfig ( hipFuncCache_t cacheConfig );
 * @param [in] cacheConfig
 *
 * @returns #hipSuccess, #hipErrorInitializationError
- * Note: AMD devices and recent Nvidia GPUS do not support reconfigurable cache. This hint is ignored on those architectures.
+ * Note: AMD devices and some Nvidia GPUS do not support reconfigurable cache. This hint is ignored on those architectures.
 *
 */
 hipError_t hipDeviceGetCacheConfig ( hipFuncCache_t *cacheConfig );
@@ -400,7 +400,7 @@ hipError_t hipDeviceGetLimit(size_t *pValue, enum hipLimit_t limit);
 * @param [in] config;
 *
 * @returns #hipSuccess, #hipErrorInitializationError
- * Note: AMD devices and recent Nvidia GPUS do not support reconfigurable cache. This hint is ignored on those architectures.
+ * Note: AMD devices and some Nvidia GPUS do not support reconfigurable cache. This hint is ignored on those architectures.
 *
 */
 hipError_t hipFuncSetCacheConfig (const void* func, hipFuncCache_t config );
@@ -412,7 +412,7 @@ hipError_t hipFuncSetCacheConfig (const void* func, hipFuncCache_t config );
 *
 * @returns #hipSuccess, #hipErrorInvalidValue, #hipErrorInitializationError
 *
- * Note: AMD devices and recent Nvidia GPUS do not support shared cache banking, and the hint is ignored on those architectures.
+ * Note: AMD devices and some Nvidia GPUS do not support shared cache banking, and the hint is ignored on those architectures.
 *
 */
 hipError_t hipDeviceGetSharedMemConfig ( hipSharedMemConfig * pConfig );
@@ -425,7 +425,7 @@ hipError_t hipDeviceGetSharedMemConfig ( hipSharedMemConfig * pConfig );
 *
 * @returns #hipSuccess, #hipErrorInvalidValue, #hipErrorInitializationError
 *
- * Note: AMD devices and recent Nvidia GPUS do not support shared cache banking, and the hint is ignored on those architectures.
+ * Note: AMD devices and some Nvidia GPUS do not support shared cache banking, and the hint is ignored on those architectures.
 *
 */
 hipError_t hipDeviceSetSharedMemConfig ( hipSharedMemConfig config );
@@ -1449,7 +1449,6 @@ hipError_t hipMemcpy3D(const struct hipMemcpy3DParms *p);
 *
 * @returns #hipSuccess,
 * @returns #hipErrorInvalidDevice if deviceId or peerDeviceId are not valid devices
- * @warning PeerToPeer support is experimental.
 */
 hipError_t hipDeviceCanAccessPeer (int* canAccessPeer, int deviceId, int peerDeviceId);
@@ -1467,7 +1466,6 @@ hipError_t hipDeviceCanAccessPeer (int* canAccessPeer, int deviceId, int peerDev
 *
 * Returns #hipSuccess, #hipErrorInvalidDevice, #hipErrorInvalidValue,
 * @returns #hipErrorPeerAccessAlreadyEnabled if peer access is already enabled for this device.
- * @warning PeerToPeer support is experimental.
 */
 hipError_t hipDeviceEnablePeerAccess (int peerDeviceId, unsigned int flags);
@@ -1480,7 +1478,6 @@ hipError_t hipDeviceEnablePeerAccess (int peerDeviceId, unsigned int flags);
 * @param [in] peerDeviceId
 *
 * @returns #hipSuccess, #hipErrorPeerAccessNotEnabled
- * @warning PeerToPeer support is experimental.
 */
 hipError_t hipDeviceDisablePeerAccess (int peerDeviceId);
@@ -1512,7 +1509,6 @@ hipError_t hipMemGetAddressRange ( hipDeviceptr_t* pbase, size_t* psize, hipDevi
 * @param [in] sizeBytes - Size of memory copy in bytes
 *
 * @returns #hipSuccess, #hipErrorInvalidValue, #hipErrorInvalidDevice
- * @warning PeerToPeer support is experimental.
 */
 hipError_t hipMemcpyPeer (void* dst, int dstDeviceId, const void* src, int srcDeviceId, size_t sizeBytes);
@@ -1671,7 +1667,7 @@ hipError_t hipCtxGetApiVersion (hipCtx_t ctx,int *apiVersion);
 *
 * @return #hipSuccess
 *
- * @warning AMD devices and recent Nvidia GPUS do not support reconfigurable cache. This hint is ignored on those architectures.
+ * @warning AMD devices and some Nvidia GPUS do not support reconfigurable cache. This hint is ignored on those architectures.
 *
 * @see hipCtxCreate, hipCtxDestroy, hipCtxGetFlags, hipCtxPopCurrent, hipCtxGetCurrent, hipCtxSetCurrent, hipCtxPushCurrent, hipCtxSetCacheConfig, hipCtxSynchronize, hipCtxGetDevice
 */
 hipError_t hipCtxGetCacheConfig ( hipFuncCache_t *cacheConfig );
@@ -1684,7 +1680,7 @@ hipError_t hipCtxGetCacheConfig ( hipFuncCache_t *cacheConfig );
 *
 * @return #hipSuccess
 *
- * @warning AMD devices and recent Nvidia GPUS do not support reconfigurable cache. This hint is ignored on those architectures.
+ * @warning AMD devices and some Nvidia GPUS do not support reconfigurable cache. This hint is ignored on those architectures.
 *
 * @see hipCtxCreate, hipCtxDestroy, hipCtxGetFlags, hipCtxPopCurrent, hipCtxGetCurrent, hipCtxSetCurrent, hipCtxPushCurrent, hipCtxSetCacheConfig, hipCtxSynchronize, hipCtxGetDevice
 */
 hipError_t hipCtxSetCacheConfig ( hipFuncCache_t cacheConfig );
@@ -1697,7 +1693,7 @@ hipError_t hipCtxSetCacheConfig ( hipFuncCache_t cacheConfig );
 *
 * @return #hipSuccess
 *
- * @warning AMD devices and recent Nvidia GPUS do not support shared cache banking, and the hint is ignored on those architectures.
+ * @warning AMD devices and some Nvidia GPUS do not support shared cache banking, and the hint is ignored on those architectures.
 *
 * @see hipCtxCreate, hipCtxDestroy, hipCtxGetFlags, hipCtxPopCurrent, hipCtxGetCurrent, hipCtxSetCurrent, hipCtxPushCurrent, hipCtxSetCacheConfig, hipCtxSynchronize, hipCtxGetDevice
 */
 hipError_t hipCtxSetSharedMemConfig ( hipSharedMemConfig config );
@@ -1710,7 +1706,7 @@
 *
 * @return #hipSuccess
 *
- * @warning AMD devices and recent Nvidia GPUS do not support shared cache banking, and the hint is ignored on those architectures.
+ * @warning AMD devices and some Nvidia GPUS do not support shared cache banking, and the hint is ignored on those architectures.
 *
 * @see hipCtxCreate, hipCtxDestroy, hipCtxGetFlags, hipCtxPopCurrent, hipCtxGetCurrent, hipCtxSetCurrent, hipCtxPushCurrent, hipCtxSetCacheConfig, hipCtxSynchronize, hipCtxGetDevice
 */
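
The peer-to-peer entry points touched above (hipDeviceCanAccessPeer, hipDeviceEnablePeerAccess, hipDeviceDisablePeerAccess, hipMemcpyPeer) lose their "experimental" warnings in this patch. A minimal usage sketch follows; it assumes at least two visible devices and an illustrative 1 MB buffer, and it omits the hipError_t checks that production code should perform.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int deviceCount = 0;
    hipGetDeviceCount(&deviceCount);
    if (deviceCount < 2) {
        std::printf("Peer-to-peer copy needs at least two devices.\n");
        return 0;
    }

    // Ask whether device 0 can address memory allocated on device 1.
    int canAccessPeer = 0;
    hipDeviceCanAccessPeer(&canAccessPeer, 0, 1);

    const size_t sizeBytes = 1 << 20;   // illustrative 1 MB transfer
    void* srcOnDev1 = nullptr;
    void* dstOnDev0 = nullptr;

    hipSetDevice(1);
    hipMalloc(&srcOnDev1, sizeBytes);
    hipMemset(srcOnDev1, 0x5a, sizeBytes);

    hipSetDevice(0);
    hipMalloc(&dstOnDev0, sizeBytes);
    if (canAccessPeer) {
        hipDeviceEnablePeerAccess(1, 0);    // flags are reserved, pass 0
    }

    // Cross-device copy; with peer access enabled the transfer can go direct,
    // otherwise the runtime may stage it through the host.
    hipMemcpyPeer(dstOnDev0, 0, srcOnDev1, 1, sizeBytes);
    hipDeviceSynchronize();

    if (canAccessPeer) {
        hipDeviceDisablePeerAccess(1);
    }
    hipFree(dstOnDev0);
    hipSetDevice(1);
    hipFree(srcOnDev1);
    return 0;
}
```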