Update description of HumanData (English and Chinese) (#356)

* Update human_data.md
* Update human_data.md
* Update human_data.md
* fix
* minor docs update English & Chinese
* minor docs update English & Chinese

---------

Co-authored-by: wei-chen-hub <[email protected]>
open-mmlab · Jul 3, 2023 · 0e1f101
1 parent 46dc586
commit 0e1f101

Showing 2 changed files with 112 additions and 17 deletions.
diff --git a/docs/human_data.md b/docs/human_data.md
@@ -6,19 +6,69 @@ HumanData is a subclass of python built-in class dict, containing single-view, i
 
 ### Key/Value definition
 
-#### The keys and values supported by HumanData are described as below.
+#### Paths:
 
+The image path is always included; paths to the segmentation map and depth image can optionally be included if the dataset provides them.
 - image_path: (N, ), list of str, each element is a relative path from the root folder (exclusive) to the image.
+- segmentation (optional): (N, ), list of str, each element is a relative path from the root folder (exclusive) to the segmentation map.
+- depth_path (optional): (N, ), list of str, each element is a relative path from the root folder (exclusive) to the depth image.
+
+#### Keypoints:
+
+The following keys should be included in `HumanData` if applicable. For each keypoints key, a corresponding mask key should be included, stating which keypoints are valid. For example, `keypoints3d_original` should correspond to `keypoints3d_original_mask`.
+
+In `HumanData`, keypoints are stored in the `HUMAN_DATA` format, which includes 190 joints. We provide keypoint conventions (for both 2d and 3d keypoints) for many datasets; please see [keypoints_convention](../docs/keypoints_convention.md).
+
+- keypoints3d_smpl / keypoints3d_smplx: (N, 190, 4), numpy array, `smpl / smplx` 3d joints with confidence, joints from each dataset are mapped to HUMAN_DATA joints.
+- keypoints3d_original: (N, 190, 4), numpy array, 3d joints with confidence as originally provided by the dataset, joints from each dataset are mapped to HUMAN_DATA joints.
+- keypoints2d_smpl / keypoints2d_smplx: (N, 190, 3), numpy array, `smpl / smplx` 2d joints with confidence, joints from each dataset are mapped to HUMAN_DATA joints.
+- keypoints2d_original: (N, 190, 3), numpy array, 2d joints with confidence as originally provided by the dataset, joints from each dataset are mapped to HUMAN_DATA joints.
+- (mask sample) keypoints2d_smpl_mask: (190, ), numpy array, mask indicating which keypoints are valid in `keypoints2d_smpl`. 0 means that the joint in this position cannot be found in the original dataset.
+
+#### Bounding Box:
+
+Bounding boxes of the body (smpl) and of the face and hands (smplx). Each box is stored as `[x_min, y_min, width, height, confidence]` and should not exceed the image boundary.
 - bbox_xywh: (N, 5), numpy array, bounding box with confidence, coordinates of bottom-left point x, y, width w and height h of bbox, score at last.
+- face_bbox_xywh, lhand_bbox_xywh, rhand_bbox_xywh (optional): (N, 5), numpy array, should be included if `smplx` data is provided; derived from the smplx 2d keypoints. Same structure as above.
+
+#### Human Pose and Shape Parameters:
+
+Normally saved as smpl/smplx.
+- smpl: (1, ), dict, keys are `['body_pose': numpy array, (N, 23, 3), 'global_orient': numpy array, (N, 3), 'betas': numpy array, (N, 10), 'transl': numpy array, (N, 3)]`.
+- smplx: (1, ), dict, keys are `['body_pose': numpy array, (N, 21, 3),'global_orient': numpy array, (N, 3), 'betas': numpy array, (N, 10), 'transl': numpy array, (N, 3), 'left_hand_pose': numpy array, (N, 15, 3), 'right_hand_pose': numpy array, (N, 15, 3), 'expression': numpy array (N, 10), 'leye_pose': numpy array (N, 3), 'reye_pose': (N, 3), 'jaw_pose': numpy array (N, 3)]`.
+
+
+#### Other keys
+
 - config: (), str, the flag name of config for individual dataset.
-- keypoints2d: (N, 190, 3), numpy array, 2d joints of smplx model with confidence, joints from each datasets are mapped to HUMAN_DATA joints.
-- keypoints3d: (N, 190, 4), numpy array, 3d joints of smplx model with confidence. Same as above.
-- smpl: (1, ), dict, keys are ['body_pose': numpy array, (N, 23, 3), 'global_orient': numpy array, (N, 3), 'betas': numpy array, (N, 10), 'transl': numpy array, (N, 3)].
-- smplx: (1, ), dict, keys are ['body_pose': numpy array, (N, 21, 3),'global_orient': numpy array, (N, 3), 'betas': numpy array, (N, 10), 'transl': numpy array, (N, 3), 'left_hand_pose': numpy array, (N, 15, 3), 'right_hand_pose': numpy array, (N, 15, 3), 'expression': numpy array (N, 10), 'leye_pose': numpy array (N, 3), 'reye_pose': (N, 3), 'jaw_pose': numpy array (N, 3)].
 - meta: (1, ), dict, its keys are meta data from dataset like 'gender'.
-- keypoints2d_mask: (190, ), numpy array, mask for which keypoint is valid in keypoints2d. 0 means that the joint in this position cannot be found in original dataset.
-- keypoints3d_mask: (190, ), numpy array, mask for which keypoint is valid in keypoints3d. 0 means that the joint in this position cannot be found in original dataset.
-- misc: (1, ), dict, keys and values are defined by user. The space misc takes(sys.getsizeof(misc)) shall be no more than 6MB.
+- misc: (1, ), dict, keys and values describe the dataset-specific settings and can also be defined by the user. The space misc takes (sys.getsizeof(misc)) shall be no more than 6MB.
+
+#### Suggestion for WHAT to include in `HumanData['misc']`:
+
+Miscellaneous contains the dataset-specific settings, including camera type, source of keypoint annotations, source of bounding boxes, etc. It aims to facilitate different usages of the data.
+`HumanData['misc']` is a dictionary and its keys are described as follows:
+- kps3d_root_aligned: Bool, stating whether keypoints3d is root-aligned. Root alignment is not preferred for HumanData. If this key does not exist, the keypoints are assumed not to be root-aligned (`False`).
+- flat_hand_mean: Bool, applicable for smplx data; for most datasets `flat_hand_mean=False`.
+- bbox_source: source of the bounding box. `bbox_source='keypoints2d_smpl' or 'keypoints2d_smplx' or 'keypoints2d_original'` describes which type of keypoints is used to derive the bounding box, OR `bbox_source='provide_by_dataset'` shows that the bounding box is provided by the dataset. (For example, from some detection module rather than keypoints)
+- bbox_body_scale: applicable if the bounding box is derived from keypoints, stating the zoom-in scale of the bounding box relative to the smpl/smplx/2d_gt keypoints; we suggest `bbox_body_scale=1.2`.
+- bbox_hand_scale, bbox_face_scale: applicable if the bounding box is derived from smplx keypoints, stating the zoom-in scale of the bounding box relative to the smplx/2d_gt keypoints; we suggest `bbox_hand_scale=1.0, bbox_face_scale=1.0`.
+- smpl_source / smplx_source: describing the source of the smpl/smplx annotations, `'original', 'nerual_annot', 'eft', 'osx_annot', 'cliff_annot'`.
+- cam_param_type: describing the type of camera parameters, `cam_param_type='prespective' or 'predicted_camera' or 'eft_camera'`.
+- principal_point, focal_length: (1, 2), numpy array, applicable if the camera parameters are the same across the whole dataset, which is the case for some synthetic datasets.
+- image_shape: (1, 2), numpy array, applicable if the image shape is the same across the whole dataset.
+
+#### Suggestion for WHAT to include in `HumanData['meta']`:
+
+- gender: (N, ), list of str, each element represents the gender for an smpl/smplx instance. (key not required if the dataset uses a gender-neutral model)
+- height (width): (N, ), list of str, each element represents the height (width) of an image. `image_shape=(width, height): (N, 2)` is not suggested, as width and height might need to be referenced in different orders. (keys should be in `HumanData['misc']` if the image shape is the same across the dataset)
+- other keys: applicable if the value varies across the dataset and affects keypoints or smpl/smplx (2d and 3d). For example, `focal_length` and `principal_point`, with focal_length = (N, 2), principal_point = (N, 2).
+
+#### Some other info of HumanData
+
+- All annotations are transformed from world space to OpenCV camera space. For the space transformation we use:
+
+    ```from mmhuman3d.models.body_models.utils import transform_to_camera_frame, batch_transform_to_camera_frame```
 
 #### Key check in HumanData.
 

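The bounding-box convention in the `docs/human_data.md` change above (`[x_min, y_min, width, height, confidence]`, an enlargement given by `bbox_body_scale`, and boxes kept inside the image) can be sketched roughly as follows. This is a minimal illustration that uses a plain `dict` in place of `HumanData`. The helper `bbox_xywh_from_keypoints2d`, the choice of the mean keypoint confidence as the box score, and the synthetic keypoints are all assumptions made for the example, not mmhuman3d code.

```python
import numpy as np


def bbox_xywh_from_keypoints2d(keypoints2d, mask, image_wh, scale=1.2):
    """Derive (N, 5) boxes [x_min, y_min, width, height, confidence].

    Hypothetical helper for illustration only, not part of mmhuman3d.
    keypoints2d: (N, J, 3) array of x, y, confidence.
    mask:        (J,) array, non-zero where the joint is valid.
    image_wh:    (2,) array holding image width and height.
    """
    valid = mask.astype(bool)
    xy = keypoints2d[:, valid, :2]           # (N, J_valid, 2)
    conf = keypoints2d[:, valid, 2]          # (N, J_valid)
    lo, hi = xy.min(axis=1), xy.max(axis=1)  # tight box corners, (N, 2) each
    center = (lo + hi) / 2
    half = (hi - lo) / 2 * scale             # enlarge around the box center
    # Clip both corners so the box does not exceed the image boundary.
    lo = np.clip(center - half, 0, image_wh)
    hi = np.clip(center + half, 0, image_wh)
    # Assumption: the box score is the mean confidence of the valid joints.
    score = conf.mean(axis=1, keepdims=True)
    return np.concatenate([lo, hi - lo, score], axis=1)


# Synthetic data: 2 instances, 190 joints, only the first 24 joints valid.
N, J = 2, 190
rng = np.random.default_rng(0)
image_wh = np.array([640, 480])
mask = np.zeros(J, dtype=np.uint8)
mask[:24] = 1
keypoints2d = np.zeros((N, J, 3))
keypoints2d[:, :24, :2] = rng.uniform([100, 50], [500, 400], size=(N, 24, 2))
keypoints2d[:, :24, 2] = 1.0

human_data = {
    'image_path': ['images/000001.jpg', 'images/000002.jpg'],  # made-up paths
    'keypoints2d_original': keypoints2d,
    'keypoints2d_original_mask': mask,
    'bbox_xywh': bbox_xywh_from_keypoints2d(keypoints2d, mask, image_wh, 1.2),
    'misc': {
        'bbox_source': 'keypoints2d_original',
        'bbox_body_scale': 1.2,
    },
}
print(human_data['bbox_xywh'].shape)
```

Recording `bbox_source` and `bbox_body_scale` next to the boxes lets a later consumer tell how the boxes were produced without re-deriving them.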
diff --git a/docs_zh-CN/human_data.md b/docs_zh-CN/human_data.md
@@ -5,21 +5,66 @@
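 `HumanData` is a subclass of the Python built-in dict, mainly used to store information about single-view images containing humans. It has a common base structure and is also compatible with customized data that has new features.
 A native `HumanData` contains `numpy.ndarray` or other Python built-in data structures, but no `torch.Tensor` data. Use `human_data.to()` to convert it to `torch.Tensor` (CPU and GPU are supported).
 
-### `Key/Value` definition
+### `Key/Value` definition: the `Key`s and `Value`s supported by `HumanData` are described below.
 
-#### The `Key`s and `Value`s supported by `HumanData` are described below.
+#### Paths:
 
+The image path is usually included; if the dataset also provides depth or segmentation maps, their paths can be recorded as well.
 - image_path: (N, ), list of str, each element is the path of an image relative to the root folder.
+- segmantation_path (optional): (N, ), list of str, each element is the path of a segmentation map relative to the root folder.
+- depth_path (optional): (N, ), list of str, each element is the path of a depth image relative to the root folder.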
+
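+#### Keypoints:
+
+The following keypoint keys should be included in HumanData if applicable. Every keypoint key should have a mask stating which of its keypoints are valid. For example, `keypoints3d_original` should correspond to `keypoints3d_original_mask`.
+Keypoints in `HumanData` are stored in the `HUMAN_DATA` format, which contains 190 keypoints. MMHuman3d provides conversions for many common keypoint formats (both 2d and 3d are supported); see [keypoints_convention](../docs_zh-CN/keypoints_convention.md) for details.
+- keypoints3d_smpl / keypoints3d_smplx: (N, 190, 4), numpy array, 3d joints of the `smpl / smplx` model with confidence, joints from each dataset are mapped to `HUMAN_DATA` joints.
+- keypoints3d_original: (N, 190, 4), numpy array, 3d joints with confidence as provided by the dataset itself, joints from each dataset are mapped to `HUMAN_DATA` joints.
+- keypoints2d_smpl / keypoints2d_smplx: (N, 190, 3), numpy array, 2d joints of the `smpl / smplx` model with confidence, joints from each dataset are mapped to `HUMAN_DATA` joints.
+- keypoints2d_original: (N, 190, 3), numpy array, 2d joints with confidence as provided by the dataset itself, joints from each dataset are mapped to `HUMAN_DATA` joints.
+- (mask sample) keypoints2d_smpl_mask: (190, ), numpy array, mask indicating which keypoints are valid in `keypoints2d_smpl`. 0 means that the keypoint in this position cannot be found in the original dataset.
+
+#### Bounding box:
+
+Bounding boxes of the body (smpl) and of the hands and face (smplx), annotated as `[x_min, y_min, width, height, confidence]`; they should not exceed the image.
 - bbox_xywh: (N, 5), numpy array, bounding box with confidence: the x and y coordinates of the bottom-left corner, the width w and height h of the box, with the confidence score placed last.
-- config: (), str, the flag of the config for an individual dataset.
-- keypoints2d: (N, 190, 3), numpy array, 2d joints of the `smplx` model with confidence, joints from each dataset are mapped to `HUMAN_DATA` joints.
-- keypoints3d: (N, 190, 4), numpy array, 3d joints of the `smplx` model with confidence, joints from each dataset are mapped to `HUMAN_DATA` joints.
+- face_bbox_xywh, lhand_bbox_xywh, rhand_bbox_xywh (optional): (N, 5), numpy array, these three keys should be included if the annotations contain `smplx`; derived from the smplx 2d keypoints, same format as above.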
+
+#### Human body model parameters:
+
+Usually stored in smpl/smplx format.
 - smpl: (1, ), dict, `keys` are ['body_pose': numpy array, (N, 23, 3), 'global_orient': numpy array, (N, 3), 'betas': numpy array, (N, 10), 'transl': numpy array, (N, 3)].
 - smplx: (1, ), dict, `keys` are ['body_pose': numpy array, (N, 21, 3),'global_orient': numpy array, (N, 3), 'betas': numpy array, (N, 10), 'transl': numpy array, (N, 3), 'left_hand_pose': numpy array, (N, 15, 3), 'right_hand_pose': numpy array, (N, 15, 3), 'expression': numpy array (N, 10), 'leye_pose': numpy array (N, 3), 'reye_pose': (N, 3), 'jaw_pose': numpy array (N, 3)].
-- meta: (1, ), dict, `keys` are meta data from the dataset, such as gender.
-- keypoints2d_mask: (190, ), numpy array, mask indicating which keypoints are valid in `keypoints2d`. 0 means that the keypoint in this position cannot be found in the original dataset.
-- keypoints3d_mask: (190, ), numpy array, mask indicating which keypoints are valid in `keypoints3d`. 0 means that the keypoint in this position cannot be found in the original dataset.
-- misc: (1, ), dict, `keys` and `values` are defined by the user. The space taken by `misc` (obtainable via `sys.getsizeof(misc)`) must not exceed 6MB.
+
+#### Other keys
+
+- config: (), str, the flag of the config for an individual dataset.
+- meta: (1, ), dict, `keys` are the various meta data in the dataset.
+- misc: (1, ), dict, `keys` are the various dataset-specific settings, and can also be user-defined. The space taken by `misc` (obtainable via `sys.getsizeof(misc)`) must not exceed 6MB.
+
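+#### Suggested (possible) contents of `HumanData['misc']`:
+The Miscellaneous part contains the settings specific to each dataset, including camera type, source of keypoint annotations, source of bounding boxes, whether smpl/smplx annotations are included, and so on, to make data loading easier.
+`HumanData['misc']` holds a dictionary; the suggested keys are as follows:
+- kps3d_root_aligned: Bool, describes whether keypoints3d has been root-aligned. Root alignment is not recommended. If this key is absent, root alignment is assumed not to have been applied.
+- flat_hand_mean: Bool, should be present for data with smplx annotations; for most datasets `flat_hand_mean=False`.
+- bbox_source: describes the source of the bounding box. `bbox_source='keypoints2d_smpl' or 'keypoints2d_smplx' or 'keypoints2d_original'` describes which type of keypoints the bounding box is derived from, or `bbox_source='provide_by_dataset'` indicates that the bounding box is given directly by the dataset (for example, generated by its own detector rather than derived from keypoints).
+- bbox_body_scale: should be included if the bounding box is derived from keypoints; describes the enlargement ratio of the body bounding box derived from smpl/smplx/2d_gt keypoints. `bbox_body_scale=1.2` is suggested.
+- bbox_hand_scale, bbox_face_scale: should be included if the bounding boxes are derived from keypoints; describe the enlargement ratio of the hand and face bounding boxes derived from smpl/smplx/2d_gt keypoints. `bbox_hand_scale=1.0, bbox_face_scale=1.0` is suggested.
+- smpl_source / smplx_source: describes the source of smpl/smplx, `'original', 'nerual_annot', 'eft', 'osx_annot', 'cliff_annot'`, indicating whether smpl/smplx is provided by the dataset or comes from another annotation source.
+- cam_param_type: describes the type of camera parameters, `cam_param_type='prespective' or 'predicted_camera' or 'eft_camera'`.
+- principal_point, focal_length: (1, 2), numpy array, should be included if the camera parameters are constant across the dataset; usually applies to synthetic datasets.
+- image_shape: (1, 2), numpy array, should be included if the image size is constant across the dataset.
+
+#### Suggested (possible) contents of `HumanData['meta']`:
+- gender: (N, ), list of str, each element is the gender of the smplx model (no annotation needed if neutral).
+- height (width): (N, ), list of str, each element is the height (or width) of an image. Using `image_shape=(width, height): (N, 2)` is not recommended here, because the image shape sometimes needs to be read in the reverse order. (If the image resolution is the same across the dataset, it should be recorded in `HumanData['misc']` instead.)
+- other identifying keys: recommended if the key is not consistent across the dataset and affects keypoints or smpl/smplx, such as focal_length and principal_point, with focal_length = (N, 2), principal_point = (N, 2).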
+
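+#### Some notes on HumanData
+
+- All annotations have been transformed from world coordinates to opencv camera space; the camera-space transformation of smpl/smplx can be done with
+
+```from mmhuman3d.models.body_models.utils import transform_to_camera_frame, batch_transform_to_camera_frame```
 
 #### Checking `key`s in `HumanData`.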
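
As a rough illustration of the keypoints/mask pairing rule introduced in this change (every keypoints key should have a matching `_mask` key of shape `(190, )`), the sketch below validates a plain `dict`. The function `find_keypoint_problems` is a hypothetical helper written for this example. It is not the key check that `HumanData` itself performs, and it assumes every value stored under a `keypoints*` key is a numpy array.

```python
import numpy as np

NUM_JOINTS = 190  # joint count of the HUMAN_DATA convention, as stated in the docs


def find_keypoint_problems(human_data):
    """Return a list of problem descriptions; an empty list means all checks passed.

    Hypothetical helper for illustration, not mmhuman3d's own key check.
    """
    problems = []
    for key, value in human_data.items():
        # Only inspect keypoint arrays, skipping the mask entries themselves.
        if not key.startswith('keypoints') or key.endswith('_mask'):
            continue
        # 3d keypoints carry x, y, z, confidence; 2d keypoints carry x, y, confidence.
        dim = 4 if key.startswith('keypoints3d') else 3
        if value.ndim != 3 or value.shape[1:] != (NUM_JOINTS, dim):
            problems.append(
                f'{key}: expected (N, {NUM_JOINTS}, {dim}), got {value.shape}')
        mask = human_data.get(f'{key}_mask')
        if mask is None:
            problems.append(f'{key}: missing {key}_mask')
        elif mask.shape != (NUM_JOINTS,):
            problems.append(
                f'{key}_mask: expected ({NUM_JOINTS},), got {mask.shape}')
    return problems


data = {
    'keypoints2d_smpl': np.zeros((5, 190, 3)),
    'keypoints2d_smpl_mask': np.ones(190),
    'keypoints3d_original': np.zeros((5, 190, 4)),  # mask deliberately omitted
}
problems = find_keypoint_problems(data)
print(problems)
```

Returning a list of messages instead of raising on the first failure makes it easier to report every inconsistency in a converted dataset at once.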