|
- API - Data Pre-Processing
- =========================
-
- .. automodule:: tensorlayerx.utils.prepro
-
- .. autosummary::
-
-    affine_rotation_matrix
-    affine_horizontal_flip_matrix
-    affine_vertical_flip_matrix
-    affine_shift_matrix
-    affine_shear_matrix
-    affine_zoom_matrix
-    affine_respective_zoom_matrix
-
-    transform_matrix_offset_center
-    affine_transform
-    affine_transform_cv2
-    affine_transform_keypoints
-    projective_transform_by_points
-
-    rotation
-    rotation_multi
-    crop
-    crop_multi
-    flip_axis
-    flip_axis_multi
-    shift
-    shift_multi
-
-    shear
-    shear_multi
-    shear2
-    shear_multi2
-    swirl
-    swirl_multi
-    elastic_transform
-    elastic_transform_multi
-
-    zoom
-    respective_zoom
-    zoom_multi
-
-    brightness
-    brightness_multi
-
-    illumination
-
-    rgb_to_hsv
-    hsv_to_rgb
-    adjust_hue
-
-    imresize
-
-    pixel_value_scale
-
-    samplewise_norm
-    featurewise_norm
-
-    channel_shift
-    channel_shift_multi
-
-    drop
-
-    array_to_img
-
-    find_contours
-    pt2map
-    binary_dilation
-    dilation
-    binary_erosion
-    erosion
-
-
-    obj_box_coord_rescale
-    obj_box_coords_rescale
-    obj_box_coord_scale_to_pixelunit
-    obj_box_coord_centroid_to_upleft_butright
-    obj_box_coord_upleft_butright_to_centroid
-    obj_box_coord_centroid_to_upleft
-    obj_box_coord_upleft_to_centroid
-
-    parse_darknet_ann_str_to_list
-    parse_darknet_ann_list_to_cls_box
-
-    obj_box_horizontal_flip
-    obj_box_imresize
-    obj_box_crop
-    obj_box_shift
-    obj_box_zoom
-
-    keypoint_random_crop
-    keypoint_resize_random_crop
-    keypoint_random_rotate
-    keypoint_random_flip
-    keypoint_random_resize
-    keypoint_random_resize_shortestedge
-
-    pad_sequences
-    remove_pad_sequences
-    process_sequences
-    sequences_add_start_id
-    sequences_add_end_id
-    sequences_add_end_id_after_pad
-    sequences_get_mask
-
-
- ..
-   Threading
-   ------------
-   .. autofunction:: threading_data
-
-
- Affine Transform
- ----------------
-
-
- Python can be FAST
- ^^^^^^^^^^^^^^^^^^
-
- Image augmentation is a critical step in deep learning.
- Although TensorFlow provides ``tf.image``,
- image augmentation often remains a key bottleneck.
- ``tf.image`` has three limitations:
-
- - Real-world visual tasks such as object detection, segmentation, and pose estimation
-   must cope with image metadata (e.g., coordinates).
-   Such metadata is beyond the reach of ``tf.image``,
-   which processes images only as tensors.
-
- - ``tf.image`` operators
-   break the pure-Python programming experience (i.e., users have to
-   use ``tf.py_func`` to call image functions written in Python); moreover,
-   frequent use of ``tf.py_func`` slows down TensorFlow,
-   making it hard for users to balance flexibility and performance.
-
- - The ``tf.image`` API is inflexible. Image operations are
-   performed one after another, which makes them hard to optimize jointly. More importantly,
-   sequential image operations can significantly
-   reduce image quality, thus affecting training accuracy.
-
-
- TensorLayer addresses these limitations by providing a
- high-performance image augmentation API in Python.
- This API is based on affine transformations and ``cv2.warpAffine``.
- It allows you to combine multiple image processing functions into
- a single matrix operation. This combined operation
- is executed by the fast ``cv2`` library, offering a 78x performance improvement (observed in
- `openpose-plus <https://github.com/tensorlayer/openpose-plus>`_ for example).
- The following example illustrates the rationale
- behind this speedup.
-
-
- Example
- ^^^^^^^
-
- The source code of complete examples can be found \
- `here <https://github.com/tensorlayer/tensorlayer/tree/master/examples/data_process/tutorial_fast_affine_transform.py>`__.
- The following is a typical Python program that applies rotation, shifting, flipping, zooming and shearing to an image, one operation at a time:
-
- .. code-block:: python
-
-     import tensorlayer as tl
-
-     image = tl.vis.read_image('tiger.jpeg')
-
-     # Each call below resamples (interpolates) the whole image again
-     xx = tl.prepro.rotation(image, rg=-20, is_random=False)
-     xx = tl.prepro.flip_axis(xx, axis=1, is_random=False)
-     xx = tl.prepro.shear2(xx, shear=(0., -0.2), is_random=False)
-     xx = tl.prepro.zoom(xx, zoom_range=0.8)
-     xx = tl.prepro.shift(xx, wrg=-0.1, hrg=0, is_random=False)
-
-     tl.vis.save_image(xx, '_result_slow.png')
-
-
- However, by leveraging affine transformation, image operations can be combined into one:
-
- .. code-block:: python
-
-     # `image` is the array loaded in the previous example
-     h, w = image.shape[:2]
-
-     # 1. Create required affine transformation matrices
-     M_rotate = tl.prepro.affine_rotation_matrix(angle=20)
-     M_flip = tl.prepro.affine_horizontal_flip_matrix(prob=1)
-     M_shift = tl.prepro.affine_shift_matrix(wrg=0.1, hrg=0, h=h, w=w)
-     M_shear = tl.prepro.affine_shear_matrix(x_shear=0.2, y_shear=0)
-     M_zoom = tl.prepro.affine_zoom_matrix(zoom_range=0.8)
-
-     # 2. Combine matrices
-     # NOTE: operations are applied in reverse order (i.e., rotation is performed first)
-     M_combined = M_shift.dot(M_zoom).dot(M_shear).dot(M_flip).dot(M_rotate)
-
-     # 3. Convert the matrix from Cartesian coordinates (origin at the center of the image)
-     #    to image coordinates (origin at the top-left of the image)
-     transform_matrix = tl.prepro.transform_matrix_offset_center(M_combined, x=w, y=h)
-
-     # 4. Transform the image using a single operation
-     result = tl.prepro.affine_transform_cv2(image, transform_matrix)  # 76 times faster
-
-     tl.vis.save_image(result, '_result_fast.png')
-
-
- The following figure illustrates the rationale behind the combined affine transformation.
-
- .. image:: ../images/affine_transform_why.jpg
-   :width: 100 %
-   :align: center
-
-
- Using a combined affine transformation has two key benefits. First, it allows
- you to use a pure Python API to achieve orders-of-magnitude speedups in image augmentation,
- and thus prevents data pre-processing from becoming a bottleneck in training.
- Second, performing image transformations sequentially requires one image interpolation per step,
- and the accumulated interpolation error lowers the quality of the input images. In contrast, a combined transformation performs the
- interpolation only once, and thus
- preserves the content of the image. The following figure illustrates these two benefits:
-
- .. image:: ../images/affine_transform_comparison.jpg
-   :width: 100 %
-   :align: center
-
- The main reason the combined affine transformation is fast is that it has lower computational complexity.
- Assume we have ``k`` affine transformations ``T1, ..., Tk``, where each ``Ti`` can be represented by a 3x3 matrix.
- The sequential transformation can be written as ``y = Tk (... T1(x))``,
- and its time complexity is ``O(k N)``, where ``N`` is the cost of applying one transformation to image ``x``.
- ``N`` is linear in the size of ``x``.
- For the combined transformation ``y = (Tk ... T1) (x)``,
- the time complexity is ``O(27(k - 1) + N) = max{O(27k), O(N)} = O(N)`` (assuming 27k << N), where 27 = 3^3 is the cost of multiplying two 3x3 matrices, i.e., of combining two transformations.
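
The cost argument above can be checked with a small NumPy sketch. This is not TensorLayer code: the helpers ``rotation``, ``zoom`` and ``shift`` are hypothetical, and a set of 2-D points in homogeneous coordinates stands in for the pixels of an image. Multiplying the ``k`` matrices first and applying the product once should give the same result as applying them one by one.

```python
import numpy as np

def rotation(theta_deg):
    """Hypothetical helper: 3x3 rotation matrix for an angle in degrees."""
    t = np.deg2rad(theta_deg)
    return np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t),  np.cos(t), 0.0],
                     [0.0,        0.0,       1.0]])

def zoom(sx, sy):
    """Hypothetical helper: 3x3 scaling matrix."""
    return np.array([[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]])

def shift(tx, ty):
    """Hypothetical helper: 3x3 translation matrix."""
    return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])

# N points in homogeneous coordinates, shape (3, N)
rng = np.random.default_rng(0)
n_points = 1000
points = np.vstack([rng.uniform(-1.0, 1.0, size=(2, n_points)),
                    np.ones((1, n_points))])

transforms = [rotation(20), zoom(0.8, 0.8), shift(0.1, 0.0)]  # T1, T2, T3

# Sequential: k passes over all N points, O(k N)
y_seq = points
for T in transforms:
    y_seq = T @ y_seq

# Combined: k - 1 products of 3x3 matrices, then a single pass over the points
M = np.eye(3)
for T in transforms:
    M = T @ M  # later transforms multiply on the left, so M = T3 T2 T1
y_comb = M @ points

same = np.allclose(y_seq, y_comb)
```

For a real image the single pass also means a single interpolation, which is where the quality benefit described above comes from.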
-
-
- Get rotation matrix
- ^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: affine_rotation_matrix
-
- Get horizontal flipping matrix
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: affine_horizontal_flip_matrix
-
- Get vertical flipping matrix
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: affine_vertical_flip_matrix
-
- Get shifting matrix
- ^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: affine_shift_matrix
-
- Get shearing matrix
- ^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: affine_shear_matrix
-
- Get zooming matrix
- ^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: affine_zoom_matrix
-
- Get respective zooming matrix
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: affine_respective_zoom_matrix
-
- Cartesian to image coordinates
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: transform_matrix_offset_center
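
The idea behind converting a matrix from center-origin (Cartesian) coordinates to top-left-origin image coordinates can be sketched in plain NumPy. This is an illustration under stated assumptions, not the library's implementation: the function ``offset_center`` is hypothetical, and the center is assumed to be at ``((w - 1) / 2, (h - 1) / 2)``. The transform is wrapped between a translation that moves the center to the origin and one that moves it back.

```python
import numpy as np

def offset_center(matrix, w, h):
    """Hypothetical sketch: make a center-origin 3x3 transform act on
    pixel coordinates whose origin is the top-left corner."""
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0  # assumed center convention
    to_center = np.array([[1.0, 0.0, -cx], [0.0, 1.0, -cy], [0.0, 0.0, 1.0]])
    to_image = np.array([[1.0, 0.0, cx], [0.0, 1.0, cy], [0.0, 0.0, 1.0]])
    # Applied right to left: move center to origin, transform, move back
    return to_image @ matrix @ to_center

theta = np.deg2rad(90)
rotate = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])

M = offset_center(rotate, w=101, h=51)

# The image center (50, 25) should stay fixed under a rotation about the center
center = np.array([50.0, 25.0, 1.0])
moved = M @ center
```

Without this step, a rotation matrix would rotate the image about its top-left corner instead of its center.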
-
- ..
-   Apply image transform
-   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-   .. autofunction:: affine_transform
-
- Apply image transform
- ^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: affine_transform_cv2
-
- Apply keypoint transform
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: affine_transform_keypoints
-
-
- Images
- -----------
-
- Projective transform by points
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: projective_transform_by_points
-
- Rotation
- ^^^^^^^^^
- .. autofunction:: rotation
- .. autofunction:: rotation_multi
-
- Crop
- ^^^^^^^^^
- .. autofunction:: crop
- .. autofunction:: crop_multi
-
- Flip
- ^^^^^^^^^
- .. autofunction:: flip_axis
- .. autofunction:: flip_axis_multi
-
- Shift
- ^^^^^^^^^
- .. autofunction:: shift
- .. autofunction:: shift_multi
-
- Shear
- ^^^^^^^^^
- .. autofunction:: shear
- .. autofunction:: shear_multi
-
- Shear V2
- ^^^^^^^^^^^
- .. autofunction:: shear2
- .. autofunction:: shear_multi2
-
- Swirl
- ^^^^^^^^^
- .. autofunction:: swirl
- .. autofunction:: swirl_multi
-
- Elastic transform
- ^^^^^^^^^^^^^^^^^^
- .. autofunction:: elastic_transform
- .. autofunction:: elastic_transform_multi
-
- Zoom
- ^^^^^^^^^
- .. autofunction:: zoom
- .. autofunction:: zoom_multi
-
- Respective Zoom
- ^^^^^^^^^^^^^^^^^
- .. autofunction:: respective_zoom
-
- Brightness
- ^^^^^^^^^^^^
- .. autofunction:: brightness
- .. autofunction:: brightness_multi
-
- Brightness, contrast and saturation
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: illumination
-
- RGB to HSV
- ^^^^^^^^^^^^^^
- .. autofunction:: rgb_to_hsv
-
- HSV to RGB
- ^^^^^^^^^^^^^^
- .. autofunction:: hsv_to_rgb
-
- Adjust Hue
- ^^^^^^^^^^^^^^
- .. autofunction:: adjust_hue
-
- Resize
- ^^^^^^^^^^^^
- .. autofunction:: imresize
-
- Pixel value scale
- ^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: pixel_value_scale
-
- Normalization
- ^^^^^^^^^^^^^^^
- .. autofunction:: samplewise_norm
- .. autofunction:: featurewise_norm
-
- Channel shift
- ^^^^^^^^^^^^^^
- .. autofunction:: channel_shift
- .. autofunction:: channel_shift_multi
-
- Noise
- ^^^^^^^^^^^^^^
- .. autofunction:: drop
-
- Numpy and PIL
- ^^^^^^^^^^^^^^
- .. autofunction:: array_to_img
-
- Find contours
- ^^^^^^^^^^^^^^
- .. autofunction:: find_contours
-
- Points to Image
- ^^^^^^^^^^^^^^^^^
- .. autofunction:: pt2map
-
- Binary dilation
- ^^^^^^^^^^^^^^^^^
- .. autofunction:: binary_dilation
-
- Greyscale dilation
- ^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: dilation
-
- Binary erosion
- ^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: binary_erosion
-
- Greyscale erosion
- ^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: erosion
-
-
-
- Object detection
- -------------------
-
- Tutorial for Image Aug
- ^^^^^^^^^^^^^^^^^^^^^^^
-
- Here is an example of image augmentation on the VOC dataset.
-
- .. code-block:: python
-
-     import tensorlayer as tl
-
-     # download the VOC 2012 dataset
-     imgs_file_list, _, _, _, classes, _, _,\
-         _, objs_info_list, _ = tl.files.load_voc_dataset(dataset="2012")
-
-     # parse each annotation and convert it into list format
-     ann_list = []
-     for info in objs_info_list:
-         ann = tl.prepro.parse_darknet_ann_str_to_list(info)
-         c, b = tl.prepro.parse_darknet_ann_list_to_cls_box(ann)
-         ann_list.append([c, b])
-
-     # read and save one image
-     idx = 2  # you can select your own image
-     image = tl.vis.read_image(imgs_file_list[idx])
-     tl.vis.draw_boxes_and_labels_to_image(image, ann_list[idx][0],
-         ann_list[idx][1], [], classes, True, save_name='_im_original.png')
-
-     # left-right flip
-     im_flip, coords = tl.prepro.obj_box_horizontal_flip(image,
-         ann_list[idx][1], is_rescale=True, is_center=True, is_random=False)
-     tl.vis.draw_boxes_and_labels_to_image(im_flip, ann_list[idx][0],
-         coords, [], classes, True, save_name='_im_flip.png')
-
-     # resize
-     im_resize, coords = tl.prepro.obj_box_imresize(image,
-         coords=ann_list[idx][1], size=[300, 200], is_rescale=True)
-     tl.vis.draw_boxes_and_labels_to_image(im_resize, ann_list[idx][0],
-         coords, [], classes, True, save_name='_im_resize.png')
-
-     # crop
-     im_crop, clas, coords = tl.prepro.obj_box_crop(image, ann_list[idx][0],
-         ann_list[idx][1], wrg=200, hrg=200,
-         is_rescale=True, is_center=True, is_random=False)
-     tl.vis.draw_boxes_and_labels_to_image(im_crop, clas, coords, [],
-         classes, True, save_name='_im_crop.png')
-
-     # shift
-     im_shift, clas, coords = tl.prepro.obj_box_shift(image, ann_list[idx][0],
-         ann_list[idx][1], wrg=0.1, hrg=0.1,
-         is_rescale=True, is_center=True, is_random=False)
-     tl.vis.draw_boxes_and_labels_to_image(im_shift, clas, coords, [],
-         classes, True, save_name='_im_shift.png')
-
-     # zoom
-     im_zoom, clas, coords = tl.prepro.obj_box_zoom(image, ann_list[idx][0],
-         ann_list[idx][1], zoom_range=(1.3, 0.7),
-         is_rescale=True, is_center=True, is_random=False)
-     tl.vis.draw_boxes_and_labels_to_image(im_zoom, clas, coords, [],
-         classes, True, save_name='_im_zoom.png')
-
-
- In practice, you may want to use multiple threads to process a batch of images, as follows.
-
- .. code-block:: python
-
-     import tensorlayer as tl
-     import random
-
-     # imgs_file_list, ann_list and classes come from the previous example
-     batch_size = 64
-     im_size = [416, 416]
-     n_data = len(imgs_file_list)
-     jitter = 0.2
-
-     def _data_pre_aug_fn(data):
-         im, ann = data
-         clas, coords = ann
-         # change image brightness, contrast and saturation randomly
-         im = tl.prepro.illumination(im, gamma=(0.5, 1.5),
-             contrast=(0.5, 1.5), saturation=(0.5, 1.5), is_random=True)
-         # flip randomly
-         im, coords = tl.prepro.obj_box_horizontal_flip(im, coords,
-             is_rescale=True, is_center=True, is_random=True)
-         # randomly resize and crop the image; this has the same effect as a random zoom
-         tmp0 = random.randint(1, int(im_size[0]*jitter))
-         tmp1 = random.randint(1, int(im_size[1]*jitter))
-         im, coords = tl.prepro.obj_box_imresize(im, coords,
-             [im_size[0]+tmp0, im_size[1]+tmp1], is_rescale=True,
-             interp='bicubic')
-         im, clas, coords = tl.prepro.obj_box_crop(im, clas, coords,
-             wrg=im_size[1], hrg=im_size[0], is_rescale=True,
-             is_center=True, is_random=True)
-         # rescale values from [0, 255] to [-1, 1] (optional)
-         im = im / 127.5 - 1
-         return im, [clas, coords]
-
-     # randomly read a batch of images and the corresponding annotations
-     idxs = tl.utils.get_random_int(min=0, max=n_data-1, number=batch_size)
-     b_im_path = [imgs_file_list[i] for i in idxs]
-     b_images = tl.prepro.threading_data(b_im_path, fn=tl.vis.read_image)
-     b_ann = [ann_list[i] for i in idxs]
-
-     # process the (image, annotation) pairs in threads
-     data = tl.prepro.threading_data(list(zip(b_images, b_ann)),
-         _data_pre_aug_fn)
-     b_images2 = [d[0] for d in data]
-     b_ann = [d[1] for d in data]
-
-     # save all images
-     for i in range(len(b_images)):
-         tl.vis.draw_boxes_and_labels_to_image(b_images[i],
-             ann_list[idxs[i]][0], ann_list[idxs[i]][1], [],
-             classes, True, save_name='_bbox_vis_%d_original.png' % i)
-         # undo the [-1, 1] rescaling before drawing
-         tl.vis.draw_boxes_and_labels_to_image((b_images2[i]+1)*127.5,
-             b_ann[i][0], b_ann[i][1], [], classes, True,
-             save_name='_bbox_vis_%d.png' % i)
-
- Image Aug with TF Dataset API
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
- - Example code for VOC `here <https://github.com/tensorlayer/tensorlayer/blob/master/example/tutorial_tf_dataset_voc.py>`__.
-
- Coordinate pixel unit to percentage
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: obj_box_coord_rescale
-
- Coordinates pixel unit to percentage
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: obj_box_coords_rescale
-
- Coordinate percentage to pixel unit
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: obj_box_coord_scale_to_pixelunit
-
- Coordinate [x_center, y_center, w, h] to up-left bottom-right
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: obj_box_coord_centroid_to_upleft_butright
-
- Coordinate up-left bottom-right to [x_center, y_center, w, h]
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: obj_box_coord_upleft_butright_to_centroid
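
The arithmetic behind the two box formats can be sketched in plain Python. The functions ``centroid_to_corners`` and ``corners_to_centroid`` below are hypothetical stand-ins written for illustration; they are not the library functions documented above, whose rounding and input handling may differ.

```python
def centroid_to_corners(box):
    """[x_center, y_center, w, h] -> [x1, y1, x2, y2] (up-left, bottom-right)."""
    x_c, y_c, w, h = box
    return [x_c - w / 2.0, y_c - h / 2.0, x_c + w / 2.0, y_c + h / 2.0]

def corners_to_centroid(box):
    """[x1, y1, x2, y2] (up-left, bottom-right) -> [x_center, y_center, w, h]."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return [x1 + w / 2.0, y1 + h / 2.0, w, h]

box = [30.0, 40.0, 20.0, 10.0]             # centered at (30, 40), 20 wide, 10 high
corners = centroid_to_corners(box)         # [20.0, 35.0, 40.0, 45.0]
round_trip = corners_to_centroid(corners)  # back to the original box
```

The two conversions are inverses of each other, so a round trip returns the original box.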
-
- Coordinate [x_center, y_center, w, h] to up-left-width-height
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: obj_box_coord_centroid_to_upleft
-
- Coordinate up-left-width-height to [x_center, y_center, w, h]
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: obj_box_coord_upleft_to_centroid
-
- Darknet format string to list
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: parse_darknet_ann_str_to_list
-
- Darknet format split class and coordinate
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: parse_darknet_ann_list_to_cls_box
-
- Image Aug - Flip
- ^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: obj_box_horizontal_flip
-
- Image Aug - Resize
- ^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: obj_box_imresize
-
- Image Aug - Crop
- ^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: obj_box_crop
-
- Image Aug - Shift
- ^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: obj_box_shift
-
- Image Aug - Zoom
- ^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: obj_box_zoom
-
- Keypoints
- ------------
-
- Image Aug - Crop
- ^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: keypoint_random_crop
-
- Image Aug - Resize then Crop
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: keypoint_resize_random_crop
-
- Image Aug - Rotate
- ^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: keypoint_random_rotate
-
- Image Aug - Flip
- ^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: keypoint_random_flip
-
- Image Aug - Resize
- ^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: keypoint_random_resize
-
- Image Aug - Resize Shortest Edge
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: keypoint_random_resize_shortestedge
-
-
- Sequence
- ---------
-
- More related functions can be found in ``tensorlayer.nlp``.
-
- Padding
- ^^^^^^^^^
- .. autofunction:: pad_sequences
-
- Remove Padding
- ^^^^^^^^^^^^^^^^^
- .. autofunction:: remove_pad_sequences
-
-
- Process
- ^^^^^^^^^
- .. autofunction:: process_sequences
-
- Add Start ID
- ^^^^^^^^^^^^^^^
- .. autofunction:: sequences_add_start_id
-
-
- Add End ID
- ^^^^^^^^^^^^^^^
- .. autofunction:: sequences_add_end_id
-
- Add End ID after pad
- ^^^^^^^^^^^^^^^^^^^^^^^
- .. autofunction:: sequences_add_end_id_after_pad
-
- Get Mask
- ^^^^^^^^^
- .. autofunction:: sequences_get_mask
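
As a rough illustration of how padding and masks relate, here is a minimal pure-Python sketch. The helpers ``pad_to_longest`` and ``mask_of`` are hypothetical and only mimic the general idea; they are not the signatures of the functions documented above, and a pad ID of ``0`` is an assumption.

```python
def pad_to_longest(sequences, pad_id=0):
    """Hypothetical helper: right-pad every sequence to the longest length."""
    max_len = max((len(s) for s in sequences), default=0)
    return [list(s) + [pad_id] * (max_len - len(s)) for s in sequences]

def mask_of(padded, pad_id=0):
    """Hypothetical helper: 1 for real tokens, 0 for padding."""
    return [[0 if token == pad_id else 1 for token in seq] for seq in padded]

seqs = [[4, 5, 6], [7, 8], [9]]
padded = pad_to_longest(seqs)   # [[4, 5, 6], [7, 8, 0], [9, 0, 0]]
mask = mask_of(padded)          # [[1, 1, 1], [1, 1, 0], [1, 0, 0]]
```

A mask like this only works if the pad ID never occurs as a real token, which is why the pad ID is usually reserved.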
|