Augmentations (albumentations.augmentations)

Transforms

class albumentations.augmentations.transforms.Blur(blur_limit=7, always_apply=False, p=0.5)[source]

Blur the input image using a random-sized kernel.

Parameters:
  • blur_limit (int) – maximum kernel size for blurring the input image. Default: 7.
  • p (float) – probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
class albumentations.augmentations.transforms.VerticalFlip(always_apply=False, p=0.5)[source]

Flip the input vertically around the x-axis.

Parameters:p (float) – probability of applying the transform. Default: 0.5.
Targets:
image, mask, bboxes
Image types:
uint8, float32
class albumentations.augmentations.transforms.HorizontalFlip(always_apply=False, p=0.5)[source]

Flip the input horizontally around the y-axis.

Parameters:p (float) – probability of applying the transform. Default: 0.5.
Targets:
image, mask, bboxes
Image types:
uint8, float32
class albumentations.augmentations.transforms.Flip(always_apply=False, p=0.5)[source]

Flip the input either horizontally, vertically or both horizontally and vertically.

Parameters:p (float) – probability of applying the transform. Default: 0.5.
Targets:
image, mask, bboxes
Image types:
uint8, float32
apply(img, d=0, **params)[source]

Parameters:d (int) – code that specifies how to flip the input: 0 for vertical flipping, 1 for horizontal flipping, -1 for both vertical and horizontal flipping (which can also be seen as rotating the input by 180 degrees).
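
The flip codes described above can be illustrated on a tiny image stored as a list of rows. This is a minimal pure-Python sketch, not the library's implementation; flip_rows is a hypothetical helper introduced only for this example.

```python
def flip_rows(img, d):
    """Flip a 2D image (list of rows) using the flip codes 0, 1 and -1."""
    if d == 0:
        # Vertical flip: reverse the order of the rows.
        return [row[:] for row in reversed(img)]
    if d == 1:
        # Horizontal flip: reverse the pixels inside each row.
        return [row[::-1] for row in img]
    if d == -1:
        # Both flips at once, which equals a 180-degree rotation.
        return [row[::-1] for row in reversed(img)]
    raise ValueError("d must be -1, 0 or 1")


img = [[1, 2],
       [3, 4]]
print(flip_rows(img, 0))   # [[3, 4], [1, 2]]
print(flip_rows(img, 1))   # [[2, 1], [4, 3]]
print(flip_rows(img, -1))  # [[4, 3], [2, 1]]
```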
class albumentations.augmentations.transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0, always_apply=False, p=1.0)[source]

Divide pixel values by max_pixel_value (255 = 2**8 - 1 by default), subtract the per-channel mean and divide by the per-channel std.

Parameters:
  • mean (float, float, float) – mean values
  • std (float, float, float) – std values
  • max_pixel_value (float) – maximum possible pixel value
Targets:
image
Image types:
uint8, float32
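
The arithmetic described for Normalize can be sketched for a single RGB pixel. This is an illustrative pure-Python version under the assumption that the three steps are applied in the order stated; normalize_pixel is a hypothetical helper, not a library function.

```python
def normalize_pixel(pixel,
                    mean=(0.485, 0.456, 0.406),
                    std=(0.229, 0.224, 0.225),
                    max_pixel_value=255.0):
    """Scale one RGB pixel to [0, 1], then standardize each channel."""
    return tuple((value / max_pixel_value - m) / s
                 for value, m, s in zip(pixel, mean, std))


# A pure-red uint8 pixel: the red channel ends up above 0, the others below.
print(normalize_pixel((255, 0, 0)))
```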
class albumentations.augmentations.transforms.Transpose(always_apply=False, p=0.5)[source]

Transpose the input by swapping rows and columns.

Parameters:p (float) – probability of applying the transform. Default: 0.5.
Targets:
image, mask, bboxes
Image types:
uint8, float32
class albumentations.augmentations.transforms.RandomCrop(height, width, always_apply=False, p=1.0)[source]

Crop a random part of the input.

Parameters:
  • height (int) – height of the crop.
  • width (int) – width of the crop.
  • p (float) – probability of applying the transform. Default: 1.
Targets:
image, mask, bboxes
Image types:
uint8, float32
class albumentations.augmentations.transforms.RandomGamma(gamma_limit=(80, 120), always_apply=False, p=0.5)[source]
Targets:
image
Image types:
uint8, float32
class albumentations.augmentations.transforms.RandomRotate90(always_apply=False, p=0.5)[source]

Randomly rotate the input by 90 degrees zero or more times.

Parameters:p (float) – probability of applying the transform. Default: 0.5.
Targets:
image, mask, bboxes
Image types:
uint8, float32
apply(img, factor=0, **params)[source]
Parameters:factor (int) – number of times the input will be rotated by 90 degrees.
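
The meaning of factor can be shown on a small list-of-rows image. The sketch below assumes the counter-clockwise convention of np.rot90, which bbox_rot90 in this reference mentions; rot90_ccw is a hypothetical helper written in plain Python.

```python
def rot90_ccw(img, factor=0):
    """Rotate a list-of-rows image counter-clockwise by 90 degrees, factor times."""
    for _ in range(factor % 4):
        # Transpose the image, then reverse the row order.
        img = [list(col) for col in zip(*img)][::-1]
    return img


img = [[1, 2],
       [3, 4]]
print(rot90_ccw(img, 1))  # [[2, 4], [1, 3]]
print(rot90_ccw(img, 2))  # [[4, 3], [2, 1]]
```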
class albumentations.augmentations.transforms.Rotate(limit=90, interpolation=1, border_mode=4, always_apply=False, p=0.5)[source]

Rotate the input by an angle selected randomly from the uniform distribution.

Parameters:
  • limit ((int, int) or int) – range from which a random angle is picked. If limit is a single int an angle is picked from (-limit, limit). Default: 90
  • interpolation (OpenCV flag) – flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
  • border_mode (OpenCV flag) – flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101
  • p (float) – probability of applying the transform. Default: 0.5.
Targets:
image, mask, bboxes
Image types:
uint8, float32
class albumentations.augmentations.transforms.ShiftScaleRotate(shift_limit=0.0625, scale_limit=0.1, rotate_limit=45, interpolation=1, border_mode=4, always_apply=False, p=0.5)[source]

Randomly apply affine transforms: translate, scale and rotate the input.

Parameters:
  • shift_limit ((float, float) or float) – shift factor range for both height and width. If shift_limit is a single float value, the range will be (-shift_limit, shift_limit). Absolute values for lower and upper bounds should lie in range [0, 1]. Default: 0.0625.
  • scale_limit ((float, float) or float) – scaling factor range. If scale_limit is a single float value, the range will be (-scale_limit, scale_limit). Default: 0.1.
  • rotate_limit ((int, int) or int) – rotation range. If rotate_limit is a single int value, the range will be (-rotate_limit, rotate_limit). Default: 45.
  • interpolation (OpenCV flag) – flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
  • border_mode (OpenCV flag) – flag that is used to specify the pixel extrapolation method. Should be one of: cv2.BORDER_CONSTANT, cv2.BORDER_REPLICATE, cv2.BORDER_REFLECT, cv2.BORDER_WRAP, cv2.BORDER_REFLECT_101. Default: cv2.BORDER_REFLECT_101
  • p (float) – probability of applying the transform. Default: 0.5.
Targets:
image, mask
Image types:
uint8, float32
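
Several parameters above accept either a pair or a single number that is expanded to a symmetric range. A minimal sketch of that expansion, followed by uniform sampling, could look like this; to_range is a hypothetical helper, and how the library combines the sampled values afterwards is not shown.

```python
import random


def to_range(limit):
    """Expand a single number into (-limit, limit); pass a pair through unchanged."""
    if isinstance(limit, (int, float)):
        return (-limit, limit)
    return tuple(limit)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
shift = rng.uniform(*to_range(0.0625))
scale = rng.uniform(*to_range(0.1))
angle = rng.uniform(*to_range(45))
print(to_range(45), to_range((10, 20)))  # (-45, 45) (10, 20)
```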
class albumentations.augmentations.transforms.CenterCrop(height, width, always_apply=False, p=1.0)[source]

Crop the central part of the input.

Parameters:
  • height (int) – height of the crop.
  • width (int) – width of the crop.
  • p (float) – probability of applying the transform. Default: 1.
Targets:
image, mask, bboxes
Image types:
uint8, float32

Note

It is recommended to use uint8 images as input. Otherwise the operation will require internal conversion float32 -> uint8 -> float32 that causes worse performance.

class albumentations.augmentations.transforms.OpticalDistortion(distort_limit=0.05, shift_limit=0.05, interpolation=1, border_mode=4, always_apply=False, p=0.5)[source]
Targets:
image, mask
Image types:
uint8, float32
class albumentations.augmentations.transforms.GridDistortion(num_steps=5, distort_limit=0.3, interpolation=1, border_mode=4, always_apply=False, p=0.5)[source]
Targets:
image, mask
Image types:
uint8, float32
class albumentations.augmentations.transforms.ElasticTransform(alpha=1, sigma=50, alpha_affine=50, interpolation=1, border_mode=4, always_apply=False, approximate=False, p=0.5)[source]

Elastic deformation of images as described in [Simard2003] (with modifications). Based on https://gist.github.com/erniejunior/601cdf56d2b424757de5

[Simard2003]Simard, Steinkraus and Platt, “Best Practices for Convolutional Neural Networks applied to Visual Document Analysis”, in Proc. of the International Conference on Document Analysis and Recognition, 2003.
Parameters:approximate (boolean) – Whether to smooth displacement map with fixed kernel size. Enabling this option gives ~2X speedup on large images.
Targets:
image, mask
Image types:
uint8, float32
class albumentations.augmentations.transforms.HueSaturationValue(hue_shift_limit=20, sat_shift_limit=30, val_shift_limit=20, always_apply=False, p=0.5)[source]

Randomly change hue, saturation and value of the input image.

Parameters:
  • hue_shift_limit ((int, int) or int) – range for changing hue. If hue_shift_limit is a single int, the range will be (-hue_shift_limit, hue_shift_limit). Default: 20.
  • sat_shift_limit ((int, int) or int) – range for changing saturation. If sat_shift_limit is a single int, the range will be (-sat_shift_limit, sat_shift_limit). Default: 30.
  • val_shift_limit ((int, int) or int) – range for changing value. If val_shift_limit is a single int, the range will be (-val_shift_limit, val_shift_limit). Default: 20.
  • p (float) – probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
class albumentations.augmentations.transforms.PadIfNeeded(min_height=1024, min_width=1024, border_mode=4, value=[0, 0, 0], always_apply=False, p=1.0)[source]

Pad the sides of the image if its height or width is smaller than the desired minimum (min_height, min_width).

Parameters:
  • p (float) – probability of applying the transform. Default: 1.0.
  • value (list of ints [r, g, b]) – padding value if border_mode is cv2.BORDER_CONSTANT.
Targets:
image, mask, bbox, keypoint
Image types:
uint8, float32
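
The amount of padding needed along one dimension can be sketched as follows. Splitting the padding as evenly as possible between the two sides is an assumption of this example, and padding_amounts is a hypothetical helper.

```python
def padding_amounts(size, min_size):
    """Return (before, after) padding so that size + before + after >= min_size."""
    if size >= min_size:
        return 0, 0  # already large enough, nothing to pad
    total = min_size - size
    before = total // 2
    return before, total - before


print(padding_amounts(1000, 1024))  # (12, 12)
print(padding_amounts(1023, 1024))  # (0, 1)
print(padding_amounts(2000, 1024))  # (0, 0)
```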
class albumentations.augmentations.transforms.RGBShift(r_shift_limit=20, g_shift_limit=20, b_shift_limit=20, always_apply=False, p=0.5)[source]

Randomly shift values for each channel of the input RGB image.

Parameters:
  • r_shift_limit ((int, int) or int) – range for changing values for the red channel. If r_shift_limit is a single int, the range will be (-r_shift_limit, r_shift_limit). Default: 20.
  • g_shift_limit ((int, int) or int) – range for changing values for the green channel. If g_shift_limit is a single int, the range will be (-g_shift_limit, g_shift_limit). Default: 20.
  • b_shift_limit ((int, int) or int) – range for changing values for the blue channel. If b_shift_limit is a single int, the range will be (-b_shift_limit, b_shift_limit). Default: 20.
  • p (float) – probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
class albumentations.augmentations.transforms.RandomBrightness(limit=0.2, always_apply=False, p=0.5)[source]
class albumentations.augmentations.transforms.RandomContrast(limit=0.2, always_apply=False, p=0.5)[source]
class albumentations.augmentations.transforms.MotionBlur(blur_limit=7, always_apply=False, p=0.5)[source]

Apply motion blur to the input image using a random-sized kernel.

Parameters:
  • blur_limit (int) – maximum kernel size for blurring the input image. Default: 7.
  • p (float) – probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
class albumentations.augmentations.transforms.MedianBlur(blur_limit=7, always_apply=False, p=0.5)[source]

Blur the input image using a median filter with a random aperture linear size.

Parameters:
  • blur_limit (int) – maximum aperture linear size for blurring the input image. Default: 7.
  • p (float) – probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
class albumentations.augmentations.transforms.GaussNoise(var_limit=(10, 50), always_apply=False, p=0.5)[source]

Apply Gaussian noise to the input image.

Parameters:
  • var_limit ((int, int) or int) – variance range for noise. If var_limit is a single int, the range will be (-var_limit, var_limit). Default: (10, 50).
  • p (float) – probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8
class albumentations.augmentations.transforms.CLAHE(clip_limit=4.0, tile_grid_size=(8, 8), always_apply=False, p=0.5)[source]

Apply Contrast Limited Adaptive Histogram Equalization to the input image.

Parameters:
  • clip_limit (float) – upper threshold value for contrast limiting. Default: 4.0.
  • tile_grid_size ((int, int)) – size of grid for histogram equalization. Default: (8, 8).
  • p (float) – probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8
class albumentations.augmentations.transforms.ChannelShuffle(always_apply=False, p=0.5)[source]

Randomly rearrange channels of the input RGB image.

Parameters:p (float) – probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
class albumentations.augmentations.transforms.InvertImg(always_apply=False, p=0.5)[source]

Invert the input image by subtracting pixel values from 255.

Parameters:p (float) – probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8
class albumentations.augmentations.transforms.ToGray(always_apply=False, p=0.5)[source]

Convert the input RGB image to grayscale. If the mean pixel value for the resulting image is greater than 127, invert the resulting grayscale image.

Parameters:p (float) – probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
class albumentations.augmentations.transforms.JpegCompression(quality_lower=99, quality_upper=100, always_apply=False, p=0.5)[source]

Decrease the quality of an image by applying JPEG compression.

Parameters:
  • quality_lower (float) – lower bound on the jpeg quality. Should be in [0, 100] range
  • quality_upper (float) – upper bound on the jpeg quality. Should be in [0, 100] range
Targets:
image
Image types:
uint8, float32
class albumentations.augmentations.transforms.Cutout(num_holes=8, max_h_size=8, max_w_size=8, always_apply=False, p=0.5)[source]

CoarseDropout of the square regions in the image.

Parameters:
  • num_holes (int) – number of regions to zero out
  • max_h_size (int) – maximum height of the hole
  • max_w_size (int) – maximum width of the hole
Targets:
image
Image types:
uint8, float32

Reference:
https://arxiv.org/abs/1708.04552
https://github.com/uoguelph-mlrg/Cutout/blob/master/util/cutout.py
https://github.com/aleju/imgaug/blob/master/imgaug/augmenters/arithmetic.py
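
The idea behind Cutout can be sketched in plain Python on a list-of-rows image. How hole centres are sampled and clipped at the borders is an assumption of this example rather than a statement about the library's code; cutout_sketch is a hypothetical helper.

```python
import random


def cutout_sketch(img, num_holes=8, max_h_size=8, max_w_size=8, rng=None):
    """Return a copy of a list-of-rows image with rectangular regions set to 0."""
    rng = rng or random.Random()
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]  # leave the input untouched
    for _ in range(num_holes):
        # Pick a hole centre inside the image, then clip the hole to the borders.
        cy, cx = rng.randrange(height), rng.randrange(width)
        y1, y2 = max(0, cy - max_h_size // 2), min(height, cy + max_h_size // 2)
        x1, x2 = max(0, cx - max_w_size // 2), min(width, cx + max_w_size // 2)
        for y in range(y1, y2):
            for x in range(x1, x2):
                out[y][x] = 0
    return out


image = [[1] * 32 for _ in range(32)]
result = cutout_sketch(image, rng=random.Random(0))
print(sum(row.count(0) for row in result), "pixels zeroed")
```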

class albumentations.augmentations.transforms.ToFloat(max_value=None, always_apply=False, p=1.0)[source]

Divide pixel values by max_value to get a float32 output array where all values lie in the range [0, 1.0]. If max_value is None the transform will try to infer the maximum value by inspecting the data type of the input image.

See also

FromFloat

Parameters:
  • max_value (float) – maximum possible input value. Default: None.
  • p (float) – probability of applying the transform. Default: 1.0.
Targets:
image
Image types:
any type
class albumentations.augmentations.transforms.FromFloat(dtype='uint16', max_value=None, always_apply=False, p=1.0)[source]

Take an input array where all values should lie in the range [0, 1.0], multiply them by max_value and then cast the resulted value to a type specified by dtype. If max_value is None the transform will try to infer the maximum value for the data type from the dtype argument.

This is the inverse transform for ToFloat.

Parameters:
  • max_value (float) – maximum possible input value. Default: None.
  • dtype (string or numpy data type) – data type of the output. See the ‘Data types’ page from the NumPy docs. Default: ‘uint16’.
  • p (float) – probability of applying the transform. Default: 1.0.
Targets:
image
Image types:
float32
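
The relationship between ToFloat and FromFloat can be sketched on plain lists of numbers. The table of maximum values and the use of rounding before the integer cast are assumptions of this example; to_float_sketch and from_float_sketch are hypothetical helpers.

```python
# Assumed maximum values for a few unsigned integer types (2**bits - 1).
MAX_VALUES = {"uint8": 255, "uint16": 65535, "uint32": 4294967295}


def to_float_sketch(values, dtype, max_value=None):
    """Divide by max_value (inferred from dtype if None) to land in [0, 1.0]."""
    if max_value is None:
        max_value = MAX_VALUES[dtype]
    return [v / max_value for v in values]


def from_float_sketch(values, dtype="uint16", max_value=None):
    """Multiply by max_value (inferred from dtype if None) and convert to int."""
    if max_value is None:
        max_value = MAX_VALUES[dtype]
    return [int(round(v * max_value)) for v in values]


as_float = to_float_sketch([0, 1234, 65535], "uint16")
print(from_float_sketch(as_float, "uint16"))  # [0, 1234, 65535]
```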
class albumentations.augmentations.transforms.Crop(x_min=0, y_min=0, x_max=1024, y_max=1024, always_apply=False, p=1.0)[source]

Crop region from image.

Parameters:
  • x_min (int) – minimum upper left x coordinate
  • y_min (int) – minimum upper left y coordinate
  • x_max (int) – maximum lower right x coordinate
  • y_max (int) – maximum lower right y coordinate
Targets:
image, mask, bboxes
Image types:
uint8, float32
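
A fixed-coordinate crop amounts to slicing rows and columns. The sketch below assumes half-open ranges (x_max and y_max are excluded), as is usual for Python slicing; crop_sketch is a hypothetical helper working on a list of rows.

```python
def crop_sketch(img, x_min=0, y_min=0, x_max=1024, y_max=1024):
    """Keep rows y_min..y_max-1 and, inside them, columns x_min..x_max-1."""
    return [row[x_min:x_max] for row in img[y_min:y_max]]


# 4 x 5 image whose pixel value encodes its position as row * 10 + column.
img = [[r * 10 + c for c in range(5)] for r in range(4)]
print(crop_sketch(img, x_min=1, y_min=1, x_max=3, y_max=3))  # [[11, 12], [21, 22]]
```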
class albumentations.augmentations.transforms.RandomScale(scale_limit=0.1, interpolation=1, always_apply=False, p=0.5)[source]

Randomly resize the input. Output image size is different from the input image size.

Parameters:
  • scale_limit ((float, float) or float) – scaling factor range. If scale_limit is a single float value, the range will be (1 - scale_limit, 1 + scale_limit). Default: 0.1.
  • interpolation (OpenCV flag) – flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
  • p (float) – probability of applying the transform. Default: 0.5.
Targets:
image, mask, bboxes
Image types:
uint8, float32
class albumentations.augmentations.transforms.LongestMaxSize(max_size=1024, interpolation=1, always_apply=False, p=1)[source]

Rescale an image so that maximum side is equal to max_size, keeping the aspect ratio of the initial image.

Parameters:
  • p (float) – probability of applying the transform. Default: 1.
  • max_size (int) – maximum size of the image after the transformation
Targets:
image, mask, bboxes
Image types:
uint8, float32
class albumentations.augmentations.transforms.SmallestMaxSize(max_size=1024, interpolation=1, always_apply=False, p=1)[source]

Rescale an image so that minimum side is equal to max_size, keeping the aspect ratio of the initial image.

Parameters:
  • p (float) – probability of applying the transform. Default: 1.
  • max_size (int) – maximum size of smallest side of the image after the transformation
Targets:
image, mask, bboxes
Image types:
uint8, float32
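
The output size produced by LongestMaxSize and SmallestMaxSize can be estimated with a short calculation. The rounding rule used here is an assumption, and rescaled_shape with its side argument is a hypothetical helper, not part of the library.

```python
def rescaled_shape(height, width, max_size, side="longest"):
    """Scale (height, width) so the chosen side equals max_size, keeping aspect ratio."""
    reference = max(height, width) if side == "longest" else min(height, width)
    scale = max_size / reference
    return round(height * scale), round(width * scale)


print(rescaled_shape(500, 1000, 1024, side="longest"))   # (512, 1024)
print(rescaled_shape(500, 1000, 1024, side="smallest"))  # (1024, 2048)
```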
class albumentations.augmentations.transforms.Resize(height, width, interpolation=1, always_apply=False, p=1)[source]

Resize the input to the given height and width.

Parameters:
  • p (float) – probability of applying the transform. Default: 1.
  • height (int) – desired height of the output.
  • width (int) – desired width of the output.
  • interpolation (OpenCV flag) – flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
Targets:
image, mask, bboxes
Image types:
uint8, float32
class albumentations.augmentations.transforms.RandomSizedCrop(min_max_height, height, width, w2h_ratio=1.0, interpolation=1, always_apply=False, p=1.0)[source]

Crop a random part of the input and rescale it to some size.

Parameters:
  • min_max_height ((int, int)) – crop size limits.
  • height (int) – height after crop and resize.
  • width (int) – width after crop and resize.
  • w2h_ratio (float) – aspect ratio of crop.
  • interpolation (OpenCV flag) – flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
  • p (float) – probability of applying the transform. Default: 1.
Targets:
image, mask, bboxes
Image types:
uint8, float32
class albumentations.augmentations.transforms.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, always_apply=False, p=0.5)[source]

Randomly change brightness and contrast of the input image.

Parameters:
  • brightness_limit ((float, float) or float) – factor range for changing brightness. If limit is a single float, the range will be (-limit, limit). Default: 0.2.
  • contrast_limit ((float, float) or float) – factor range for changing contrast. If limit is a single float, the range will be (-limit, limit). Default: 0.2.
  • p (float) – probability of applying the transform. Default: 0.5.
Targets:
image
Image types:
uint8, float32
class albumentations.augmentations.transforms.RandomCropNearBBox(max_part_shift=0.3, always_apply=False, p=1.0)[source]

Crop the bbox region from the image with a random shift of its x and y coordinates.

Parameters:
  • max_part_shift (float) – float value in (0.0, 1.0) range. Default 0.3
  • p (float) – probability of applying the transform. Default: 1.
Targets:
image
Image types:
uint8, float32
class albumentations.augmentations.transforms.RandomSizedBBoxSafeCrop(height, width, erosion_rate=0.0, interpolation=1, always_apply=False, p=1.0)[source]

Crop a random part of the input and rescale it to some size without loss of bboxes.

Parameters:
  • height (int) – height after crop and resize.
  • width (int) – width after crop and resize.
  • erosion_rate (float) – erosion rate applied on input image height before crop. Default: 0.0.
  • interpolation (OpenCV flag) – flag that is used to specify the interpolation algorithm. Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4. Default: cv2.INTER_LINEAR.
  • p (float) – probability of applying the transform. Default: 1.
Targets:
image, mask, bboxes
Image types:
uint8, float32

Functional transforms

albumentations.augmentations.functional.bbox_flip(bbox, d, rows, cols)[source]

Flip a bounding box either vertically, horizontally or both depending on the value of d.

Raises:ValueError – if value of d is not -1, 0 or 1.
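
A sketch of how the value of d could select the flip, assuming the bounding box is given in normalized (x_min, y_min, x_max, y_max) coordinates in the range [0, 1]. flip_bbox_sketch is a hypothetical helper and omits the rows and cols arguments, which are not needed under that assumption.

```python
def flip_bbox_sketch(bbox, d):
    """Flip a normalized bounding box: 0 vertical, 1 horizontal, -1 both."""
    x_min, y_min, x_max, y_max = bbox
    if d == 0:
        # Vertical flip mirrors the y-coordinates.
        return (x_min, 1 - y_max, x_max, 1 - y_min)
    if d == 1:
        # Horizontal flip mirrors the x-coordinates.
        return (1 - x_max, y_min, 1 - x_min, y_max)
    if d == -1:
        return (1 - x_max, 1 - y_max, 1 - x_min, 1 - y_min)
    raise ValueError("d must be -1, 0 or 1")


box = (0.25, 0.125, 0.5, 0.75)
print(flip_bbox_sketch(box, 1))  # (0.5, 0.125, 0.75, 0.75)
print(flip_bbox_sketch(box, 0))  # (0.25, 0.25, 0.5, 0.875)
```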
albumentations.augmentations.functional.bbox_hflip(bbox, rows, cols)[source]

Flip a bounding box horizontally around the y-axis.

albumentations.augmentations.functional.bbox_rot90(bbox, factor, rows, cols)[source]

Rotates a bounding box by 90 degrees CCW (see np.rot90)

Parameters:
  • bbox (tuple) – A tuple (x_min, y_min, x_max, y_max).
  • factor (int) – Number of CCW rotations. Must be in range [0, 3]. See np.rot90.
  • rows (int) – Image rows.
  • cols (int) – Image cols.
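
The effect of a counter-clockwise quarter turn on a bounding box can be sketched as follows, assuming normalized coordinates in [0, 1] with the origin in the top-left corner of the image. rot90_bbox_sketch is a hypothetical helper that applies the single-turn mapping factor times.

```python
def rot90_bbox_sketch(bbox, factor):
    """Rotate a normalized (x_min, y_min, x_max, y_max) box CCW by 90 degrees, factor times."""
    if factor not in (0, 1, 2, 3):
        raise ValueError("factor must be in range [0, 3]")
    x_min, y_min, x_max, y_max = bbox
    for _ in range(factor):
        # One CCW quarter turn maps a point (x, y) to (y, 1 - x).
        x_min, y_min, x_max, y_max = y_min, 1 - x_max, y_max, 1 - x_min
    return (x_min, y_min, x_max, y_max)


box = (0.25, 0.125, 0.5, 0.75)
print(rot90_bbox_sketch(box, 1))  # (0.125, 0.5, 0.75, 0.75)
print(rot90_bbox_sketch(box, 2))  # (0.5, 0.25, 0.75, 0.875)
```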
albumentations.augmentations.functional.bbox_rotate(bbox, angle, rows, cols, interpolation)[source]

Rotates a bounding box by angle degrees

Parameters:
  • bbox (tuple) – A tuple (x_min, y_min, x_max, y_max).
  • angle (int) – Angle of rotation in degrees
  • rows (int) – Image rows.
  • cols (int) – Image cols.
  • interpolation (int) – interpolation method.
Returns:a tuple
albumentations.augmentations.functional.bbox_transpose(bbox, axis, rows, cols)[source]

Transposes a bounding box along given axis.

Parameters:
  • bbox (tuple) – A tuple (x_min, y_min, x_max, y_max).
  • axis (int) – 0 - main axis, 1 - secondary axis.
  • rows (int) – Image rows.
  • cols (int) – Image cols.
albumentations.augmentations.functional.bbox_vflip(bbox, rows, cols)[source]

Flip a bounding box vertically around the x-axis.

albumentations.augmentations.functional.crop_bbox_by_coords(bbox, crop_coords, crop_height, crop_width, rows, cols)[source]

Crop a bounding box using the provided coordinates of bottom-left and top-right corners in pixels and the required height and width of the crop.

albumentations.augmentations.functional.crop_keypoint_by_coords(keypoint, crop_coords, crop_height, crop_width, rows, cols)[source]

Crop a keypoint using the provided coordinates of bottom-left and top-right corners in pixels and the required height and width of the crop.

albumentations.augmentations.functional.elastic_transform(image, alpha, sigma, alpha_affine, interpolation=1, border_mode=4, random_state=None, approximate=False)[source]

Elastic deformation of images as described in [Simard2003] (with modifications). Based on https://gist.github.com/erniejunior/601cdf56d2b424757de5

[Simard2003]Simard, Steinkraus and Platt, “Best Practices for Convolutional Neural Networks applied to Visual Document Analysis”, in Proc. of the International Conference on Document Analysis and Recognition, 2003.
albumentations.augmentations.functional.elastic_transform_approx(image, alpha, sigma, alpha_affine, interpolation=1, border_mode=4, random_state=None)[source]

Elastic deformation of images as described in [Simard2003] (with modifications for speed). Based on https://gist.github.com/erniejunior/601cdf56d2b424757de5

[Simard2003]Simard, Steinkraus and Platt, “Best Practices for Convolutional Neural Networks applied to Visual Document Analysis”, in Proc. of the International Conference on Document Analysis and Recognition, 2003.
albumentations.augmentations.functional.grid_distortion(img, num_steps=10, xsteps=[], ysteps=[], interpolation=1, border_mode=4)[source]
Reference:
http://pythology.blogspot.sg/2014/03/interpolation-on-regular-distorted-grid.html
albumentations.augmentations.functional.keypoint_flip(bbox, d, rows, cols)[source]

Flip a keypoint either vertically, horizontally or both depending on the value of d.

Raises:ValueError – if value of d is not -1, 0 or 1.
albumentations.augmentations.functional.keypoint_hflip(kp, rows, cols)[source]

Flip a keypoint horizontally around the y-axis.

albumentations.augmentations.functional.keypoint_rot90(keypoint, factor, rows, cols, **params)[source]

Rotates a keypoint by 90 degrees CCW (see np.rot90)

Parameters:
  • keypoint (tuple) – A tuple (x, y, angle, scale).
  • factor (int) – Number of CCW rotations. Must be in range [0;3] See np.rot90.
  • rows (int) – Image rows.
  • cols (int) – Image cols.
albumentations.augmentations.functional.keypoint_vflip(kp, rows, cols)[source]

Flip a keypoint vertically around the x-axis.

albumentations.augmentations.functional.optical_distortion(img, k=0, dx=0, dy=0, interpolation=1, border_mode=4)[source]

Barrel / pincushion distortion. Unconventional augment.

Reference:
albumentations.augmentations.functional.preserve_channel_dim(func)[source]

Preserve dummy channel dim.

albumentations.augmentations.functional.preserve_shape(func)[source]

Preserve shape of the image.

Helper functions for working with bounding boxes

albumentations.augmentations.bbox_utils.normalize_bbox(bbox, rows, cols)[source]

Normalize coordinates of a bounding box. Divide x-coordinates by image width and y-coordinates by image height.

albumentations.augmentations.bbox_utils.denormalize_bbox(bbox, rows, cols)[source]

Denormalize coordinates of a bounding box. Multiply x-coordinates by image width and y-coordinates by image height. This is an inverse operation for normalize_bbox().
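
The two operations described above are simple divisions and multiplications, sketched here for a box given as (x_min, y_min, x_max, y_max). The function names are hypothetical and chosen so they are not mistaken for the library's own implementations.

```python
def normalize_box_sketch(bbox, rows, cols):
    """Divide x-coordinates by the image width and y-coordinates by the height."""
    x_min, y_min, x_max, y_max = bbox
    return (x_min / cols, y_min / rows, x_max / cols, y_max / rows)


def denormalize_box_sketch(bbox, rows, cols):
    """Multiply x-coordinates by the image width and y-coordinates by the height."""
    x_min, y_min, x_max, y_max = bbox
    return (x_min * cols, y_min * rows, x_max * cols, y_max * rows)


normalized = normalize_box_sketch((50, 100, 150, 300), rows=400, cols=200)
print(normalized)                                      # (0.25, 0.25, 0.75, 0.75)
print(denormalize_box_sketch(normalized, 400, 200))    # (50.0, 100.0, 150.0, 300.0)
```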

albumentations.augmentations.bbox_utils.normalize_bboxes(bboxes, rows, cols)[source]

Normalize a list of bounding boxes.

albumentations.augmentations.bbox_utils.denormalize_bboxes(bboxes, rows, cols)[source]

Denormalize a list of bounding boxes.

albumentations.augmentations.bbox_utils.calculate_bbox_area(bbox, rows, cols)[source]

Calculate the area of a bounding box in pixels.

albumentations.augmentations.bbox_utils.filter_bboxes_by_visibility(original_shape, bboxes, transformed_shape, transformed_bboxes, threshold=0.0, min_area=0.0)[source]

Filter bounding boxes and return only those boxes whose visibility after transformation is above the threshold and whose area in pixels is greater than min_area.

Parameters:
  • original_shape (tuple) – original image shape
  • bboxes (list) – original bounding boxes
  • transformed_shape (tuple) – transformed image shape
  • transformed_bboxes (list) – transformed bounding boxes
  • threshold (float) – visibility threshold. Should be a value in the range [0.0, 1.0].
  • min_area (float) – Minimal area threshold.
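
One way to read the filtering rule is sketched below. It assumes that boxes are normalized (x_min, y_min, x_max, y_max) tuples already clipped to the image, and that visibility means the ratio of the transformed box area to the original box area, both in pixels. The helper names are hypothetical and the comparison operators are a guess at the intended behaviour.

```python
def box_area_px(bbox, rows, cols):
    """Area in pixels of a normalized (x_min, y_min, x_max, y_max) box."""
    x_min, y_min, x_max, y_max = bbox
    return (x_max - x_min) * cols * (y_max - y_min) * rows


def filter_by_visibility_sketch(original_shape, bboxes, transformed_shape,
                                transformed_bboxes, threshold=0.0, min_area=0.0):
    """Keep transformed boxes that stay visible enough and large enough."""
    rows, cols = original_shape[:2]
    t_rows, t_cols = transformed_shape[:2]
    kept = []
    for bbox, t_bbox in zip(bboxes, transformed_bboxes):
        area_before = box_area_px(bbox, rows, cols)
        area_after = box_area_px(t_bbox, t_rows, t_cols)
        if area_before <= 0 or area_after < min_area:
            continue  # degenerate original box or too small after the transform
        if area_after / area_before >= threshold:
            kept.append(t_bbox)
    return kept


before = [(0.0, 0.0, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0)]
after = [(0.0, 0.0, 1.0, 1.0), (0.9, 0.9, 1.0, 1.0)]
# Only the first box keeps at least half of its original area.
print(filter_by_visibility_sketch((100, 100), before, (50, 50), after, threshold=0.5))
```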
albumentations.augmentations.bbox_utils.convert_bbox_to_albumentations(bbox, source_format, rows, cols, check_validity=False)[source]

Convert a bounding box from a format specified in source_format to the format used by albumentations: normalized coordinates of bottom-left and top-right corners of the bounding box in a form of [x_min, y_min, x_max, y_max] e.g. [0.15, 0.27, 0.67, 0.5].

Parameters:
  • bbox (list) – bounding box
  • source_format (str) – format of the bounding box. Should be ‘coco’ or ‘pascal_voc’.
  • check_validity (bool) – check if all boxes are valid boxes
  • rows (int) – image height
  • cols (int) – image width

Note

The coco format of a bounding box looks like [x_min, y_min, width, height], e.g. [97, 12, 150, 200]. The pascal_voc format of a bounding box looks like [x_min, y_min, x_max, y_max], e.g. [97, 12, 247, 212].

Raises:ValueError – if source_format is not equal to coco or pascal_voc.
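
The conversion can be sketched with the two example boxes from the note above, which describe the same region. The image size of 400 rows by 500 columns is an arbitrary assumption, validity checking is left out, and to_normalized_corners is a hypothetical helper rather than the library function.

```python
def to_normalized_corners(bbox, source_format, rows, cols):
    """Convert a coco or pascal_voc box to normalized [x_min, y_min, x_max, y_max]."""
    if source_format == "coco":
        x_min, y_min, width, height = bbox
        x_max, y_max = x_min + width, y_min + height
    elif source_format == "pascal_voc":
        x_min, y_min, x_max, y_max = bbox
    else:
        raise ValueError("source_format must be 'coco' or 'pascal_voc'")
    return [x_min / cols, y_min / rows, x_max / cols, y_max / rows]


from_coco = to_normalized_corners([97, 12, 150, 200], "coco", rows=400, cols=500)
from_voc = to_normalized_corners([97, 12, 247, 212], "pascal_voc", rows=400, cols=500)
print(from_coco == from_voc)  # True
```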
albumentations.augmentations.bbox_utils.convert_bbox_from_albumentations(bbox, target_format, rows, cols, check_validity=False)[source]

Convert a bounding box from the format used by albumentations to a format, specified in target_format.

Parameters:
  • bbox (list) – bounding box with coordinates in the format used by albumentations
  • target_format (str) – required format of the output bounding box. Should be ‘coco’ or ‘pascal_voc’.
  • rows (int) – image height
  • cols (int) – image width
  • check_validity (bool) – check if all boxes are valid boxes

Note

The coco format of a bounding box looks like [x_min, y_min, width, height], e.g. [97, 12, 150, 200]. The pascal_voc format of a bounding box looks like [x_min, y_min, x_max, y_max], e.g. [97, 12, 247, 212].

Raises:ValueError – if target_format is not equal to coco or pascal_voc.
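
The reverse direction can be sketched in the same way. The example assumes a normalized [x_min, y_min, x_max, y_max] input and skips validity checking; from_normalized_corners is a hypothetical helper, not the library function.

```python
def from_normalized_corners(bbox, target_format, rows, cols):
    """Convert a normalized [x_min, y_min, x_max, y_max] box to coco or pascal_voc."""
    if target_format not in ("coco", "pascal_voc"):
        raise ValueError("target_format must be 'coco' or 'pascal_voc'")
    x_min, y_min, x_max, y_max = bbox
    # Back to pixel coordinates first.
    x_min, x_max = x_min * cols, x_max * cols
    y_min, y_max = y_min * rows, y_max * rows
    if target_format == "coco":
        return [x_min, y_min, x_max - x_min, y_max - y_min]
    return [x_min, y_min, x_max, y_max]


box = [0.25, 0.5, 0.75, 1.0]
print(from_normalized_corners(box, "pascal_voc", rows=200, cols=400))  # [100.0, 100.0, 300.0, 200.0]
print(from_normalized_corners(box, "coco", rows=200, cols=400))        # [100.0, 100.0, 200.0, 100.0]
```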
albumentations.augmentations.bbox_utils.convert_bboxes_to_albumentations(bboxes, source_format, rows, cols, check_validity=False)[source]

Convert a list of bounding boxes from a format specified in source_format to the format used by albumentations.

albumentations.augmentations.bbox_utils.convert_bboxes_from_albumentations(bboxes, target_format, rows, cols, check_validity=False)[source]

Convert a list of bounding boxes from the format used by albumentations to a format, specified in target_format.

Parameters:
  • bboxes (list) – list of bounding boxes with coordinates in the format used by albumentations
  • target_format (str) – required format of the output bounding box. Should be ‘coco’ or ‘pascal_voc’.
  • rows (int) – image height
  • cols (int) – image width
  • check_validity (bool) – check if all boxes are valid boxes