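A Very Practical Tutorial on Kaiming He's Mask R-CNN

Original title | Mask R-CNN with OpenCV

Author | Adrian Rosebrock

Translators | 天字一号 (Zhengzhou University), 李美丽 (South China Normal University), had_in (University of Electronic Science and Technology of China), nengdaiper (University of Science and Technology Beijing)

Editor's note | This article contains a lot of code and links; we recommend opening the original post, bookmarking it, and following the links from there.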

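In this tutorial, you will learn how to use Mask R-CNN with OpenCV.

Using Mask R-CNN, you can automatically segment and construct pixel-wise masks for every object in an image. We'll be applying Mask R-CNN to both images and video streams.

In last week's blog post you learned how to use the YOLO object detector to detect objects in images (https://www.pyimagesearch.com/2018/11/12/yolo-object-detection-with-opencv/). Object detectors such as YOLO, Faster R-CNN, and SSD generate the (x, y)-coordinates of a bounding box around each object in an image.

Obtaining the bounding box of an object is a good start, but the bounding box itself tells us nothing about (1) which pixels belong to the foreground object and (2) which pixels belong to the background.

That raises the question:

Is it possible to generate a mask for each object in an image, allowing us to segment the foreground object from the background?
Is such a method even possible?

The answer is yes: we just need to perform instance segmentation using the Mask R-CNN architecture.

To learn how to apply Mask R-CNN with OpenCV to both images and video streams, just keep reading!

Looking for the source code to this post? Jump right to the downloads section (https://www.pyimagesearch.com/2018/11/19/mask-r-cnn-with-opencv/#).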

Mask R-CNN with OpenCV

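In the first part of this tutorial, we'll discuss the differences between image classification, object detection, instance segmentation, and semantic segmentation.

From there we'll briefly review the Mask R-CNN architecture and how it relates to Faster R-CNN.

I'll then show you how to apply Mask R-CNN with OpenCV to both images and video streams.

Let's get started!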

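Instance segmentation vs. semantic segmentation

Figure 1: Image classification (top-left), object detection (top-right), semantic segmentation (bottom-left), and instance segmentation (bottom-right). We'll be performing instance segmentation with Mask R-CNN in this tutorial. (Source: https://arxiv.org/abs/1704.06857)

The differences between traditional image classification, object detection, semantic segmentation, and instance segmentation are best explained visually.

When performing traditional image classification, our goal is to predict a set of labels that characterize the contents of an input image (top-left).

Object detection builds on image classification, but this time we also have to localize each object in the image. The image is now characterized by:

The bounding box (x, y)-coordinates of each object
The class label associated with each bounding box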

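The bottom-left shows an example of semantic segmentation. Semantic segmentation algorithms require us to associate every pixel in the input image with a class label (including a class label for the background).

Pay close attention to the semantic segmentation visualization: every object is indeed segmented, but each "cube" object has the same color.

While semantic segmentation algorithms can label every object in an image, they cannot differentiate between two objects of the same class.

This becomes especially problematic when two objects of the same class partially occlude each other: we have no idea where the boundary of one object ends and the next begins. The two purple cubes in the figure demonstrate this; we cannot tell where one cube starts and the other ends.

Instance segmentation algorithms, on the other hand, compute a pixel-wise mask for every object in the image, even if the objects share the same class label (bottom-right). Here you can see that each cube has its own unique color, implying that our instance segmentation algorithm not only localized each individual cube but predicted its boundaries as well.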
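To make the distinction concrete, here is a minimal NumPy sketch. The tiny 4 x 6 "image", the class IDs, and the variable names are all made up for illustration; real segmentation outputs are of course much larger.

```python
import numpy as np

# Hypothetical 4x6 scene with two touching "cube" objects on a background.
# Instance map: 0 = background, 1 = first cube, 2 = second cube.
instance_map = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 0, 0, 0, 0, 0],
])

# Semantic segmentation only records the class of each pixel, so both
# cubes collapse into the same "cube" label (class ID 1 here).
semantic_map = (instance_map > 0).astype(int)

# Instance segmentation keeps one boolean mask per object, so the
# boundary between the two touching cubes is preserved.
instance_masks = [instance_map == obj_id for obj_id in (1, 2)]

print(semantic_map)
print([int(m.sum()) for m in instance_masks])  # pixels per instance
```

In the semantic map the two cubes form a single blob of identical labels, while the list of instance masks still tells them apart.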

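The Mask R-CNN architecture we'll be discussing in this tutorial is an example of an instance segmentation algorithm.

What is Mask R-CNN?

The Mask R-CNN algorithm was introduced by He et al. in their 2017 paper, Mask R-CNN (https://arxiv.org/abs/1703.06870).

Mask R-CNN builds on the previous object detection work of R-CNN (2013), Fast R-CNN (2015), and Faster R-CNN (2015), all by Girshick et al.

To understand Mask R-CNN, let's briefly review the R-CNN variants, starting with the original R-CNN: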

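Figure 2: The original R-CNN architecture (source: Girshick et al., 2013)

The original R-CNN algorithm is a four-step process:

Step 1: Input an image to the network.

Step 2: Extract region proposals (i.e., regions of the image that potentially contain objects) using an algorithm such as Selective Search (http://www.huppelen.nl/publications/selectiveSearchDraft.pdf).

Step 3: Use transfer learning, specifically feature extraction, to compute features for each proposal (which is effectively an ROI) with a pre-trained CNN.

Step 4: Classify each proposal from its extracted features with a Support Vector Machine (SVM).

The reason this method works is the robust, discriminative features learned by the CNN.

However, the problem with the R-CNN method is that it is incredibly slow. Furthermore, we're not actually learning to localize via a deep neural network; we're effectively just building a more advanced HOG + Linear SVM detector (https://www.pyimagesearch.com/2014/11/10/histogram-oriented-gradients-object-detection/).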

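To improve upon the original R-CNN, Girshick et al. published the Fast R-CNN algorithm:

Figure 3: The Fast R-CNN architecture (source: Girshick et al., 2015).

Similar to the original R-CNN, Fast R-CNN still uses Selective Search to obtain region proposals; the novel contribution of the paper, however, is the Region of Interest (ROI) Pooling module.

ROI Pooling works by extracting a fixed-size window from the feature map and using these features to obtain the final class label and bounding box. The primary benefit is that the network is now, effectively, end-to-end trainable:

We input an image and its associated ground-truth bounding boxes
Extract the feature map of the image
Apply ROI pooling to obtain the ROI feature vector
Finally, use two sets of fully-connected layers to obtain (1) the class label predictions and (2) the bounding box locations for each proposal.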
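The idea behind ROI pooling can be sketched in a few lines of NumPy. This is a simplified illustration rather than the Fast R-CNN implementation: the function name roi_max_pool, the 2-D single-channel feature map, and the equal-width binning are assumptions made for the example, and the ROI is assumed to be at least as large as the output grid.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one ROI of a 2-D feature map into a fixed-size grid.

    feature_map: 2-D array of shape (H, W).
    roi: (x1, y1, x2, y2) in feature-map coordinates, end-exclusive.
    """
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2]
    out_h, out_w = output_size
    # Split the region's rows and columns into roughly equal bins.
    row_edges = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    col_edges = np.linspace(0, region.shape[1], out_w + 1).astype(int)
    pooled = np.zeros(output_size, dtype=feature_map.dtype)
    for r in range(out_h):
        for c in range(out_w):
            cell = region[row_edges[r]:row_edges[r + 1],
                          col_edges[c]:col_edges[c + 1]]
            pooled[r, c] = cell.max()
    return pooled

# Two ROIs of different sizes both produce a 2x2 output.
fmap = np.arange(36).reshape(6, 6)
pooled_a = roi_max_pool(fmap, (0, 0, 4, 4))
pooled_b = roi_max_pool(fmap, (1, 1, 6, 5))
print(pooled_a.shape, pooled_b.shape)
```

Whatever the size of the proposal, the pooled output has the same shape, which is what lets the fully-connected layers that follow accept every ROI.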

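While the network is now end-to-end trainable, performance at inference (i.e., prediction) time still suffers dramatically from the dependence on Selective Search.

To make the R-CNN architecture even faster, we need to incorporate the region proposal step directly into the R-CNN:

Figure 4: The Faster R-CNN architecture (source: Girshick et al., 2015)

The Faster R-CNN paper by Girshick et al. introduced the Region Proposal Network (RPN), which bakes region proposal directly into the architecture and alleviates the need for the Selective Search algorithm.

Overall, the Faster R-CNN architecture is capable of running at approximately 7-10 FPS, a big step towards making real-time object detection with deep learning a reality.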

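The Mask R-CNN algorithm builds on the Faster R-CNN architecture with two major contributions:

Replacing the ROI Pooling module with a more accurate ROI Align module
Inserting an additional branch out of the ROI Align module

This additional branch accepts the output of the ROI Align module and feeds it into two CONV layers.

The output of the CONV layers is the mask itself.

We can visualize the Mask R-CNN architecture in the following figure:

Figure 5: The Mask R-CNN work by He et al. replaces the ROI Pooling module with a more accurate ROI Align module. The output of the ROI module is then fed into two CONV layers. The output of the CONV layers is the mask itself.

Notice the branch of two CONV layers coming out of the ROI Align module: this is where our mask is actually generated.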

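As we know, the Faster R-CNN/Mask R-CNN architectures leverage a Region Proposal Network (RPN) to generate regions of an image that potentially contain an object.

Each of these regions is ranked by its "objectness score" (i.e., how likely it is that the region contains an object), and the top N most confident regions are kept.

In the original Faster R-CNN publication Girshick et al. set N=2,000, but in practice we can get away with a much smaller N, such as N={10, 100, 200, 300}, and still obtain good results.

He et al. set N=300 in their publication (https://arxiv.org/abs/1703.06870), which is the value we'll use here as well.

Each of the 300 selected ROIs goes through three parallel branches of the network:

Label prediction
Bounding box prediction
Mask prediction

Figure 5 above visualizes these branches.

During prediction, each of the 300 ROIs goes through non-maxima suppression (https://www.pyimagesearch.com/2014/11/17/non-maximum-suppression-object-detection-python/) and only the top 100 detection boxes are kept, resulting in a 4D tensor of 100 x L x 15 x 15, where L is the number of class labels in the dataset and 15 x 15 is the size of the mask for each of the L classes.

The Mask R-CNN we're using here today was trained on the COCO dataset (http://cocodataset.org/#home), which has L=90 classes, so the resulting volume from the mask module of the Mask R-CNN is 100 x 90 x 15 x 15.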
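For readers who have not met non-maxima suppression before, the following is a minimal greedy version in NumPy. It is a sketch and not the code that runs inside the network: the function name, the IoU threshold of 0.5, and the toy boxes are assumptions for illustration, and boxes are assumed to have positive area.

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5, top_k=100):
    """Greedy NMS. boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,)."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = scores.argsort()[::-1]  # indices, highest score first
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0 and len(keep) < top_k:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Intersection of the best box with every remaining box.
        xx1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)
        # Drop the boxes that overlap the best box too much.
        order = rest[iou <= iou_threshold]
    return keep

# Toy example: the second box overlaps the first heavily and is suppressed.
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)
```

The top_k argument plays the role of the "keep only the top 100 detections" step described above.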

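To visualize the Mask R-CNN process, take a look at the figure below:

Figure 6: A visualization of the Mask R-CNN process: a 15 x 15 mask is generated, resized to the original dimensions of the image, and finally overlaid on the original image. (Source: Deep Learning for Computer Vision with Python, ImageNet Bundle)

Here you can see that we start with our input image and feed it through our Mask R-CNN network to obtain our mask prediction.

The predicted mask is only 15 x 15 pixels, so we resize it back to the dimensions of the original input image.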
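The resize-then-threshold step can be illustrated without OpenCV. In the sketch below, resize_nearest is a hypothetical stand-in for cv2.resize with nearest-neighbor interpolation, and the 60 x 45 target size and mask values are made up; the scripts later in this post resize each mask to the size of its detection's bounding box.

```python
import numpy as np

def resize_nearest(mask, new_h, new_w):
    """Nearest-neighbor resize of a 2-D array (stand-in for cv2.resize)."""
    rows = np.arange(new_h) * mask.shape[0] // new_h
    cols = np.arange(new_w) * mask.shape[1] // new_w
    return mask[rows][:, cols]

# Hypothetical 15 x 15 soft mask: high probability in the center.
small = np.zeros((15, 15), dtype="float32")
small[4:11, 4:11] = 0.9

box_h, box_w = 60, 45          # made-up target size in pixels
resized = resize_nearest(small, box_h, box_w)
binary = resized > 0.3         # 0.3 is the script's default --threshold

print(resized.shape, binary.dtype, int(binary.sum()))
```

The result is a boolean array with one entry per pixel of the target region, ready to be used as an index into the image.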

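Finally, the resized mask can be overlaid on the original input image. For a more thorough discussion of how Mask R-CNN works, be sure to refer to:

The original Mask R-CNN publication by He et al. (https://arxiv.org/abs/1703.06870)
My book, Deep Learning for Computer Vision with Python (https://www.pyimagesearch.com/deep-learning-computer-vision-python-book/), where I discuss Mask R-CNNs in more detail, including how to train your own Mask R-CNNs from scratch on your own data.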

Project structure

Our project today consists of two scripts, along with several other important files.

I've organized the project in the following manner (as shown by the tree command output directly in a terminal):

Mask R-CNN with OpenCV---Shell

$ tree
.
├── mask-rcnn-coco
│   ├── colors.txt
│   ├── frozen_inference_graph.pb
│   ├── mask_rcnn_inception_v2_coco_2018_01_28.pbtxt
│   └── object_detection_classes_coco.txt
├── images
│   ├── example_01.jpg
│   ├── example_02.jpg
│   └── example_03.jpg
├── videos
│   ├──
├── output
│   ├──
├── mask_rcnn.py
└── mask_rcnn_video.py

4 directories, 9 files

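Our project consists of four directories:

mask-rcnn-coco/ : The Mask R-CNN model files. There are four files:

frozen_inference_graph.pb : The Mask R-CNN model weights, pre-trained on the COCO dataset.
mask_rcnn_inception_v2_coco_2018_01_28.pbtxt : The Mask R-CNN model configuration. If you'd like to build and train your own model on your own annotated data, refer to Deep Learning for Computer Vision with Python (https://www.pyimagesearch.com/deep-learning-computer-vision-python-book/).
object_detection_classes_coco.txt : All 90 classes are listed in this text file, one per line. Open it in a text editor to see what objects our model can recognize.
colors.txt : This text file contains six colors to randomly assign to objects found in the image.

images/ : I've provided three test images in the "Downloads". Feel free to add your own images to test with.
videos/ : This is an empty directory. I actually tested with large videos that I scraped from YouTube (credits are below, just above the "Summary" section). Rather than providing a really big zip, I suggest that you find a few videos on YouTube to download and test with. Or take some videos with your cell phone and come back to your computer to use them!
output/ : Another empty directory that will hold the processed videos (assuming you set the command-line argument to output to this directory).

We'll be reviewing two scripts today:

mask_rcnn.py : This script performs instance segmentation and applies a mask to the image so you can see, down to the pixel, where Mask R-CNN thinks an object is.
mask_rcnn_video.py : This video processing script uses the same Mask R-CNN and applies the model to every frame of a video file. The script then writes the output frames back to a video file on disk.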

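Mask R-CNN with OpenCV for images

Now that we've reviewed how Mask R-CNNs work, let's get our hands dirty with some Python code.

Before we begin, ensure that your Python environment has OpenCV 3.4.2/3.4.3 or higher installed. You can follow one of my OpenCV installation tutorials (https://www.pyimagesearch.com/opencv-tutorials-resources-guides/) to upgrade/install OpenCV. If you want to be up and running in 5 minutes or less, consider installing OpenCV with pip (https://www.pyimagesearch.com/2018/09/19/pip-install-opencv/). If you have other requirements, you might want to compile OpenCV from source.

Make sure you've used the "Downloads" section of this blog post to download the source code, trained Mask R-CNN, and example images.

From there, open up the mask_rcnn.py file and insert the following code: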

Mask R-CNN with OpenCV---Python

# import the necessary packages
import numpy as np
import argparse
import random
import time
import cv2
import os

First, we import the required packages on Lines 2-7. Notably, we're importing NumPy and OpenCV. Everything else comes built into most Python installations.

Now let's parse our command-line arguments (https://www.pyimagesearch.com/2018/03/12/python-argparse-command-line-arguments/):

Mask R-CNN with OpenCV---Python

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="path to input image")
ap.add_argument("-m", "--mask-rcnn", required=True,
    help="base path to mask-rcnn directory")
ap.add_argument("-v", "--visualize", type=int, default=0,
    help="whether or not we are going to visualize each instance")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
    help="minimum probability to filter weak detections")
ap.add_argument("-t", "--threshold", type=float, default=0.3,
    help="minimum threshold for pixel-wise mask segmentation")
args = vars(ap.parse_args())

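Our script requires command-line argument flags and parameters to be passed when it is run from a terminal. The arguments are parsed on Lines 10-21, where the first two are required and the rest are optional:

--image : The path to our input image.
--mask-rcnn : The base path to the Mask R-CNN files.
--visualize (optional): A positive value indicates that we want to visualize how we extracted the masked region on our screen. Either way, we'll display the final output on the screen.
--confidence (optional): Overrides the default probability of 0.5, which serves to filter out weak detections.
--threshold (optional): We'll be creating a binary mask for each object in the image, and this threshold helps us filter out mask predictions with low probability. I found that the default value of 0.3 works pretty well.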
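One detail that is easy to trip over: argparse converts the dash in --mask-rcnn to an underscore, so the value is read back as args["mask_rcnn"]. The short sketch below shows this with a trimmed-down parser; it parses an explicit list instead of the real command line so that it can be run anywhere, and the paths are just example values.

```python
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True)
ap.add_argument("-m", "--mask-rcnn", required=True)
ap.add_argument("-c", "--confidence", type=float, default=0.5)
ap.add_argument("-t", "--threshold", type=float, default=0.3)

# Parse an explicit list instead of sys.argv so the example is repeatable.
args = vars(ap.parse_args(
    ["--image", "images/example_01.jpg", "--mask-rcnn", "mask-rcnn-coco"]))

print(args["mask_rcnn"])    # note the underscore in the dictionary key
print(args["confidence"])   # falls back to the default
```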

Now that our command-line arguments are stored in the args dictionary, let's load our labels and colors:

Mask R-CNN with OpenCV---Python

# load the COCO class labels our Mask R-CNN was trained on
labelsPath = os.path.sep.join([args["mask_rcnn"],
    "object_detection_classes_coco.txt"])
LABELS = open(labelsPath).read().strip().split("\n")

# load the set of colors that will be used when visualizing a given
# instance segmentation
colorsPath = os.path.sep.join([args["mask_rcnn"], "colors.txt"])
COLORS = open(colorsPath).read().strip().split("\n")
COLORS = [np.array(c.split(",")).astype("int") for c in COLORS]
COLORS = np.array(COLORS, dtype="uint8")

Lines 24-26 load the COCO object class LABELS. Today's Mask R-CNN is capable of recognizing 90 classes including people, vehicles, signs, animals, everyday items, sports gear, kitchen items, food, and more! I encourage you to look at object_detection_classes_coco.txt to see the available classes.

From there we load the COLORS from the path, performing a couple of array conversion operations (Lines 30-33).
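To see what those array conversions do, here is the same parsing applied to an in-memory string. The three color rows are invented for the example (the actual contents of colors.txt are not reproduced here); each line is assumed to hold three comma-separated integers.

```python
import numpy as np

# Hypothetical file contents; the real colors.txt may hold other values.
colors_text = "0,255,0\n0,0,255\n255,0,0\n"

# Same steps as the script: split into lines, split each line on commas,
# convert the strings to integers, then pack everything into a uint8 array.
rows = colors_text.strip().split("\n")
COLORS = [np.array(r.split(",")).astype("int") for r in rows]
COLORS = np.array(COLORS, dtype="uint8")

print(COLORS.shape, COLORS.dtype)
```

The result is one row per color with three channel values each, stored in the same 8-bit type that OpenCV images use.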

Let's load our model:

Mask R-CNN with OpenCV---Python

# derive the paths to the Mask R-CNN weights and model configuration
weightsPath = os.path.sep.join([args["mask_rcnn"],
    "frozen_inference_graph.pb"])
configPath = os.path.sep.join([args["mask_rcnn"],
    "mask_rcnn_inception_v2_coco_2018_01_28.pbtxt"])

# load our Mask R-CNN trained on the COCO dataset (90 classes)
# from disk
print("[INFO] loading Mask R-CNN from disk...")
net = cv2.dnn.readNetFromTensorflow(weightsPath, configPath)

First, we build our weight and configuration paths (Lines 36-39), then load the model via these paths (Line 44).

In the next block, we'll load an image and pass it through the Mask R-CNN neural net:

Mask R-CNN with OpenCV---Python

# load our input image and grab its spatial dimensions
image = cv2.imread(args["image"])
(H, W) = image.shape[:2]

# construct a blob from the input image and then perform a forward
# pass of the Mask R-CNN, giving us (1) the bounding box coordinates
# of the objects in the image along with (2) the pixel-wise segmentation
# for each specific object
blob = cv2.dnn.blobFromImage(image, swapRB=True, crop=False)
net.setInput(blob)
start = time.time()
(boxes, masks) = net.forward(["detection_out_final", "detection_masks"])
end = time.time()

# show timing information and volume information on Mask R-CNN
print("[INFO] Mask R-CNN took {:.6f} seconds".format(end - start))
print("[INFO] boxes shape: {}".format(boxes.shape))
print("[INFO] masks shape: {}".format(masks.shape))

Here we:

Load the input image and extract its dimensions for scaling purposes later (Lines 47 and 48).
Construct a blob via cv2.dnn.blobFromImage (Line 54). You can learn why and how to use this function in my previous tutorial (https://www.pyimagesearch.com/2017/11/06/deep-learning-opencvs-blobfromimage-works/).
Perform a forward pass of the blob through the net while collecting timestamps (Lines 55-58). The results are contained in two important variables: boxes and masks.

Now that we've performed a forward pass of the Mask R-CNN on the image, we'll want to filter and visualize our results. That's exactly what the next for loop accomplishes. It is quite long, so I've broken it into five code blocks, beginning here:

# loop over the number of detected objects
for i in range(0, boxes.shape[2]):
    # extract the class ID of the detection along with the confidence
    # (i.e., probability) associated with the prediction
    classID = int(boxes[0, 0, i, 1])
    confidence = boxes[0, 0, i, 2]

    # filter out weak predictions by ensuring the detected probability
    # is greater than the minimum probability
    if confidence > args["confidence"]:
        # clone our original image so we can draw on it
        clone = image.copy()

        # scale the bounding box coordinates back relative to the
        # size of the image and then compute the width and the height
        # of the bounding box
        box = boxes[0, 0, i, 3:7] * np.array([W, H, W, H])
        (startX, startY, endX, endY) = box.astype("int")
        boxW = endX - startX
        boxH = endY - startY

In this block, we begin our filter/visualization loop (Line 66).

We proceed to extract the classID and confidence of a particular detected object (Lines 69 and 70).

From there we filter out weak predictions by comparing the confidence to the --confidence command-line argument value, keeping only detections that exceed it (Line 74).

We then scale the object's bounding box and compute the dimensions of the box (Lines 81-84).
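The box scaling is simple arithmetic: the network reports box corners as fractions of the image size, and multiplying by [W, H, W, H] turns them into pixel coordinates. Here is a worked example with a made-up detection row and image size (the values, and the row standing in for boxes[0, 0, i], are assumptions for illustration).

```python
import numpy as np

(H, W) = (480, 640)  # made-up image height and width

# One detection row standing in for boxes[0, 0, i]. Index 0 is unused
# here, index 1 is the class ID, index 2 the confidence, and indices
# 3-6 the box corners as fractions of the image size.
detection = np.array([0.0, 2.0, 0.93, 0.25, 0.5, 0.75, 1.0])

classID = int(detection[1])
confidence = detection[2]

# Scale the relative corners to pixel coordinates.
box = detection[3:7] * np.array([W, H, W, H])
(startX, startY, endX, endY) = box.astype("int")
boxW = endX - startX
boxH = endY - startY

print(startX, startY, endX, endY, boxW, boxH)
```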

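Image segmentation requires that we find all pixels where an object is present. Thus, we're going to place a transparent overlay on top of the object to see how well our algorithm is performing. In order to do so, we'll calculate a mask:

Mask R-CNN with OpenCV---Python

        # extract the pixel-wise segmentation for the object, resize
        # the mask such that it's the same dimensions of the bounding
        # box, and then finally threshold to create a *binary* mask
        mask = masks[i, classID]
        mask = cv2.resize(mask, (boxW, boxH),
            interpolation=cv2.INTER_NEAREST)
        mask = (mask > args["threshold"])

        # extract the ROI of the image
        roi = clone[startY:endY, startX:endX]

On Lines 89-91, we extract the pixel-wise segmentation for the object and resize it to the dimensions of the bounding box. Finally, we threshold the mask so that it is a binary array/image (Line 92).

We also extract the region of interest where the object resides (Line 95).

Both the mask and roi can be seen visually in Figure 8 later in the post.

For convenience, this next block visualizes the mask, roi, and segmented instance if the --visualize flag is set via command-line arguments: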

Mask R-CNN with OpenCV---Python

        # check to see if are going to visualize how to extract the
        # masked region itself
        if args["visualize"] > 0:
            # convert the mask from a boolean to an integer mask with
            # to values: 0 or 255, then apply the mask
            visMask = (mask * 255).astype("uint8")
            instance = cv2.bitwise_and(roi, roi, mask=visMask)

            # show the extracted ROI, the mask, along with the
            # segmented instance
            cv2.imshow("ROI", roi)
            cv2.imshow("Mask", visMask)
            cv2.imshow("Segmented", instance)

In this block we:

Check to see if we should visualize the ROI, mask, and segmented instance (Line 99).
Convert our mask from boolean to integer, where a value of "0" indicates background and "255" foreground (Line 102).
Perform bitwise masking to visualize just the instance itself (Line 103).
Show all three images (Lines 107-109).

Again, these visualization images will only be shown if the --visualize flag is set via the optional command-line argument (by default these images won't be shown).
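If you want to check what the masking step produces without opening any windows, the following NumPy-only sketch mimics it on a tiny made-up ROI. The np.where call is used here as an assumed equivalent of cv2.bitwise_and(roi, roi, mask=visMask) for this purpose: pixels where the mask is non-zero are kept and everything else becomes black.

```python
import numpy as np

# Made-up 2x2 boolean mask and 2x2 three-channel ROI.
mask = np.array([[True, False],
                 [False, True]])
roi = np.arange(12, dtype="uint8").reshape(2, 2, 3) + 1

# Boolean -> 0/255 integer mask, exactly as in the script.
visMask = (mask * 255).astype("uint8")

# Keep ROI pixels where the mask is set, zero out the rest.
instance = np.where(visMask[..., None] > 0, roi, 0).astype("uint8")

print(visMask)
print(instance[0, 0], instance[0, 1])
```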

Now let's continue on with visualization:

Mask R-CNN with OpenCV---Python

        # now, extract *only* the masked region of the ROI by passing
        # in the boolean mask array as our slice condition
        roi = roi[mask]

        # randomly select a color that will be used to visualize this
        # particular instance segmentation then create a transparent
        # overlay by blending the randomly selected color with the ROI
        color = random.choice(COLORS)
        blended = ((0.4 * color) + (0.6 * roi)).astype("uint8")

        # store the blended ROI in the original image
        clone[startY:endY, startX:endX][mask] = blended

Line 113 extracts only the masked region of the ROI by passing the boolean mask array as our slice condition.

Then we randomly select one of our six COLORS to apply our transparent overlay on the object (Line 118).

Subsequently, we blend our masked region with the roi (Line 119), followed by placing it back into the clone image (Line 122).
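The blending step is a weighted average applied only to the masked pixels. The sketch below runs the same arithmetic on a tiny made-up ROI so the effect is easy to inspect; the gray value, the overlay color, and the mask are invented for the example.

```python
import numpy as np

# Made-up 2x3 ROI filled with a uniform gray, plus a boolean mask.
roi = np.full((2, 3, 3), 100, dtype="uint8")
mask = np.array([[True, False, True],
                 [False, True, False]])
color = np.array([0, 255, 0], dtype="uint8")   # hypothetical overlay color

# Boolean indexing flattens the selected pixels to shape (num_true, 3).
selected = roi[mask]

# 40% overlay color + 60% original pixel, as in the script.
blended = ((0.4 * color) + (0.6 * selected)).astype("uint8")

# Write the blended pixels back; unmasked pixels are left untouched.
roi[mask] = blended

print(selected.shape)
print(roi[0, 0], roi[0, 1])
```

Because the original pixel still contributes 60% of the result, the object remains visible underneath the colored overlay.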

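Finally, we'll draw the rectangle and the textual class label + confidence value on the image, and display the result!

Mask R-CNN with OpenCV---Python

        # draw the bounding box of the instance on the image
        color = [int(c) for c in color]
        cv2.rectangle(clone, (startX, startY), (endX, endY), color, 2)

        # draw the predicted label and associated probability of the
        # instance segmentation on the image
        text = "{}: {:.4f}".format(LABELS[classID], confidence)
        cv2.putText(clone, text, (startX, startY - 5),
            cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

        # show the output image
        cv2.imshow("Output", clone)
        cv2.waitKey(0)

To close out, we:

Draw a colored bounding box around the object (Lines 125 and 126).

Build our class label + confidence text and draw the text above the bounding box (Lines 130-132).

Display the image until any key is pressed (Lines 135 and 136).

Let's give our Mask R-CNN code a try!

Make sure you've used the "Downloads" section of the tutorial to download the source code, trained Mask R-CNN, and example images. From there, open up your terminal and execute the following command: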

Mask R-CNN with OpenCV---Shell

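$ python mask_rcnn.py --mask-rcnn mask-rcnn-coco --image images/example_01.jpg
[INFO] loading Mask R-CNN from disk...
[INFO] Mask R-CNN took 0.761193 seconds
[INFO] boxes shape: (1, 1, 3, 7)
[INFO] masks shape: (100, 90, 15, 15)

Figure 7: A Mask R-CNN applied to a scene of cars. Python and OpenCV were used to generate the masks.

In the image above, you can see that our Mask R-CNN has not only localized each of the cars in the image but has also constructed a pixel-wise mask, allowing us to segment each car from the image.

If we run the same command, this time supplying the --visualize flag, we can visualize the ROI, mask, and instance as well:

Figure 8: Using the --visualize flag, we can view the ROI, mask, and segmentation intermediate steps of our Mask R-CNN pipeline built with Python and OpenCV.

Let's try another example image: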

Mask R-CNN with OpenCV---Shell

$ python mask_rcnn.py --mask-rcnn mask-rcnn-coco --image images/example_02.jpg \
    --confidence 0.6
[INFO] loading Mask R-CNN from disk...
[INFO] Mask R-CNN took 0.676008 seconds
[INFO] boxes shape: (1, 1, 8, 7)
[INFO] masks shape: (100, 90, 15, 15)

Figure 9: Using Python and OpenCV, we can perform instance segmentation using a Mask R-CNN.

Our Mask R-CNN has correctly detected and segmented the people, the dog, the horse, and the truck in the image.

Here's one final example before we move on to using Mask R-CNNs in videos:

Mask R-CNN with OpenCV---Shell

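$ python mask_rcnn.py --mask-rcnn mask-rcnn-coco --image images/example_03.jpg
[INFO] loading Mask R-CNN from disk...
[INFO] Mask R-CNN took 0.680739 seconds
[INFO] boxes shape: (1, 1, 3, 7)
[INFO] masks shape: (100, 90, 15, 15)

Figure 10: Here you can see me feeding Jemma, the family beagle. The pixel-wise map of each identified object is masked and transparently overlaid on the objects. This image was generated with OpenCV and Python using a pre-trained Mask R-CNN model.

In this image, you can see a photo of myself and Jemma, the family beagle.

Our Mask R-CNN is capable of detecting and localizing me, Jemma, and the chair with high confidence.

OpenCV and Mask R-CNN in video streams

Now that we've looked at how to apply Mask R-CNNs to images, let's explore how they can be applied to videos as well.

Open up the mask_rcnn_video.py file and insert the following code: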

# import the necessary packages
import numpy as np
import argparse
import imutils
import time
import cv2
import os

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--input", required=True,
    help="path to input video file")
ap.add_argument("-o", "--output", required=True,
    help="path to output video file")
ap.add_argument("-m", "--mask-rcnn", required=True,
    help="base path to mask-rcnn directory")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
    help="minimum probability to filter weak detections")
ap.add_argument("-t", "--threshold", type=float, default=0.3,
    help="minimum threshold for pixel-wise mask segmentation")
args = vars(ap.parse_args())

First we import our necessary packages and parse our command-line arguments. Two new arguments replace --image from the previous script: --input, the path to the input video file, and --output, the path to the output video file.

Now let's load our class LABELS, COLORS, and Mask R-CNN neural net:

# load the COCO class labels our Mask R-CNN was trained on
labelsPath = os.path.sep.join([args["mask_rcnn"],
    "object_detection_classes_coco.txt"])
LABELS = open(labelsPath).read().strip().split("\n")

# initialize a list of colors to represent each possible class label
np.random.seed(42)
COLORS = np.random.randint(0, 255, size=(len(LABELS), 3),
    dtype="uint8")

# derive the paths to the Mask R-CNN weights and model configuration
weightsPath = os.path.sep.join([args["mask_rcnn"],
    "frozen_inference_graph.pb"])
configPath = os.path.sep.join([args["mask_rcnn"],
    "mask_rcnn_inception_v2_coco_2018_01_28.pbtxt"])

# load our Mask R-CNN trained on the COCO dataset (90 classes)
# from disk
print("[INFO] loading Mask R-CNN from disk...")
net = cv2.dnn.readNetFromTensorflow(weightsPath, configPath)

Our LABELS and COLORS are loaded on Lines 24-31.

From there we define our weightsPath and configPath before loading our Mask R-CNN neural net (Lines 34-42).

Now let's initialize our video stream and video writer:

# initialize the video stream and pointer to output video file
vs = cv2.VideoCapture(args["input"])
writer = None

# try to determine the total number of frames in the video file
try:
    prop = cv2.cv.CV_CAP_PROP_FRAME_COUNT if imutils.is_cv2() \
        else cv2.CAP_PROP_FRAME_COUNT
    total = int(vs.get(prop))
    print("[INFO] {} total frames in video".format(total))

# an error occurred while trying to determine the total
# number of frames in the video file
except:
    print("[INFO] could not determine # of frames in video")
    total = -1

Our video stream (vs) and video writer (writer) are initialized first.

We then attempt to determine the number of frames in the video file and display the total. If we're unsuccessful, we catch the exception, print a status message, and set total to -1 so that the time estimate is simply skipped later on.

Let's begin looping over frames:

# loop over frames from the video file stream
while True:
    # read the next frame from the file
    (grabbed, frame) = vs.read()

    # if the frame was not grabbed, then we have reached the end
    # of the stream
    if not grabbed:
        break

    # construct a blob from the input frame and then perform a
    # forward pass of the Mask R-CNN, giving us (1) the bounding box
    # coordinates of the objects in the image along with (2) the
    # pixel-wise segmentation for each specific object
    blob = cv2.dnn.blobFromImage(frame, swapRB=True, crop=False)
    net.setInput(blob)
    start = time.time()
    (boxes, masks) = net.forward(["detection_out_final",
        "detection_masks"])
    end = time.time()

We begin looping over frames by defining an infinite while loop and grabbing the next frame (Lines 62-64). The loop will process the video until an exit condition is met, namely that no frame could be grabbed (Lines 68 and 69).

We then construct a blob from the frame and pass it through the neural net while grabbing the elapsed time so we can calculate the estimated time to completion (Lines 75-80). The result includes both boxes and masks.

Now let's begin looping over detected objects:

    # loop over the number of detected objects
    for i in range(0, boxes.shape[2]):
        # extract the class ID of the detection along with the
        # confidence (i.e., probability) associated with the
        # prediction
        classID = int(boxes[0, 0, i, 1])
        confidence = boxes[0, 0, i, 2]

        # filter out weak predictions by ensuring the detected
        # probability is greater than the minimum probability
        if confidence > args["confidence"]:
            # scale the bounding box coordinates back relative to the
            # size of the frame and then compute the width and the
            # height of the bounding box
            (H, W) = frame.shape[:2]
            box = boxes[0, 0, i, 3:7] * np.array([W, H, W, H])
            (startX, startY, endX, endY) = box.astype("int")
            boxW = endX - startX
            boxH = endY - startY

            # extract the pixel-wise segmentation for the object,
            # resize the mask such that it's the same dimensions of
            # the bounding box, and then finally threshold to create
            # a *binary* mask
            mask = masks[i, classID]
            mask = cv2.resize(mask, (boxW, boxH),
                interpolation=cv2.INTER_NEAREST)
            mask = (mask > args["threshold"])

            # extract the ROI of the image but *only* extracted the
            # masked region of the ROI
            roi = frame[startY:endY, startX:endX][mask]

First we filter out weak detections with a low confidence value. Then we determine the bounding box coordinates and obtain the mask and roi.

Now let's draw the object's transparent overlay, bounding box, and label + confidence:

            # grab the color used to visualize this particular class,
            # then create a transparent overlay by blending the color
            # with the ROI
            color = COLORS[classID]
            blended = ((0.4 * color) + (0.6 * roi)).astype("uint8")

            # store the blended ROI in the original frame
            frame[startY:endY, startX:endX][mask] = blended

            # draw the bounding box of the instance on the frame
            color = [int(c) for c in color]
            cv2.rectangle(frame, (startX, startY), (endX, endY),
                color, 2)

            # draw the predicted label and associated probability of
            # the instance segmentation on the frame
            text = "{}: {:.4f}".format(LABELS[classID], confidence)
            cv2.putText(frame, text, (startX, startY - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

Here we blend our roi with the color and store it in the original frame, effectively creating a colored transparent overlay.

We then draw a rectangle around the object and display the class label + confidence just above it.

Finally, let's write to the video file and clean up:

    # check if the video writer is None
    if writer is None:
        # initialize our video writer
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter(args["output"], fourcc, 30,
            (frame.shape[1], frame.shape[0]), True)

        # some information on processing single frame
        if total > 0:
            elap = (end - start)
            print("[INFO] single frame took {:.4f} seconds".format(elap))
            print("[INFO] estimated total time to finish: {:.4f}".format(
                elap * total))

    # write the output frame to disk
    writer.write(frame)

# release the file pointers
print("[INFO] cleaning up...")
writer.release()
vs.release()

On the first iteration of the loop, our video writer is initialized.

An estimate of the amount of time the processing will take is printed to the terminal.

The final operation of our loop is to write the frame to disk via our writer object.
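The time estimate is nothing more than the duration of the first forward pass multiplied by the frame count. As a rough sketch, the two helper functions below (hypothetical names, not part of the script) reproduce that arithmetic and turn the result into hours, minutes, and seconds, using the rounded per-frame time and frame count from the cats-and-dogs run shown later in this post.

```python
def estimate_total_seconds(seconds_per_frame, total_frames):
    """Rough processing-time estimate: per-frame time times frame count."""
    return seconds_per_frame * total_frames

def format_hms(seconds):
    """Format a duration in seconds as H:MM:SS."""
    seconds = int(round(seconds))
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return "{}:{:02d}:{:02d}".format(hours, minutes, secs)

# 0.8585 s per frame and 19312 frames, taken from the example log output.
est = estimate_total_seconds(0.8585, 19312)
print("{:.4f} seconds, roughly {}".format(est, format_hms(est)))
```

The figure differs slightly from the one in the log, presumably because the script multiplies the unrounded per-frame time. Either way, the estimate is only as good as the assumption that every frame takes about as long as the first one.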

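You'll notice that I'm not displaying each frame on the screen. The display operation is time-consuming, and you'll be able to view the output video with any media player once the script has finished processing anyway.

Note: At the time of writing, OpenCV's dnn module does not support NVIDIA GPUs. Only a limited number of GPUs are supported, mainly Intel GPUs. NVIDIA GPU support is coming soon, but for the time being we cannot easily use a GPU with OpenCV's dnn module.

Lastly, we release the video input and output file pointers.

Now that we've coded up our Mask R-CNN + OpenCV script for video streams, let's give it a try!

Make sure you use the "Downloads" section of this tutorial to download the source code and the Mask R-CNN model.

You'll then need to collect your own videos with your smartphone or another recording device. Alternatively, you can download videos from YouTube as I have done.

From there, open up a terminal and execute the following command: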

$ python mask_rcnn_video.py --input videos/cats_and_dogs.mp4 \
    --output output/cats_and_dogs_output.avi --mask-rcnn mask-rcnn-coco
[INFO] loading Mask R-CNN from disk...
[INFO] 19312 total frames in video
[INFO] single frame took 0.8585 seconds
[INFO] estimated total time to finish: 16579.2047

Figure 11: In the above video, you can find funny video clips of dogs and cats with a Mask R-CNN applied to them! (Watch the video: https://youtu.be/T_GXkW0BUyA)

Here is a second example, this one applying OpenCV and a Mask R-CNN to video clips of cars "slipping and sliding" in wintry conditions:

$ python mask_rcnn_video.py --input videos/slip_and_slide.mp4 \
    --output output/slip_and_slide_output.avi --mask-rcnn mask-rcnn-coco
[INFO] loading Mask R-CNN from disk...
[INFO] 17421 total frames in video
[INFO] single frame took 0.9341 seconds
[INFO] estimated total time to finish: 16272.9920

Figure 12: A Mask R-CNN applied to a video of vehicles using Python and OpenCV.

You can imagine a Mask R-CNN being applied to highly trafficked roads, checking for congestion, car accidents, or travelers in need of immediate help and attention. (Watch the video: https://www.youtube.com/watch?v=8nbzVARfosE)

Credits for the videos and audio:

Cats and Dogs:

"Try Not To Laugh Challenge – Funny Cat & Dog Vines compilation 2017" on YouTube (https://www.youtube.com/watch?v=EtH9Yllzjcc)
"Happy rock" on BenSound (https://www.bensound.com/royalty-free-music/track/happy-rock)

Slip and Slide:

"Compilation of Ridiculous Car Crash and Slip & Slide Winter Weather – Part 1" on YouTube (https://www.youtube.com/watch?v=i59v0p-gAtk)
"Epic" on BenSound (https://www.bensound.com/royalty-free-music/track/epic)

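How do I train my own Mask R-CNN models?

Figure 13: In my book, Deep Learning for Computer Vision with Python

The pre-trained weights for the Mask R-CNN model were obtained by training on the COCO dataset.

But what if you wanted to train a Mask R-CNN on your own custom dataset?

I cover exactly that in my book, Deep Learning for Computer Vision with Python. Inside the book, I:

Teach you how to train a Mask R-CNN to automatically detect and segment cancerous skin lesions, a first step in building an automatic cancer risk factor classifier.
Provide you with my favorite image annotation tools, enabling you to create masks for your input images.
Show you how to train a Mask R-CNN on your custom dataset.
Provide you with my best practices, tips, and suggestions when training your own Mask R-CNN.

All of the Mask R-CNN chapters include a detailed explanation of both the algorithms and code, ensuring you will be able to successfully train your own Mask R-CNNs. To learn more about my book (and grab your free set of sample chapters and table of contents), just check out: https://www.pyimagesearch.com/deep-learning-computer-vision-python-book/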

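Summary

In today's tutorial, you learned how to apply the Mask R-CNN architecture with OpenCV and Python to segment objects from images and video streams.

Object detectors such as YOLO, SSD, and Faster R-CNN are only capable of producing the bounding box coordinates of an object in an image; they tell us nothing about the actual shape of the object itself.

Using Mask R-CNN we can generate pixel-wise masks for each object in an image, thereby allowing us to segment the foreground object from the background.

Furthermore, Mask R-CNNs enable us to segment complex objects and shapes from images which traditional computer vision algorithms would not enable us to do.

I hope today's tutorial helped you better understand OpenCV and Mask R-CNN!

via https://www.pyimagesearch.com/2018/11/19/mask-r-cnn-with-opencv/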