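Want to Know the Size of a Large Event? Replace Manual Counting with Deep Learning (Crowd Counting)

Can you count or estimate the number of people attending the event in this photo?

How about this one?

In this article we will build a crowd counting algorithm that is remarkably accurate compared with manual counting.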

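What is crowd counting?

Crowd counting is a technique for counting or estimating the number of people in an image. Go back to the previous image and take a moment to study it: can you tell roughly how many people are in it? Yes, including those in the background. The most direct method is to count them one by one, but is that practical? With a crowd this large, it is nearly impossible!

Crowd scientists (yes, that is a real job title!) count the people in certain parts of an image and then extrapolate to an estimate of the total. More generally, for decades we have had to rely on rough measures to estimate this number.

Surely there is a better, more accurate way?

Yes, there is!

Although we do not yet have algorithms that give the exact number, most computer vision techniques can produce remarkably accurate estimates. Before diving into the algorithms behind them, let us first understand why crowd counting matters.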

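Why is crowd counting useful?

Let us look at an example to understand the practical value of crowd counting. Imagine your company has just finished hosting a large data science conference. Many different sessions took place during the event.

You need to analyze and estimate the attendance at each session. This will help your team understand which kinds of sessions drew the largest crowds (and which drew the smallest). The results will shape next year's conference schedule, so this is an important task!

There were hundreds of people at the conference, and counting them by hand would take days. This is where your data science skills come in. We can collect photos of the audience at each session and build a computer vision model to do the rest.

There are many other scenarios where crowd counting algorithms can change how an industry operates:

· Counting the people attending a sporting event

· Estimating how many people attended an inauguration or a march

· Monitoring high-traffic areas

· Helping with staffing and resource allocation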

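Understanding the different computer vision techniques for crowd counting

Broadly speaking, there are currently four approaches we can use to count the people in a crowd:

1. Detection-based methods

Here we use a sliding-window detector to identify the people in an image and count them. This kind of detection requires well-trained classifiers that can extract low-level features. These methods work well for detecting faces, but they perform poorly on crowded images because most of the target objects are not clearly visible.

2. Regression-based methods

When the approach above cannot extract low-level features, regression-based methods are the better choice. We first crop the image into patches and then extract low-level features for each patch.

3. Density estimation-based methods

We first create a density map for the objects. The algorithm then learns a linear mapping between the extracted features and their object density maps. We can also use random forest regression to learn a non-linear mapping.

4. Convolutional neural network (CNN)-based methods

With this approach, instead of looking at patches of an image, we build an end-to-end regression method using CNNs. It takes the entire image as input and directly produces the crowd count. CNNs handle regression and classification tasks well, and they can also generate density maps.

CSRNet, the technique we will use in this article, builds a deeper CNN that captures high-level features and generates high-quality density maps without increasing network complexity. Before we get to the coding part, let us first understand what CSRNet is.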

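Understanding the architecture and training method of CSRNet

CSRNet uses VGG-16 (http://www.robots.ox.ac.uk/~vgg/) as its front end because of its strong transfer learning ability. The output of the VGG front end is 1/8 the size of the original input. CSRNet also uses dilated convolutional layers in its back end.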
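To make the 1/8 figure concrete, here is a small sketch of the size arithmetic. It assumes the front end keeps three 2x2 max-pooling layers with stride 2, as described in the CSRNet paper; the function name and the example resolution are illustrative only.

```python
def frontend_output_size(height, width, num_pools=3):
    """Spatial size after `num_pools` 2x2 max-pooling layers with stride 2."""
    for _ in range(num_pools):
        height //= 2
        width //= 2
    return height, width


# A 768 x 1024 input shrinks by a factor of 2 at each of the three pooling layers.
print(frontend_output_size(768, 1024))  # (96, 128)
```

This is also why the predicted density map is smaller than the input image: each output pixel summarizes an 8 x 8 region of the input.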

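But what is a dilated convolution? Look at the figure below:

The basic idea of dilated convolution is to enlarge the kernel without increasing the number of parameters. With a dilation rate of 1, we take the kernel and convolve it over the whole image as usual. If we increase the dilation rate to 2, the kernel spreads out as shown in the figure above (follow the label under each image). A dilated kernel can therefore serve as an alternative to pooling layers.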
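The effect of the dilation rate can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation CSRNet uses (which relies on PyTorch layers); the function name and the "valid" border handling are choices made for this example.

```python
import numpy as np


def dilated_conv2d(x, w, rate=1):
    """Valid-mode 2-D dilated correlation of image x with kernel w."""
    k_h, k_w = w.shape
    # Footprint covered by the kernel once its taps are spread `rate` apart.
    span_h = (k_h - 1) * rate + 1
    span_w = (k_w - 1) * rate + 1
    out_h = x.shape[0] - span_h + 1
    out_w = x.shape[1] - span_w + 1
    y = np.zeros((out_h, out_w), dtype=np.float64)
    for i in range(k_h):
        for j in range(k_w):
            # Each weight multiplies a copy of the image shifted by (i*rate, j*rate).
            y += w[i, j] * x[i * rate:i * rate + out_h, j * rate:j * rate + out_w]
    return y


x = np.arange(49, dtype=np.float64).reshape(7, 7)
w = np.ones((3, 3))  # nine weights, whatever the dilation rate

print(dilated_conv2d(x, w, rate=1).shape)  # (5, 5)
print(dilated_conv2d(x, w, rate=2).shape)  # (3, 3)
```

Both calls use the same nine weights. With rate 2 the kernel simply covers a 5 x 5 area of the input instead of 3 x 3.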

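The underlying math (optional)

Let us take a moment to explain the math behind this. Note that it is not required for running the algorithm in Python, but it will come in handy when you need to tune or modify the model.

Suppose the input is x(m, n), the filter is w(i, j), and the dilation rate is r. The output y(m, n) is then: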
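For reference, the 2-D dilated convolution in the CSRNet paper is written as follows. The filter dimensions M and N are notation from that paper and are not defined in the text above.

```latex
y(m, n) = \sum_{i=1}^{M} \sum_{j=1}^{N} x(m + r \times i,\; n + r \times j)\, w(i, j)
```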

We can generalize this equation to a (k*k) kernel with dilation rate r. The kernel is enlarged to:

([k + (k-1)*(r-1)] * [k + (k-1)*(r-1)])
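As a quick worked example of this formula, the sketch below evaluates the enlarged kernel size for a 3 x 3 kernel at a few dilation rates. The function name is made up for illustration.

```python
def effective_kernel_size(k, r):
    """Side length of a k x k kernel after dilation with rate r."""
    return k + (k - 1) * (r - 1)


for rate in (1, 2, 3):
    size = effective_kernel_size(3, rate)
    print(f"rate {rate}: {size} x {size}")
# rate 1: 3 x 3
# rate 2: 5 x 5
# rate 3: 7 x 7
```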

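Ground truth is then generated for every image: the head of each person in a given image is blurred with a Gaussian kernel. All images are cropped into 9 patches, each 1/4 the size of the original image.

The first 4 patches are cut evenly, as the four quarters of the image, and the remaining 5 are cropped at random. Finally, the mirror image of each patch is used to double the training set.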
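The cropping scheme can be sketched as follows. This is an illustration under stated assumptions, not the repository's own augmentation code: "1/4 the size" is read as half the height and half the width, and the function name and random generator are introduced for this example.

```python
import numpy as np


def make_patches(image, rng):
    """Return 18 patches: 4 quarters, 5 random crops, and the mirror of each."""
    h, w = image.shape[:2]
    ph, pw = h // 2, w // 2  # each patch covers 1/4 of the image area

    # Four non-overlapping quarters.
    corners = [(0, 0), (0, pw), (ph, 0), (ph, pw)]
    # Five randomly placed patches of the same size.
    for _ in range(5):
        top = int(rng.integers(0, h - ph + 1))
        left = int(rng.integers(0, w - pw + 1))
        corners.append((top, left))

    patches = [image[t:t + ph, l:l + pw] for t, l in corners]
    # Mirror every patch horizontally to double the set.
    patches += [p[:, ::-1] for p in patches]
    return patches


rng = np.random.default_rng(0)
patches = make_patches(np.zeros((480, 640, 3)), rng)
print(len(patches), patches[0].shape)  # 18 (240, 320, 3)
```

In practice the ground-truth density map has to be cropped and mirrored with exactly the same coordinates as its image, so that the two stay aligned.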

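In short, that is the architecture of CSRNet. Next we look at its training details, including the evaluation metrics it uses.

CSRNet is trained end to end with stochastic gradient descent. During training the learning rate is fixed at 1e-6. The Euclidean distance is chosen as the loss function to measure the difference between the ground truth and the estimated density map. It is expressed as: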
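As given in the CSRNet paper, the loss takes the form below. The symbols Θ (the network parameters), X_i (the i-th input image), Z(X_i; Θ) (the predicted density map), and Z_i^GT (the ground-truth density map) are that paper's notation, not names defined in the text above.

```latex
L(\Theta) = \frac{1}{2N} \sum_{i=1}^{N} \left\lVert Z(X_i; \Theta) - Z_i^{GT} \right\rVert_2^2
```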

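where N is the size of the training batch. The evaluation metrics used for CSRNet are MAE and MSE, that is, mean absolute error and mean squared error. They are given by: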
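In the CSRNet paper the two metrics are defined as follows. There, N is the number of images being evaluated, C_i is the estimated count, and C_i^GT is the ground-truth count. Note that the paper's MSE includes a square root.

```latex
MAE = \frac{1}{N} \sum_{i=1}^{N} \left| C_i - C_i^{GT} \right|
\qquad
MSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left| C_i - C_i^{GT} \right|^2}
```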

Here, C_i is the estimated count:
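Following the CSRNet paper, the estimated count is the sum over all pixels of the predicted density map, where z_{l,w} denotes the pixel at position (l, w) in that paper's notation:

```latex
C_i = \sum_{l=1}^{L} \sum_{w=1}^{W} z_{l,w}
```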

L and W are the length and width of the predicted density map.
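A minimal NumPy sketch of these two metrics is shown below. It assumes the estimated count of an image is the sum of its predicted density map and, following the CSRNet paper's definition, takes the square root in MSE. The function name and the toy arrays are made up for illustration.

```python
import numpy as np


def count_errors(pred_maps, gt_counts):
    """Return (MAE, MSE) between summed density maps and ground-truth counts."""
    pred_counts = np.array([float(np.sum(m)) for m in pred_maps])
    diff = pred_counts - np.asarray(gt_counts, dtype=np.float64)
    mae = float(np.mean(np.abs(diff)))
    mse = float(np.sqrt(np.mean(diff ** 2)))  # root of the mean squared error
    return mae, mse


# Two toy "density maps" whose pixel sums are 8 and 10.
pred_maps = [np.full((4, 4), 0.5), np.full((2, 5), 1.0)]
mae, mse = count_errors(pred_maps, gt_counts=[10, 7])
print(mae)  # 2.5
print(mse)
```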

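Our model first predicts a density map for a given image. A pixel has the value 0 where no person is present, and a positive value where it corresponds to a person. Summing the pixel values of the whole density map therefore gives the number of people in the image.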
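The idea that a density map sums to the head count can be checked with a small sketch. The map size, head positions, and sigma below are arbitrary values chosen for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# One unit of "mass" per annotated head, placed well away from the borders.
heads = np.zeros((64, 64), dtype=np.float32)
for row, col in [(20, 20), (32, 40), (45, 25)]:
    heads[row, col] = 1.0

# Blurring spreads each head over its neighbourhood but preserves the total.
density = gaussian_filter(heads, sigma=2.0, mode='constant')
estimated_count = float(density.sum())
print(round(estimated_count))  # 3
```

With mode='constant', a head close to the image border loses part of its mass outside the frame, so the sum can fall slightly below the true count in that case.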

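Building your own crowd counting model

We will implement CSRNet on the ShanghaiTech dataset. The dataset contains 1198 annotated images with a total of 330,165 people.

(Dataset: https://www.dropbox.com/s/fipgjqxl7uj8hd5/ShanghaiTech.zip?dl=0.)

Clone the CSRNet-pytorch repository with the command below. It contains the complete code for creating the dataset, training the model, and validating the results: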

git clone https://github.com/leeyeehoo/CSRNet-pytorch.git

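Install CUDA and PyTorch before continuing. The code we use below requires both.

Now move the dataset into the repository cloned above and unzip it. We then need to create the ground truth. The make_dataset.ipynb file does most of the work for us. We only need to make a few small changes in the notebook: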

# importing libraries
import h5py
import scipy.io as io
import PIL.Image as Image
import numpy as np
import os
import glob
from matplotlib import pyplot as plt
from scipy.ndimage import gaussian_filter
import scipy
import scipy.spatial
import json
from matplotlib import cm as CM
from image import *
from model import CSRNet
import torch
from tqdm import tqdm
%matplotlib inline

# function to create density maps for images
def gaussian_filter_density(gt):
    print(gt.shape)
    density = np.zeros(gt.shape, dtype=np.float32)
    gt_count = np.count_nonzero(gt)
    if gt_count == 0:
        return density

    # (x, y) coordinates of every annotated head
    pts = np.array(list(zip(np.nonzero(gt)[1], np.nonzero(gt)[0])))
    leafsize = 2048
    # build kdtree
    tree = scipy.spatial.KDTree(pts.copy(), leafsize=leafsize)
    # query kdtree: each point itself plus its 3 nearest neighbours
    distances, locations = tree.query(pts, k=4)

    print('generate density...')
    for i, pt in enumerate(pts):
        pt2d = np.zeros(gt.shape, dtype=np.float32)
        pt2d[pt[1], pt[0]] = 1.
        if gt_count > 1:
            # sigma scales with the distances to the 3 nearest neighbours
            sigma = (distances[i][1] + distances[i][2] + distances[i][3]) * 0.1
        else:
            sigma = np.average(np.array(gt.shape)) / 2. / 2.  # case: 1 point
        # accumulate the blurred head into the density map
        density += gaussian_filter(pt2d, sigma, mode='constant')
    print('done.')
    return density
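To see how the k-d tree query turns neighbour distances into a per-head sigma, here is a small standalone sketch. The four points are made up; the 0.1 factor and the use of the three nearest neighbours mirror the function above.

```python
import numpy as np
from scipy.spatial import KDTree

# Four head positions forming a 3 x 4 rectangle (illustrative values only).
pts = np.array([[0, 0], [3, 0], [0, 4], [3, 4]], dtype=np.float64)

tree = KDTree(pts)
# k=4 returns each point itself (distance 0) plus its 3 nearest neighbours.
distances, _ = tree.query(pts, k=4)

# Sum of the 3 neighbour distances (3 + 4 + 5 here), scaled by 0.1.
sigmas = distances[:, 1:4].sum(axis=1) * 0.1
print(sigmas)  # each value is approximately 1.2
```

Heads in dense regions have close neighbours and get a small sigma, while isolated heads get a wider blur.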

Now let us generate the ground truth for the images in part A and part B:

# dataset root; change this to wherever you unpacked ShanghaiTech
root = '/home/pulkit/CSRNet/ShanghaiTech/CSRNet-pytorch/'
part_A_train = os.path.join(root, 'part_A/train_data', 'images')
part_A_test = os.path.join(root, 'part_A/test_data', 'images')
part_B_train = os.path.join(root, 'part_B/train_data', 'images')
part_B_test = os.path.join(root, 'part_B/test_data', 'images')
path_sets = [part_A_train, part_A_test]

# collect every image path in part_A
img_paths = []
for path in path_sets:
    for img_path in glob.glob(os.path.join(path, '*.jpg')):
        img_paths.append(img_path)

# creating density map for part_A images
for img_path in img_paths:
    print(img_path)
    mat = io.loadmat(img_path.replace('.jpg','.mat').replace('images','ground-truth').replace('IMG_','GT_IMG_'))
    img = plt.imread(img_path)
    k = np.zeros((img.shape[0], img.shape[1]))
    gt = mat["image_info"][0,0][0,0][0]
    # mark each annotated head that falls inside the image
    for i in range(0, len(gt)):
        if int(gt[i][1]) < img.shape[0] and int(gt[i][0]) < img.shape[1]:
            k[int(gt[i][1]), int(gt[i][0])] = 1
    k = gaussian_filter_density(k)
    with h5py.File(img_path.replace('.jpg','.h5').replace('images','ground-truth'), 'w') as hf:
        hf['density'] = k
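The chained replace calls in the loop above map each image path to its annotation file and to the density file that gets written. The sketch below shows that mapping on one relative path; the helper names are introduced here for illustration only.

```python
def annotation_path(img_path):
    """Path of the .mat annotation that belongs to an image."""
    return (img_path.replace('.jpg', '.mat')
                    .replace('images', 'ground-truth')
                    .replace('IMG_', 'GT_IMG_'))


def density_path(img_path):
    """Path of the .h5 density map written for an image."""
    return img_path.replace('.jpg', '.h5').replace('images', 'ground-truth')


example = 'part_A/train_data/images/IMG_1.jpg'
print(annotation_path(example))  # part_A/train_data/ground-truth/GT_IMG_1.mat
print(density_path(example))     # part_A/train_data/ground-truth/IMG_1.h5
```

Because str.replace substitutes every occurrence, a dataset root that itself contains the text images or IMG_ would be rewritten as well, so it is safer to keep those strings out of the root directory name.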

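So far we have generated the ground truth for the images in part_A. We will do the same for the part_B images. Before that, let us look at a sample image and plot its ground-truth heatmap: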

plt.imshow(Image.open(img_paths[0]))

This is getting more interesting!

gt_file = h5py.File(img_paths[0].replace('.jpg','.h5').replace('images','ground-truth'),'r')

groundtruth = np.asarray(gt_file['density'])

plt.imshow(groundtruth,cmap=CM.jet)

Let us count how many people are in this image.

np.sum(groundtruth)

270.32568

Similarly, generate the ground truth for part_B:

path_sets = [part_B_train, part_B_test]

img_paths = []
for path in path_sets:
    for img_path in glob.glob(os.path.join(path, '*.jpg')):
        img_paths.append(img_path)

# creating density map for part_b images
for img_path in img_paths:
    print(img_path)
    mat = io.loadmat(img_path.replace('.jpg','.mat').replace('images','ground-truth').replace('IMG_','GT_IMG_'))
    img = plt.imread(img_path)
    k = np.zeros((img.shape[0], img.shape[1]))
    gt = mat["image_info"][0,0][0,0][0]
    for i in range(0, len(gt)):
        if int(gt[i][1]) < img.shape[0] and int(gt[i][0]) < img.shape[1]:
            k[int(gt[i][1]), int(gt[i][0])] = 1
    k = gaussian_filter_density(k)
    with h5py.File(img_path.replace('.jpg','.h5').replace('images','ground-truth'), 'w') as hf:
        hf['density'] = k

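We now have the images together with their corresponding ground truth. Time to train the model!

We will use the .json files provided in the cloned directory. We only need to change the image locations in those files. To do so, open each .json file and replace the current location with the location of your images.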
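If you would rather not edit the files by hand, the replacement can be scripted. The sketch below assumes each .json file holds a flat list of image path strings. The function name and both root directories are hypothetical, and the demonstration runs on a throwaway file rather than on the repository's own files.

```python
import json
import os
import tempfile


def relocate_paths(json_path, old_root, new_root):
    """Rewrite the leading directory of every image path listed in a JSON file."""
    with open(json_path) as f:
        paths = json.load(f)
    updated = [new_root + p[len(old_root):] if p.startswith(old_root) else p
               for p in paths]
    with open(json_path, 'w') as f:
        json.dump(updated, f)
    return updated


# Demonstration on a temporary file; '/old/root/' and '/new/root/' are placeholders.
demo_dir = tempfile.mkdtemp()
demo_json = os.path.join(demo_dir, 'part_A_train.json')
with open(demo_json, 'w') as f:
    json.dump(['/old/root/part_A/train_data/images/IMG_1.jpg'], f)

updated = relocate_paths(demo_json, '/old/root/', '/new/root/')
print(updated)  # ['/new/root/part_A/train_data/images/IMG_1.jpg']
```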

Note that all of this code is written in Python 2. If you are using any other Python version, make the following changes:

1. In model.py, change xrange on line 18 to range

2. In model.py, change line 19 to:

list(self.frontend.state_dict().items())[i][1].data[:] = list(mod.state_dict().items())[i][1].data[:]

3. In image.py, replace ground_truth with ground-truth

Done? Now open a new terminal window and enter the following commands:

cd CSRNet-pytorch

python train.py part_A_train.json part_A_val.json 0 0

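You can reduce the number of epochs in train.py to speed things up.

Finally, let us check how the model performs on unseen data. We will use the val.ipynb file to validate the results. Remember to change the paths to the pretrained weights and the images.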

# importing libraries
import h5py
import scipy.io as io
import PIL.Image as Image
import numpy as np
import os
import glob
from matplotlib import pyplot as plt
from scipy.ndimage import gaussian_filter
import scipy
import json
import torchvision.transforms.functional as F
from matplotlib import cm as CM
from image import *
from model import CSRNet
import torch
from tqdm import tqdm  # used by the evaluation loop
%matplotlib inline

from torchvision import datasets, transforms

# convert images to tensors and normalize with the ImageNet mean and std
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# defining the location of dataset
root = '/home/pulkit/CSRNet/ShanghaiTech/CSRNet-pytorch/'
part_A_train = os.path.join(root, 'part_A/train_data', 'images')
part_A_test = os.path.join(root, 'part_A/test_data', 'images')
part_B_train = os.path.join(root, 'part_B/train_data', 'images')
part_B_test = os.path.join(root, 'part_B/test_data', 'images')
path_sets = [part_A_test]

# defining the image path
img_paths = []
for path in path_sets:
    for img_path in glob.glob(os.path.join(path, '*.jpg')):
        img_paths.append(img_path)

# defining the model
model = CSRNet()
model = model.cuda()

# loading the trained weights
checkpoint = torch.load('part_A/0model_best.pth.tar')
model.load_state_dict(checkpoint['state_dict'])

Let us evaluate the model by checking the MAE (mean absolute error) on the test images:

mae = 0
for i in tqdm(range(len(img_paths))):
    img = transform(Image.open(img_paths[i]).convert('RGB')).cuda()
    gt_file = h5py.File(img_paths[i].replace('.jpg','.h5').replace('images','ground-truth'),'r')
    groundtruth = np.asarray(gt_file['density'])
    output = model(img.unsqueeze(0))
    # accumulate the absolute count error of each image
    mae += abs(output.detach().cpu().sum().numpy() - np.sum(groundtruth))
print(mae / len(img_paths))

We get an MAE of 75.69, which is quite good. Now let us check the prediction on a single image:

from matplotlib import cm as c

img = transform(Image.open('part_A/test_data/images/IMG_100.jpg').convert('RGB')).cuda()

output = model(img.unsqueeze(0))

print("Predicted Count : ",int(output.detach().cpu().sum().numpy()))

temp = np.asarray(output.detach().cpu().reshape(output.detach().cpu().shape[2],output.detach().cpu().shape[3]))

plt.imshow(temp,cmap = c.jet)

plt.show()

temp = h5py.File('part_A/test_data/ground-truth/IMG_100.h5', 'r')

temp_1 = np.asarray(temp['density'])

plt.imshow(temp_1,cmap = c.jet)

print("Original Count : ",int(np.sum(temp_1)) + 1)

plt.show()

print("Original Image")

plt.imshow(plt.imread('part_A/test_data/images/IMG_100.jpg'))

plt.show()

Wow, the image actually contains 382 people and our model predicted 384. That is strikingly close!
