PRE2016 4 Groep3
Group members
Student ID | Name |
0900940 | Ryan van Mastrigt |
0891024 | René Verhoef |
0854765 | Lisselotte van Wissen |
0944862 | Sjanne Zeijlemaker |
0980963 | Michalina Tataj |
Introduction
Problem description
To define more precisely the problem our model should solve, we look at the context in which the model would be used. The first context in which we will analyze and develop our model is airport security.
- Airport
In an airport, several aspects make up our problem. In this context, our model should be able to detect various illegal or dangerous activities, such as (preparation of) violence, acts of terrorism, smuggling and theft, before they have a chance to occur. The model will determine whether one or more persons are acting suspiciously by measuring biometrics of their gait and other motion patterns, as these can be used to infer a person's mental state, such as anxiety.[1] The model should then report to security personnel, clearly informing them of the situation and the suspected activity, and initiate further investigation.
A major problem in an airport is crowding. In a crowd it is seldom possible to capture the motion of a person's lower body, so we are restricted to measuring the biometrics of the upper body, which in some cases is not possible either. Another option would be to queue people and check them one by one, as is done at customs. However, this might influence a person's mental state: a malevolent person might be better prepared because they are aware of being evaluated, while an innocent person might become more anxious due to guilty ideas of reference.[2] This could result in more false positives and false negatives. The first option, observing people within the crowd, is less likely to disturb the social environment of an airport and provides a higher sense of security to both visitors and personnel.
The best solution to the crowd problem would be to set up cameras with a top-down view, or at an angle not deviating far from it. However, a top-down camera covers a smaller area than a camera at a current standard angle, so more cameras are needed, which raises the cost. Another problem of state-of-the-art airport security is that it relies heavily on human security guards to detect suspicious behaviour, which introduces cultural and racial bias. A computer model measuring the biometrics of a person is unlikely to be subject to this bias and is more objective in its detection of suspicious behaviour.
Definitions
- Abnormal behaviour
- [explanation]
- Biometrics
- [explanation]
Objectives (/ TO DO list)
Goal: Develop a model for a video-based abnormal behaviour detection program
Objectives of this project:
- Formulate concrete problem statement
- Develop overview of the State-of-the-Art
- List of possible biometrics for detecting abnormal/suspicious behaviour
- List of different methods available for measuring biometrics (pros/cons, what method works best for what purpose/setting)
- Current areas of research
- Problems with current technologies
- Develop model scenarios for determining abnormal behavior
- Determine what constitutes abnormal behavior (heavily dependent on context)
- Determine what scenarios should be looked at (airports, sports stadia, banks)
- What techniques could be used (pros/cons, possible new ideas)
- Develop USE aspects
- Users:
- Develop easy-to-understand graphical interface for primary users
- Maintain sense of participation in primary users
- Conduct a survey among the general public to research support for such an application and to probe its stance on privacy versus security
- Incorporate findings into design
- Society:
- Look into societal advantages (decreased criminal/terrorist activity, global sense of security, decrease in racial/religious tensions)
- Look into societal disadvantages (decrease in (perceived) privacy)
- Incorporate findings into design
- Enterprise:
- Make sure model is economically feasible and can compete with current systems
- Look into advantages/disadvantages for enterprises
- Incorporate findings into design
- Finalize actual model design
- Create final presentation
The main goal of the model is to provide a general structure for a program capable of identifying suspicious persons in security applications. The method should be based on biometrics that indicate abnormal behavior, in order to achieve a higher success rate than comparable human-based surveillance.
The objectives of the model are:
- Technical objectives:
- Decrease false-negative rate compared to human-based surveillance
- Decrease false-positive rate compared to human-based surveillance
- Provide results to primary user(s) (security guards/police)
- USE objectives:
- Users:
- Provide easy-to-understand information to primary user
- Provide a higher sense of security (secondary user)
- Society:
- Decrease terrorist activity
- Higher global sense of security
- Higher crime prevention
- Decrease racial/religious tensions
- Enterprise:
- Create a system which is better than current systems, in order to sell to users
- Be economically feasible
- Decrease damage caused to assets (such as buildings) and maintain company reputation
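The technical objectives above ask for lower false-negative and false-positive rates than human-based surveillance. A minimal sketch of how that comparison could be computed from confusion-matrix counts is shown below; the names `Counts` and `improves_on` are hypothetical, and the counts in the example are invented purely to illustrate the arithmetic, not measured data.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts for one detector (human guards or the model)."""
    tp: int  # suspicious persons correctly flagged
    fp: int  # innocent persons wrongly flagged
    tn: int  # innocent persons correctly not flagged
    fn: int  # suspicious persons missed

    def false_positive_rate(self) -> float:
        # Share of innocent persons that were flagged (assumes fp + tn > 0).
        return self.fp / (self.fp + self.tn)

    def false_negative_rate(self) -> float:
        # Share of suspicious persons that were missed (assumes fn + tp > 0).
        return self.fn / (self.fn + self.tp)


def improves_on(model: Counts, baseline: Counts) -> bool:
    """True only if the model lowers both error rates relative to the baseline."""
    return (model.false_positive_rate() < baseline.false_positive_rate()
            and model.false_negative_rate() < baseline.false_negative_rate())


# Invented counts, purely to show the calculation.
human = Counts(tp=6, fp=40, tn=9950, fn=4)
model = Counts(tp=8, fp=25, tn=9965, fn=2)
print(improves_on(model, human))  # True
```

Requiring both rates to drop mirrors the two separate objectives; a real evaluation would also need to decide how to trade one rate against the other.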
State-of-the-art
Possible biometrics for detecting abnormal behavior in crowds
To detect abnormal behavior, the agents in a scene must first be identified by certain characteristics. These characteristics are either physiological or behavioral and are generally referred to as biometrics. To assess a biometric, the following criteria can be used:[3]
- Universality (every person in the scene should possess the trait)
- Uniqueness (the trait has to be sufficiently unique that agents can be distinguished from one another)
- Permanence (the trait should not vary too much over time)
- Measurability (the trait should be relatively easy to measure)
- Performance (the speed, accuracy and robustness of the technology used)
- Acceptability (the subjects should accept the technology used)
- Circumvention (it should not be easy to imitate the trait or otherwise fool the system)
Most research on identifying behavior via computer vision techniques focuses on non-crowded situations, in which the subject is either isolated or only a small number of people are present. However, most conventional computer vision methods are not appropriate for use in crowded areas. This is partly because people behave differently in a crowd: some individual characteristics can no longer be used, while new collective characteristics of the crowd as a whole emerge. Another major factor is the difficulty of identifying and tracking individuals in a crowd, mostly due to occlusion of (parts of) the subject(s) by objects or other agents. The quality of the video image and the increased processing power needed to track individuals are also important factors.[4]
Most current research focuses on tracking individual people in crowds, which has proven difficult. Many methods have been proposed for individual tracking, and while these tend to work satisfactorily in low to moderately crowded situations, they tend to break down in higher-density crowds. There are also models that use general crowd characteristics to detect anomalies, but these tend to miss individual abnormalities and are better suited to detecting locations in the scene that contain anomalies, for example where a fire has broken out.[4]
Some promising models combine both approaches. One model uses a set of low-level motion features to form trajectories of the people in the crowd, together with an additional rule set computed from longest common sub-sequences [5]. The result is a system capable of highlighting individual movements that are not coherent with the dominant flow. Another paper presents an unsupervised learning framework to model activities and interactions in crowded and complicated scenes [6]. It uses three elements: low-level visual features; "atomic" activities, the most fundamental actions, which cannot be further divided into sub-activities; and interactions. This model was capable of completing challenging visual surveillance tasks such as detecting abnormalities.
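To make the longest-common-sub-sequence idea concrete, the sketch below implements the generic LCSS similarity for trajectories (two points match if they are within a distance `eps` and at most `delta` frames apart) and flags a track that matches none of the dominant-flow trajectories well. This is a simplified illustration, not the exact method of [5]; the function names, the thresholds and the toy trajectories are assumptions chosen for the example.

```python
import math

Point = tuple[float, float]


def lcss_length(a: list[Point], b: list[Point], eps: float, delta: int) -> int:
    """Longest common sub-sequence of two trajectories.

    Two points match if they lie within distance eps of each other and their
    frame indices differ by at most delta.
    """
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]  # dp[i][j]: LCSS of a[:i], b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(i - j) <= delta and math.dist(a[i - 1], b[j - 1]) <= eps:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def lcss_similarity(a: list[Point], b: list[Point], eps: float, delta: int) -> float:
    # Normalize by the shorter trajectory so the score lies in [0, 1].
    return lcss_length(a, b, eps, delta) / min(len(a), len(b))


def is_incoherent(track: list[Point], dominant_flows: list[list[Point]],
                  eps: float = 1.0, delta: int = 2, threshold: float = 0.6) -> bool:
    """Flag a track that matches none of the dominant-flow trajectories well."""
    best = max(lcss_similarity(track, ref, eps, delta) for ref in dominant_flows)
    return best < threshold


flow = [(float(x), 0.0) for x in range(10)]    # dominant left-to-right flow
along = [(float(x), 0.3) for x in range(10)]   # walks with the flow
across = [(0.0, float(y)) for y in range(10)]  # cuts straight across it
print(is_incoherent(along, [flow]), is_incoherent(across, [flow]))  # False True
```

LCSS is tolerant of noise and small gaps in a trajectory, which is one reason it suits imperfect crowd tracking; the dominant-flow trajectories would in practice come from clustering many tracks rather than being written by hand.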
Common problems in crowded scenes, such as occlusion of the subjects, can be reduced by moving to a multi-camera surveillance system. Having different angles of the same scene available allows the system to better identify and track subjects. Dynamic cameras (cameras able to pan and zoom) should be able to increase the efficiency of identifying suspicious persons, for example by zooming in on the relevant area. However, the use of multiple cameras brings new problems. It is difficult to calibrate camera views with significant overlap and to compute their topology, and calibrating camera views that are disjoint, and in which objects move on multiple ground planes, has proven especially challenging. Most research on video surveillance assumes a single-camera view, even though multi-camera surveillance systems are better at resolving occlusions and scene clutter, and most research on multi-camera systems is based on small camera networks.[7]
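The sketch below illustrates what calibration buys in the simplest case: if each camera's image is related to a single shared ground plane by a known 3x3 homography, detections from different cameras can be mapped into common coordinates and associated by distance. It assumes the homographies are already available (estimating them, handling disjoint views and multiple ground planes are exactly the hard parts described above), and the names `to_ground` and `associate` as well as the example matrices are hypothetical.

```python
import numpy as np


def to_ground(H: np.ndarray, pixels: np.ndarray) -> np.ndarray:
    """Map an (N, 2) array of pixel coordinates to ground-plane coordinates."""
    homog = np.hstack([pixels, np.ones((len(pixels), 1))]) @ H.T  # rows: H @ [u, v, 1]
    return homog[:, :2] / homog[:, 2:3]  # divide out the projective scale


def associate(ground_a: np.ndarray, ground_b: np.ndarray,
              max_dist: float) -> list[tuple[int, int]]:
    """Greedily pair detections of two cameras that land close together on the ground."""
    pairs: list[tuple[int, int]] = []
    used: set[int] = set()
    for i, p in enumerate(ground_a):
        dists = np.linalg.norm(ground_b - p, axis=1)
        for j in (int(k) for k in np.argsort(dists)):
            if dists[j] > max_dist:
                break  # every remaining candidate is even further away
            if j not in used:
                pairs.append((i, j))
                used.add(j)
                break
    return pairs


# Hypothetical calibration: camera A's pixels are ground coordinates already,
# camera B's pixels map to the ground at half scale with a one-unit offset.
H_a = np.eye(3)
H_b = np.array([[0.5, 0.0, 1.0], [0.0, 0.5, 0.0], [0.0, 0.0, 1.0]])
pix_a = np.array([[3.0, 2.0], [10.0, 10.0]])
pix_b = np.array([[18.0, 20.0], [4.0, 4.0]])
print(associate(to_ground(H_a, pix_a), to_ground(H_b, pix_b), max_dist=0.5))
# [(0, 1), (1, 0)]
```

A person occluded in one view can still be located through the other view once both are expressed in the same ground coordinates, which is the benefit the paragraph attributes to multi-camera systems.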
Detecting human activity
To recognize human activity, a general framework is used that divides human activity recognition into three levels. The low level represents the core technology, i.e. the technical aspects of recognizing humans in a scene. The mid level represents the actual human activity recognition systems. The high level represents the application of the recognized results in an environment, for example a surveillance environment.
The low level contains three main processing stages: object segmentation, feature extraction and representation, and activity detection and classification. Object segmentation is performed on each frame of the video sequence to detect humans in the scene. Segmentation methods can be divided into two types based on the mobility of the camera. For a static camera, the most used segmentation method is background subtraction: a background image without any foreground objects is first established, and the current image is then subtracted from the background image to obtain the foreground objects. However, this process is highly sensitive to illumination changes and background changes. Other, more complex methods are based on statistical models or on tracking. For a dynamic camera, the background is constantly changing, and the most commonly used segmentation method is then temporal differencing, i.e. taking the difference between consecutive frames. It is also possible to transform the coordinate system of the moving camera based on the pixel-level motion between two images in the video.
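Both segmentation methods can be sketched in a few lines of NumPy on grey-scale frames. The running-average background update is added as one common, simple way to absorb gradual illumination changes; it is an illustrative assumption, not one of the specific statistical models referred to above. The threshold of 25 grey levels, the blending factor and all function names are hypothetical, and the temporal difference shown here does not compensate for camera motion.

```python
import numpy as np


def background_subtraction(frame: np.ndarray, background: np.ndarray,
                           threshold: float = 25.0) -> np.ndarray:
    """Static camera: mark pixels that differ strongly from the background image."""
    diff = np.abs(frame.astype(np.float32) - background.astype(np.float32))
    return diff > threshold


def temporal_difference(prev_frame: np.ndarray, frame: np.ndarray,
                        threshold: float = 25.0) -> np.ndarray:
    """Moving camera: mark pixels that changed between consecutive frames."""
    diff = np.abs(frame.astype(np.float32) - prev_frame.astype(np.float32))
    return diff > threshold


def update_background(background: np.ndarray, frame: np.ndarray,
                      alpha: float = 0.05) -> np.ndarray:
    """Running average: slowly blend new frames in to absorb gradual lighting changes."""
    return (1.0 - alpha) * background.astype(np.float32) + alpha * frame.astype(np.float32)


# Toy 4x4 grey-scale scene: empty background, then a bright 2x2 "person".
background = np.full((4, 4), 100, dtype=np.uint8)
frame = background.copy()
frame[1:3, 1:3] = 200
print(background_subtraction(frame, background).sum())  # 4 foreground pixels
```

Converting to floating point before subtracting avoids the wrap-around that unsigned 8-bit image arithmetic would otherwise cause.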
The second stage of the low level looks at the characteristics of the segmented objects and represents them as features. These features can generally be categorized into four groups: space-time information, frequency transforms, local descriptors and body modeling, each with its own methods. The choice of classification algorithm in the third stage depends on which suitable feature representations are available.
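As a deliberately minimal stand-in for the feature extraction stage, the sketch below turns a binary foreground mask of one object into a few position and shape numbers. Tracking the centroid over frames gives a simple kind of space-time information; the richer categories (frequency transforms, local descriptors, body modeling) are not attempted here. The function name and feature set are assumptions for illustration.

```python
import numpy as np


def blob_features(mask: np.ndarray) -> dict[str, float]:
    """Position and shape features of one segmented foreground object."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        raise ValueError("mask contains no foreground pixels")
    height = rows.max() - rows.min() + 1  # bounding-box height in pixels
    width = cols.max() - cols.min() + 1   # bounding-box width in pixels
    return {
        "area": float(rows.size),
        "centroid_row": float(rows.mean()),
        "centroid_col": float(cols.mean()),
        "aspect_ratio": float(height / width),
    }


mask = np.zeros((6, 6), dtype=bool)
mask[1:5, 2:4] = True  # a 4-pixel-tall, 2-pixel-wide blob
print(blob_features(mask))
# {'area': 8.0, 'centroid_row': 2.5, 'centroid_col': 2.5, 'aspect_ratio': 2.0}
```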
At the mid level, abnormal activity recognition generally relies on a deviation approach. Explicitly defining abnormal behavior is difficult, as it depends heavily on context and the surrounding environment, and such behavior is, by definition, not frequently observed. Most models therefore use a reference model built from examples or previously seen data, similar to the background image in background subtraction, and consider new observations abnormal if they deviate from this trained model. Various methods exist for building this reference model.
The high level represents the actual application, which depends on the environment of the system. This research focuses on surveillance environments. In surveillance systems, human activity recognition is mostly focused on automatically tracking individuals and crowds in order to support security personnel. Such environments tend to have multiple cameras, which can be used together as a network-connected visual system. The cameras can then track the position and velocity path of each subject, and the tracking results can be used to detect suspicious behavior.[8]
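The deviation approach and the tracked velocity paths can be combined in a small sketch: speeds are derived from tracked positions, a reference model of "normal" is fitted to previously seen speeds, and a new value is flagged when it lies more than `k` standard deviations from the mean. A single z-score on walking speed is a hypothetical, minimal reference model chosen for clarity; the systems reviewed in [8] use richer models. The class and function names, the threshold `k = 3` and the speed values are all invented for the example.

```python
import numpy as np


def speeds(track: np.ndarray, fps: float) -> np.ndarray:
    """Speed per frame step from an (N, 2) array of tracked ground positions."""
    steps = np.diff(track, axis=0)              # displacement between frames
    return np.linalg.norm(steps, axis=1) * fps  # distance units per second


class DeviationDetector:
    """Reference model of 'normal': mean and spread of one scalar feature."""

    def fit(self, normal_values: np.ndarray) -> "DeviationDetector":
        self.mean = float(np.mean(normal_values))
        self.std = float(np.std(normal_values)) or 1e-9  # guard against zero spread
        return self

    def is_abnormal(self, value: float, k: float = 3.0) -> bool:
        # Abnormal if the value lies more than k standard deviations from the mean.
        return abs(value - self.mean) / self.std > k


# Invented walking speeds (m/s) standing in for previously seen, normal data.
detector = DeviationDetector().fit(np.array([1.2, 1.3, 1.4, 1.3, 1.2, 1.4, 1.3]))
track = np.array([[0.0, 0.0], [0.16, 0.0], [0.32, 0.0]])  # positions at 25 fps
for v in speeds(track, fps=25.0):
    print(round(float(v), 2), detector.is_abnormal(float(v)))  # 4.0 True (twice)
```

Running through a departure hall is not suspicious in itself, which illustrates why the text stresses that what counts as abnormal depends heavily on context.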
Approach
Planning
To keep track of the progress of the project and set deadlines for our goals, we have made a Gantt chart. This chart shows which tasks are carried out when and how these tasks are divided among our resources.
Milestones
We consider several milestones based on the tasks that lie before us, as can be seen in the Gantt chart in the Planning section:
- Finished the research into what defines abnormal behaviour. (planned by the end of week 2)
- Finished the research into the existing methods for biometric scanning. (planned by the end of week 3)
- Finished analysing the USE aspects that our project brings with it. (planned by the end of week 3)
- Developed a model for the detection of abnormal behaviour based on previous research and analyses. (planned by the end of week 6)
- Held the final presentation presenting our product. (planned by the end of week 8)
- Finalized the wiki for judging. (planned by the end of week 8)
Deliverables
At the end of the project we aim to produce the following deliverables:
- A software model of a biometric scanner that detects suspicious behaviour
- Full documentation of the development and research process on this wiki
- A final presentation explaining said model and process
- A peer review of all group members
USE aspects
User
Primary users
- Security Guards
- Police officers
- Military personnel
Secondary users
- Persons being filmed
Tertiary users
- The people manufacturing the product
- The management responsible for buying the product
User friendliness
Sense of participation
Public survey
To gain insight into the current methods of detecting suspicious persons and into the wishes of the primary users of the system, a survey will be conducted among airport security personnel. The questions we would like to ask are:
- What do you look for when identifying suspicious persons?
- Do the signs you look for depend on the criminal activity?
- Do you look at body language specifically?
- Do you rely on facial recognition (wanted list)?
- Do you look at abnormal movement through a crowd to identify suspicious persons?
- What actions do you take after identifying a suspicious person?
- How often does an apprehended person turn out to be innocent? (as a percentage)
- How many people are present in the departure hall during peak hours?
- Would you trust a system which detects suspicious persons automatically?
- How would you prefer the information to be presented to you by the system?
- What would you like to see in a detection program?
Society
Advantages
Terrorist/criminal activity
Security
Racial/religious tensions
Disadvantages
Privacy concerns
Enterprise
Feasibility
Advantages
Disadvantages
Model
[explanation of concept/pseudocode] [link to actual code?]
Results
References
- [1] Koller, C. I., Wetter, O. E., & Hofer, F. (2015). What is suspicious when trying to be inconspicuous? Criminal intentions inferred from nonverbal behavioral cues. Perception, 44(6), 679-708.
- [2] Stein, G., & Wilkinson, G. (Eds.). (2007). Seminars in general adult psychiatry. RCPsych Publications.
- [3] Jain, A., Bolle, R., & Pankanti, S. (Eds.). (2006). Biometrics: personal identification in networked society (Vol. 479). Springer Science & Business Media.
- [4] Junior, J. C. S. J., Musse, S. R., & Jung, C. R. (2010). Crowd analysis using computer vision techniques. IEEE Signal Processing Magazine, 27(5), 66-77.
- [5] Cheriyadat, A. M., & Radke, R. J. (2008). Detecting dominant motions in dense crowds. IEEE Journal of Selected Topics in Signal Processing, 2(4), 568-581.
- [6] Wang, X., Ma, X., & Grimson, W. E. L. (2009). Unsupervised activity perception in crowded and complicated scenes using hierarchical Bayesian models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(3), 539-555.
- [7] Wang, X. (2013). Intelligent multi-camera video surveillance: A review. Pattern Recognition Letters, 34(1), 3-19.
- [8] Ke, S. R., Thuc, H. L. U., Lee, Y. J., Hwang, J. N., Yoo, J. H., & Choi, K. H. (2013). A review on video-based human activity recognition. Computers, 2(2), 88-131.
Minutes
26-04-2017
The subject of the project has been chosen and the deliverables and objectives (as found on the wiki) have been determined.
30-04-2017
- Exploratory research has been performed to develop a better understanding of the subject and to better define our goals.
- A schedule and milestones have been determined (see the Approach section).
- A wiki page has been created, including a template for the documentation with the already available information filled in.
03-05-2017
We have agreed upon a list of questions to ask the security officer at Veldhoven. Research into behavioural cues and biometric scanners has been discussed and is still ongoing. Several sections of the wiki, including the planning and charts, were updated and given a more structured layout.