PRE2016 4 Groep3
Group members
Student ID | Name |
0900940 | Ryan van Mastrigt |
0891024 | René Verhoef |
0854765 | Liselotte van Wissen |
0944862 | Sjanne Zeijlemaker |
0980963 | Michalina Tataj |
Introduction
Problem description
To define the problem our model should solve, we first look at the context in which the model would be used. The first context in which we will analyse and develop our model is airport security.
- Airport
In an airport, our model should be able to detect various illegal or endangering activities, such as (the preparation of) violence, acts of terrorism, smuggling and theft, before they have a chance to occur. The model will determine whether one or more persons are acting suspiciously by measuring biometrics during walking and motion patterns, as these can be used to deduce a person’s mental state, such as anxiousness[1]. The model should then report to security personnel, clearly informing them of the situation and the suspected activity, so that they can initiate further investigation.
A major problem in an airport is crowds. Because of crowds it is seldom possible to capture the motions of a person’s lower body, so we are restricted to measuring the biometrics of the upper body, and in some cases even that is not possible. Another option would be to queue people and check them one by one, as is done at customs. However, this might influence the person’s mental state: a malevolent person might be better prepared because he is aware that he is being evaluated, while an innocent person might become more anxious due to guilty ideas of reference.[2] This could result in more false positives and false negatives. Observing people in the crowd is less likely to disturb the social environment of an airport and can provide a higher sense of security to both visitors and personnel.
To solve the crowd problem, the best solution would be to set up cameras for a top-down view, or at an angle not deviating far from this. However, a top-down view covers less area per camera than current standard views, which means a higher cost. Another problem of state-of-the-art airport security is that it relies heavily on human security guards to detect suspicious behaviour, which introduces cultural and racial bias. A computer model that measures a person’s biometrics is less likely to be subject to this bias and is more objective in its detection of suspicious behaviour.
Definitions
- Abnormal behaviour
- [explanation]
- Biometrics
- [explanation]
Objectives (/ TO DO list)
Goal: Develop a model for a video-based abnormal behaviour detection program
Objectives of this project:
- Formulate concrete problem statement
- Develop overview of the State-of-the-Art
- List of possible biometrics for detecting abnormal/suspicious behaviour
- List of different methods available for measuring biometrics (pros/cons, what method works best for what purpose/setting)
- Current areas of research
- Problems with current technologies
- Develop model scenarios for determining abnormal behavior
- Determine what constitutes abnormal behavior (heavily dependent on context)
- Determine what scenarios should be looked at (airports, sports stadia, banks)
- What techniques could be used (pros/cons, possible new ideas)
- Develop USE aspects
- Users:
- Develop easy-to-understand graphical interface for primary users
- Maintain sense of participation in primary users
- Conduct survey among general public to research support of such an application and to probe stance on privacy vs security
- Incorporate findings into design
- Society:
- Look into societal advantages (decreased criminal/terrorist activity, global sense of security, decrease in racial/religious tensions)
- Look into societal disadvantages (decrease in (perceived) privacy)
- Incorporate findings into design
- Enterprise:
- Make sure model is economically feasible and can compete with current systems
- Look into advantages/disadvantages for enterprises
- Incorporate findings into design
- Finalize actual model design
- Create final presentation
The main goal of the model is to provide a general structure of a program which is capable of identifying suspicious persons for security applications. The method should be based on biometrics which can be used to determine abnormal behavior in order to obtain a higher success rate than comparable human-based surveillance.
The objectives of the model are:
- Technical objectives:
- Decrease false-negative rate compared to human-based surveillance
- Decrease false-positive rate compared to human-based surveillance
- Provide results to primary user(s) (security guards/police)
- USE objectives:
- Users:
- Provide easy-to-understand information to primary user
- Provide a higher sense of security (secondary user)
- Society:
- Decrease terrorist activity
- Higher global sense of security
- Higher crime prevention
- Decrease racial/religious tensions
- Enterprise:
- Create a system which is better than current systems, in order to sell to users
- Be economically feasible
- Decrease damage caused to assets (such as buildings) and maintain company reputation
State-of-the-art
Possible biometrics for detecting abnormal behavior in crowds
In order to detect abnormal behavior, certain characteristics are required to identify agents in a scene. Such characteristics are either physiological or behavioral and are generally referred to as biometrics. In order to assess a biometric, the following conditions can be used:[3]
- Universality (every person in the scene should possess the trait)
- Uniqueness (it has to be sufficiently unique so that the agents can be distinguished from one another)
- Permanence (the trait should not vary too much over time)
- Measurability (the trait should be relatively easy to measure)
- Performance (relates to the speed, accuracy and robustness of the technology used)
- Acceptability (the subjects should be accepting towards the technology used)
- Circumvention (it should not be easy to imitate the metric)
Most research on identifying behavior via computer vision techniques is focused on non-crowded situations: the subject is either isolated or only a very small number of people are present. However, most conventional computer vision methods are not appropriate for use in crowded areas. This is partly because people display different behavior in a crowd. As a result, some individual characteristics can no longer be used, while new collective characteristics of the crowd as a whole emerge. Another big factor is the difficulty of identifying and tracking individuals in a crowd, mostly due to occlusion of (parts of) the subject(s) by objects or other agents. The quality of the video image and the increased processing power needed to track individuals are also important factors.[4]
Most current research on detecting abnormal behavior in crowds focuses on tracking the people in the crowd. Tracking individuals has proven to be difficult in a crowd context. Many different methods have been proposed for individual tracking, and while these tend to work satisfactorily for low to moderately crowded situations, they tend to fail in higher-density crowds. There are also models which try to use general crowd characteristics to detect anomalies, but these tend to ignore singular abnormalities and are better suited for detecting general locations in the scene which contain anomalies, for example where a fire has broken out.[4]
There are promising models that try to combine both extremes. One model uses a set of low-level motion features to form trajectories of the people in the crowd, and adds a rule set computed from the longest common subsequences of these trajectories [5]. This results in a system that is capable of highlighting individual movements that are not coherent with the dominant flow. Another paper created an unsupervised learning framework to model activities and interactions in crowded and complicated scenes [6]. It used three elements: low-level visual elements, "atomic" activities (the most fundamental actions, which cannot be divided further into sub-activities) and interactions. This model was capable of completing challenging visual surveillance tasks such as detecting abnormalities.
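The longest-common-subsequence idea mentioned above can be illustrated with a minimal Python sketch. It is not the method of [5]: the quantisation of trajectories into coarse direction symbols (N, E, S, W), the function names and the example tracks are all assumptions made for illustration only.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """LCS length normalised by the shorter sequence, a value in [0, 1]."""
    shortest = min(len(a), len(b))
    return lcs_length(a, b) / shortest if shortest else 0.0


# Hypothetical trajectories, quantised to one direction symbol per time step.
dominant = list("EEEENEEE")   # assumed dominant flow of the crowd
walker_a = list("EEENEEEE")   # moves with the flow
walker_b = list("WWSWWWNW")   # moves against the flow

print(similarity(dominant, walker_a))  # high similarity
print(similarity(dominant, walker_b))  # low similarity
```

A trajectory with low similarity to the dominant flow would be a candidate for closer inspection; where to place the cut-off is a design choice this sketch leaves open.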
Common problems in crowded scenes, such as occlusion of the subjects, can be reduced by moving to a multi-camera surveillance system. Having different angles of the same scene available allows the system to better identify and track subjects. Dynamic cameras (cameras able to turn and zoom in and out) should be able to increase the efficiency of identifying suspicious persons, for example by zooming in on an area. However, the use of multiple cameras brings new problems with it. It is difficult to calibrate camera views with significant overlap and to compute their topology. Calibrating camera views which are disjoint and where objects move on multiple ground planes has also proven to be challenging. Most research on video surveillance assumes a single-camera view, even though multi-camera surveillance systems can better deal with occlusion and scene clutter. Most research on multi-camera systems is based on small camera networks.[7]
Detecting human activity
In order to recognize human activity, a general system is used which divides human activity recognition in three levels. The low-level represents the core technology, meaning the technical aspects for recognizing humans in a scene. The mid-level represents the actual human recognition systems. The high-level represents the recognized results applied in an environment, for example a surveillance environment.
The low-level contains three main processing stages: object segmentation, feature extraction and representation, and activity detection and classification algorithms. Object segmentation is performed on each frame in the video sequence to detect humans in the scene. The segmentation can be divided into two types based on the mobility of the camera used. In the case of a static camera, the most used segmentation method is background subtraction. In background subtraction, a background image without any foreground object(s) is first established. The difference between the current image and the background image then yields the foreground objects. However, this process is highly sensitive to illumination changes and background changes. Other, more complex methods are based on statistical models or on tracking. For dynamic cameras the background is constantly changing. The most commonly used segmentation method is then temporal difference: the difference between consecutive image frames. It is also possible to transform the coordinate system of the moving camera based on the pixel-level motion between two images in the video.
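Both segmentation methods described above come down to thresholding a pixel-wise difference. The following Python sketch shows this on tiny grayscale images stored as nested lists; the function name, the threshold value and the example frames are assumptions for illustration, and a real system would work on camera frames with an image-processing library.

```python
def foreground_mask(frame, reference, threshold=25):
    """Mark pixels whose absolute difference from the reference image
    exceeds the threshold (True = foreground / changed pixel)."""
    return [
        [abs(p - r) > threshold for p, r in zip(frame_row, ref_row)]
        for frame_row, ref_row in zip(frame, reference)
    ]


# Hypothetical 3x4 grayscale images (0 = black, 255 = white).
background = [[10, 10, 10, 10],
              [10, 10, 10, 10],
              [10, 10, 10, 10]]
frame_1 = [[10, 10, 10, 10],
           [10, 200, 10, 10],   # bright object at column 1
           [10, 10, 10, 10]]
frame_2 = [[10, 10, 10, 10],
           [10, 10, 200, 10],   # object has moved to column 2
           [10, 10, 10, 10]]

# Background subtraction: compare the current frame with the empty background.
static_mask = foreground_mask(frame_2, background)

# Temporal difference: compare the current frame with the previous frame.
temporal_mask = foreground_mask(frame_2, frame_1)

print(static_mask[1])    # only the object's current position is marked
print(temporal_mask[1])  # both the old and the new position are marked
```

The example also shows a known property of temporal differencing: it marks where something changed, so a moving object appears at both its previous and its current position.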
The second stage of the low-level looks at the characteristics of the segmented objects and represents them as features. These features can generally be categorized in four groups: space-time information, frequency transform, local descriptors and body modeling. Different extraction methods are used for the different categories. The classification algorithm is then chosen based on the available set of suitable feature representations.
The actual mid-level abnormal activity recognition generally relies on a deviation approach. Explicitly defining abnormal behavior depends heavily on context and surrounding environments, and these types of behavior are, by definition, not frequently observed. Most models therefore use a reference model, as in the case of background subtraction, based on examples or previously seen data, and consider new observations abnormal if they deviate from the trained model. Several methods exist for building such a model. The last level, the high-level, represents the actual application, which depends on the environment of the system. This research focuses on surveillance environments. In surveillance systems, human activity recognition is mostly focused on automatically tracking individuals and crowds in order to support security personnel. These types of environments tend to have multiple cameras, which can be used together as a network-connected visual system. The cameras can then track the position and velocity path of each subject, and the tracking results can be used to detect suspicious behavior.[8]
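The deviation approach described above can be sketched in a few lines of Python. This is one simple possibility, not the method of any cited paper: the reference model here is just a per-feature mean and standard deviation, and the feature choice (walking speed and direction changes per minute), the threshold and all sample values are hypothetical.

```python
from statistics import mean, stdev


def fit_reference(samples):
    """Build a reference model from previously seen 'normal' observations:
    one (mean, standard deviation) pair per feature."""
    columns = list(zip(*samples))
    return [(mean(c), stdev(c)) for c in columns]


def is_abnormal(observation, reference, z_threshold=3.0):
    """Flag an observation if any feature lies more than z_threshold
    standard deviations away from the reference mean."""
    for value, (mu, sigma) in zip(observation, reference):
        if sigma > 0 and abs(value - mu) / sigma > z_threshold:
            return True
    return False


# Hypothetical training data: (walking speed in m/s, direction changes per minute).
normal = [(1.3, 2), (1.4, 3), (1.2, 2), (1.5, 4), (1.3, 3), (1.4, 2)]
reference = fit_reference(normal)

print(is_abnormal((1.35, 3), reference))   # close to the reference model
print(is_abnormal((1.30, 12), reference))  # far more direction changes than seen before
```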
Suspicious behaviour
In order to teach our software to recognise suspicious activity we must first determine what constitutes such behaviour. Trying to remain inconspicuous while conducting a highly suspicious action results in a behavioural paradox that can be difficult to detect by bystanders. However, there are some general patterns in body language and motion that are observed significantly more frequently in individuals with criminal intentions. In our research we will distinguish between two types of non-verbal cues: motion of the body itself and motion of the individual through a crowd. With ‘body’ we refer to the torso, head and arms, because these are mostly visible in a crowd, while the lower body is not. Facial expressions are outside the scope of our project and will not be taken into account.
Body language
Body language of people with criminal intent tends to differ from that of bystanders, because they need to remain undetected [1]. Frey [9], among others, showed that people recognise this deception far more often than can be accounted for by chance.
During the build-up phase of a crime, offenders often show an increased frequency of object- and self-adaptors, in other words, the “manipulation of objects without instrumental goal” [10] and the frequency and duration of contact between the hand and one’s own body [11]. This includes touching and scratching of one’s own hair and face [12] and strictly unnecessary contact with carried objects, such as tapping pens repeatedly or reaching for an object multiple times without using it. This behaviour was observed in both assassination and bomb-planting scenarios in large crowds, indicating that it is likely not crime-specific [13] [14].
Troscianko et al. [15] observed that head orientation could also give away suspicious intentions. Offenders look away from their walking direction more often and look around repeatedly. However, one should be careful when considering these signs, as airports are vast and crowded, which often results in passengers getting confused or lost. Their searching behaviour could result in similar head movement, while they have no harmful intentions.
Therefore, in addition to the cues themselves, a reliable method is needed to differentiate between real cues and normal behaviour. One way to do this is by measuring behaviour relative to the crowd. To ensure that one wrong gesture does not lead to a false positive, a baseline is established first. The frequency of suspicious behavioural cues is measured in the crowd overall to determine what should be regarded as ‘normal’ behaviour [16]. Only distinct deviations from this baseline are considered suspicious.
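A crowd-relative baseline of this kind could look like the following Python sketch. It is only an illustration under stated assumptions: the median and the median absolute deviation are our choice of baseline and spread, only unusually high cue rates are flagged, and the parameter values, person identifiers and cue rates are hypothetical.

```python
from statistics import median


def flag_positive_outliers(cue_rates, k=3.0, min_spread=0.5):
    """cue_rates maps a person id to observed cues per minute.
    Returns the ids whose rate lies more than k spreads ABOVE the crowd
    baseline; rates below the baseline are never flagged."""
    values = list(cue_rates.values())
    baseline = median(values)
    # Median absolute deviation as a robust measure of spread.
    mad = median(abs(v - baseline) for v in values)
    # Floor on the spread so a very uniform crowd does not flag everyone.
    spread = max(mad, min_spread)
    return [pid for pid, v in cue_rates.items() if (v - baseline) / spread > k]


# Hypothetical cue rates (e.g. self-adaptors per minute) for six tracked persons.
rates = {"p1": 1.0, "p2": 2.0, "p3": 1.5, "p4": 2.5, "p5": 9.0, "p6": 0.0}

print(flag_positive_outliers(rates))
```

Using the median rather than the mean keeps a single extreme individual from shifting the baseline that they are compared against.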
The recognition of body language does not rely on perfect information and vision. Experiments were conducted with recognising human behaviour based on point light animation footage. It was observed that humans still can pick up behavioural cues with this limited visual information [17]. This supports the idea that computer software will be able to pick up behavioural cues, despite its visual limitations.
Motion patterns
Criminal intentions do not only show through a person’s body movement, but also in the way they move through a crowd. In general, an offender will show significantly more abrupt movement during the build-up phase of a crime. There are more changes in speed, position and direction than in a general crowd [1].
However, it is important to keep in mind that these movement patterns should be observed within the relevant context. For example, in an airport, changes in speed and direction could also indicate searching behaviour. It is therefore important to study movement that deviates from the rest of the crowd, rather than universal ‘suspicious’ movement.
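The changes in speed and direction mentioned above can be quantified from a tracked trajectory. The Python sketch below assumes that a tracker already delivers (x, y) positions at a fixed time step; the function name, the two summary features and the example tracks are assumptions for illustration, not part of [1].

```python
import math


def motion_features(track, dt=1.0):
    """track: list of (x, y) positions sampled every dt seconds.
    Returns the mean absolute change in speed and the mean absolute
    turning angle (radians) between consecutive steps."""
    steps = [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(track, track[1:])]
    speeds = [math.hypot(dx, dy) / dt for dx, dy in steps]
    # Note: a step with no movement gets heading 0, a simplification.
    headings = [math.atan2(dy, dx) for dx, dy in steps]
    speed_changes = [abs(b - a) for a, b in zip(speeds, speeds[1:])]
    turns = []
    for a, b in zip(headings, headings[1:]):
        # Wrap the heading difference into [-pi, pi) before taking its size.
        diff = (b - a + math.pi) % (2 * math.pi) - math.pi
        turns.append(abs(diff))
    n = len(speed_changes)
    return {
        "mean_speed_change": sum(speed_changes) / n if n else 0.0,
        "mean_turn": sum(turns) / n if n else 0.0,
    }


# Hypothetical tracks: a steady walker and one with abrupt changes.
steady = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
abrupt = [(0, 0), (1, 0), (1, 2), (0, 2), (0, 2.5)]

steady_f = motion_features(steady)
abrupt_f = motion_features(abrupt)
print(steady_f)
print(abrupt_f)
```

In line with the paragraph above, such values are only meaningful when compared with those of the surrounding crowd, not against a fixed universal threshold.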
All cues, both for motion and body language, were found to be positive deviations, i.e. the behaviour was expressed more strongly by the culprit than by the bystanders [1]. This is a useful property for our project, as it is easier to spot the deviating behaviour of one individual than to find a behaviour that occurs in the overall crowd, but not in one suspicious individual.
Approach
Planning
In order to keep track of the progress of the project and set deadlines for our goals we have made a Gantt chart. This chart shows what tasks are done during what time and how these tasks are divided among our resources.
Milestones
We consider several milestones based on the tasks that lie before us, as can be seen in the Gantt chart in the Planning section:
- Finished the research into what defines abnormal behaviour. (planned by the end of week 2)
- Finished the research into the existing methods for biometric scanning. (planned by the end of week 3)
- Finished analysing the USE aspects that our project brings with it. (planned by the end of week 3)
- Having developed a model for the detection of abnormal behaviour based on previous research and analyses. (planned by the end of week 6)
- Holding the final presentation presenting our product. (planned by the end of week 8)
- Finalized the wiki for judging. (planned by the end of week 8)
Deliverables
At the end of the project we aim to produce the following deliverables:
- A software model of a biometric scanner that detects suspicious behaviour
- Full documentation of the development and research process on this wiki
- A final presentation explaining said model and process
- A peer review of all group members
USE aspects
User
Primary users
- Security Guards
- Police officers
- Military personnel
Secondary users
- Persons being filmed
- Airline companies
- Technical maintenance personnel
Tertiary users
- The people manufacturing and designing the product
- Government
User friendliness
User friendliness can be described using the following factors:
- Learnability:
In this system there is not much the user has to learn. The only thing the user (airport security) has to learn is how the system lets them know when a suspicious person has been detected, and who and where this person is.
- Efficiency:
Once the system is in use, a higher level of efficiency will be reached, since more criminals will probably be detected and fewer harmless people will be checked by security.
- Memorability:
Since there is not a lot for the user to learn, it will be possible to use the system even after not using it for a longer period of time.
- Errors:
It is important that, once the system is realised, its error rate is tested thoroughly. If it made more incorrect detections than a human security guard would, there would be no use for the system. For the effectiveness of the product it is important to keep the error rate of the system very low.
- Satisfaction:
With the questions of the survey below, a general idea of what the users are looking for in the system can be established and interpreted. However, since the survey will be held among a very small group, the results cannot be assumed to be representative. If the system were realised, further tests with the users would still be needed to see whether they like it and what should be changed to suit their needs optimally.
Sense of participation
Public survey
In order to gain insight in the current methods of detecting suspicious persons and to look at the wishes of the primary users of the system, a survey will be done with security personnel in airports. The questions we would like to ask are:
- What do you look for when identifying suspicious persons?
- Do the signs you look for depend on the criminal activity?
- Do you look at body language specifically?
- Do you rely on facial recognition (wanted list)?
- Do you look at abnormal movement through a crowd for identifying suspicious persons?
- What actions do you take after identifying a suspicious person?
- How often does it occur that an apprehended person turns out innocent? (percentage wise)
- How many people are present in the departure hall during peak hours?
- Would you trust a system which detects suspicious persons automatically?
- How would you prefer the information to be presented to you by the system?
- What would you like to see in a detection program?
Society
Advantages
For a system to work and be accepted in society, it should have clear societal advantages. The biggest advantages, and thus the reasons to design this system, are listed below.
Terrorist/criminal activity
The first advantage is also the main goal of the system: detecting criminal activity at airports. By using cameras and algorithms to detect movements linked to nervousness and movements labelled as 'suspicious', potential terrorists, smugglers and other offenders can be caught. Using this system makes the process more efficient, since it can analyse every person walking through the airport. The system is therefore expected to lead to a higher rate of catching criminals and to reduce the chance of terrorist attacks.
Security
As said above, using the system should result in a higher rate of catching criminals and terrorists, making the airport and flying safer. The system will not replace airport security, but will aid security personnel in finding criminals by making them more accurate (fewer harmless people are selected and checked) and more effective (it becomes possible to check every person entering the airport). Since the system is only capable of detecting suspicious persons, security or police will still have to check for proof and, if necessary, make the arrest.
Racial/religious tensions
The last advantage this system will provide is its objectiveness. Since the system 'scans' persons based on their behaviour and movements, the outer appearance of a person is not taken into account when determining whether that person could be dangerous. When people such as security guards or police detect suspicious persons, there will always be some bias, since it is humanly impossible to be completely unbiased. In addition, the selection of people is currently partially based on profiling: looking for external characteristics that convicted criminals have in common and then searching for people who share these characteristics, because they would be more likely to be criminals. This is a self-reinforcing system. If, for example, 70% of convicted criminals wore blue nail polish, people wearing blue nail polish would be checked more often than people who do not, leading to more findings and more arrests of people with blue nail polish and thus sustaining the profile. There is currently a lot of discussion about this phenomenon, because it is claimed to happen on the basis of characteristics such as race and religion. This leads to a lot of tension between different races and religions, both within countries and worldwide. By using the system, this (racial) profiling can be eliminated, since the system is not based on a database of suspicious external characteristics but on behaviour. A lot of our behaviour and body language is unconscious, so by using psychological research this unconscious behaviour can be analysed and linked to certain feelings and acts, such as nervousness and lying.
Disadvantages
Besides the advantages, the disadvantages the system might have for society should also be looked at closely.
Privacy concerns
The biggest disadvantage people will probably bring up is the invasion of privacy. Knowing that you are being filmed and watched from the moment you walk into the airport raises the issue that other people will know where you are, when you leave the country and where you are going. In the privacy vs. security debate there are three questions that need to be answered to determine whether the advantages outweigh the disadvantages (Brey, 2004):
- How much added security results from the system?
- How invasive to privacy is the technology, as can be judged from both public response and scholarly arguments?
- Are there reasonable alternatives to the technology that may yield similar security results without the privacy concerns?
The first question cannot be answered yet, because we cannot yet measure whether, and how many, more criminals will be caught with the system; it would first have to be built and tested before we can get these results. For the second question, an answer could be that the system does not change much with regard to privacy, since airports already have security cameras and people are therefore already being filmed; they are just not yet being analysed by an algorithm. To get more insight into how the public thinks about this, a survey could be held. The answer to the third question is also debatable because of the word 'reasonable'. The method used now, analysis by security guards, can be seen as a reasonable alternative, since it is currently accepted. But one could also reason that there is no reasonable alternative, because no person can be completely without bias. The system is also expected to be more effective than the current method, although, as stated for the first question, it cannot yet be determined how much more effective it will be.
Errors
Another disadvantage that should be taken into account is the error rate of the system. It is very hard to test the system in simulated situations, because unconscious behaviour is hard to simulate or act. The real error rate can therefore only be determined when the system is tested in real life. This could be risky if it then turns out to have a large error rate, either by selecting a lot of harmless people or by not selecting potential criminals or terrorists. The first type of error does not bring much risk; it is only inconvenient for the selected travellers. The second type of error could be risky, since criminals and terrorists are then not caught and can still cause trouble. A way to reduce this risk is to run regular security, as it is done right now, simultaneously with the system while it is still being tested. This way, security during testing will be at least as safe as it is right now.
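The two types of error described above correspond to the false-positive rate and the false-negative rate. As a minimal sketch of how they could be computed once test outcomes are available, consider the following Python function; the function name and the example outcomes are hypothetical and do not represent measured results.

```python
def error_rates(flagged, actual):
    """flagged[i]: the system selected person i; actual[i]: person i really
    was an offender. Returns the false-positive and false-negative rates."""
    pairs = list(zip(flagged, actual))
    false_pos = sum(1 for f, a in pairs if f and not a)  # harmless but selected
    false_neg = sum(1 for f, a in pairs if a and not f)  # offender but missed
    negatives = sum(1 for a in actual if not a)
    positives = sum(1 for a in actual if a)
    return {
        "false_positive_rate": false_pos / negatives if negatives else 0.0,
        "false_negative_rate": false_neg / positives if positives else 0.0,
    }


# Hypothetical outcomes for eight travellers (illustration only).
flagged = [True, False, True, False, False, False, True, False]
actual = [True, False, False, False, False, True, False, False]

rates = error_rates(flagged, actual)
print(rates)
```

Computing the same two rates for the regular security process running in parallel would allow the comparison with human-based surveillance that the technical objectives call for.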
Enterprise
Feasibility
Advantages
Disadvantages
Model
[explanation of concept/pseudocode] [link to actual code?]
Results
References
- [1] Koller, C. I., Wetter, O. E., & Hofer, F. (2015). What is suspicious when trying to be inconspicuous? Criminal intentions inferred from nonverbal behavioral cues. Perception, 44(6), 679-708.
- [2] Stein, G., & Wilkinson, G. (Eds.). (2007). Seminars in general adult psychiatry. RCPsych Publications.
- [3] Jain, A., Bolle, R., & Pankanti, S. (Eds.). (2006). Biometrics: personal identification in networked society (Vol. 479). Springer Science & Business Media.
- [4] Junior, J. C. S. J., Musse, S. R., & Jung, C. R. (2010). Crowd analysis using computer vision techniques. IEEE Signal Processing Magazine, 27(5), 66-77.
- [5] Cheriyadat, A. M., & Radke, R. J. (2008). Detecting dominant motions in dense crowds. IEEE Journal of Selected Topics in Signal Processing, 2(4), 568-581.
- [6] Wang, X., Ma, X., & Grimson, W. E. L. (2009). Unsupervised activity perception in crowded and complicated scenes using hierarchical Bayesian models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(3), 539-555.
- [7] Wang, X. (2013). Intelligent multi-camera video surveillance: A review. Pattern Recognition Letters, 34(1), 3-19.
- [8] Ke, S. R., Thuc, H. L. U., Lee, Y. J., Hwang, J. N., Yoo, J. H., & Choi, K. H. (2013). A review on video-based human activity recognition. Computers, 2(2), 88-131.
- [9] Frey, C. (2014). "Who's the Criminal?": Early Detection of Hidden Criminal Intentions-Influence of Nonverbal Behavioral Cues, Theoretical Knowledge, and Professional Experience (Doctoral dissertation).
- [10] Burgoon, J. K. (2005). Measuring nonverbal indicators of deceit. The sourcebook of nonverbal measures: Going beyond words, 237-250.
- [11] Sporer, S. L., & Schwandt, B. (2007). Moderators of nonverbal indicators of deception: A meta-analytic synthesis.
- [12] Vrij, A. (2008). Detecting lies and deceit: Pitfalls and opportunities (Vol. 13, 2nd ed.). John Wiley & Sons.
- [13] Heubrock, D., Kindermann, S., Palkies, P., & Röhrs, A. (2009). Die Fähigkeit zur Identifikation von Attentätern im öffentlichen Raum. Polizei&Wissenschaft, 2(2009), 2-11.
- [14] Heubrock, D. (2011). Möglichkeiten der polizeilichen Verhaltensanalyse zur Identifikation muslimischer Kofferbomben-Attentäter. [Possibilities of behavior analysis for identifying muslimic suitcase bombers.] Polizei-Heute, 11(2), 13-24.
- [15] Troscianko, T., Holmes, A., Stillman, J., Mirmehdi, M., Wright, D., & Wilson, A. (2004). What happens next? The predictability of natural behaviour viewed through CCTV cameras. Perception, 33(1), 87-101.
- [16] Frank, M. G., Maccario, C. J., & Govindaraju, V. (2009). Protecting Airline Passengers in the Age of Terrorism. ABC-CLIO, Santa Barbara.
- [17] Johansson, G. (1973). Visual perception of biological motion and a model for its analysis. Perception & Psychophysics, 14(2), 201-211.
Minutes
26-04-2017
The subject of the project has been chosen and the deliverables and objectives (as found on the wiki) have been determined.
30-04-2017
- Exploratory research has been performed to develop a better understanding of the subject and to better define our goals.
- A schedule and milestones have been determined (see the Approach section).
- A wiki page has been created, including a template for the documentation with the already available information filled in.
03-05-2017
We have agreed upon a list of questions to ask the security officer at Veldhoven. Research of behavioural cues and biometric scanners has been discussed and is still ongoing. Several sections of the wiki, including the planning and charts, were updated and given a more structured layout.