Visualizing summaries of performance
for instructors assessing physical-motion skills
Neil C. Rowe, Jeff P. Houde, Rey M. Osoteo, Riqui Schwamm, Cory R. Kirk, and Ahren A. Reed
U.S. Naval Postgraduate School, United States
Saad Khan, Chris Broaddus, and Chris Meng
Sarnoff Laboratories, United States
Abstract: Physical-motion skills are important in physical education, theater, military training, and a wide range of vocational training. Teaching them is difficult because events happen quickly and the view of instructors is often occluded. We are developing training systems using modern surveillance technology that can automatically track and assess students exercising physical-motion skills. We describe two systems we have implemented, one with inexpensive nonimaging sensors and one with multi-camera fusion, to track U.S. Marines during training and assess how well they are performing. The key challenge is visually summarizing the data because there is usually too much for instructors to inspect manually. We describe several visualization methods we provide, including plots of unusual events in time and space broken down by student and type of event, plots of continuous-valued metrics, plots of group centroids, and comparison plots of similar exercises. Results show that interesting phenomena can be seen in the visualizations that are otherwise difficult to recognize.
This paper appeared in the Proceedings of Global Learn Asia Pacific 2011, Melbourne, Australia, April 2011.
Automated surveillance technology has made important progress in recent years due to concerns about terrorism. This technology can be used to track people and note suspicious behavior without human assistance. Traditionally the technology has emphasized cameras and video analysis, but recent work has shown that much can be done with simpler and cheaper sensors such as motion detectors.
Many important skills have a physical-motion component. Physical education (Hay, 2006) and theater arts (Smith-Autard, 2004) prominently feature physical-motion skills. Many vocational skills such as operating machinery require performers to go to different places to accomplish different things. Collaborative group work in a classroom often involves discussions that require participants to move around. Motions in other activities such as studying can be metaphors for the mental activity of the student, and can be deliberately encouraged to enable better tracking, as by asking students to visualize a conflict using their hands and a map.
Instructors can be helpful in teaching physical-motion skills with a range of pedagogical strategies (Coker, 2004). Such skills have long been difficult for instructors to assess, but technology is changing that. The cost of sensor technology has greatly decreased in recent years, making it possible to instrument a 10-meter by 10-meter area for routine education and training for around $1000 US.
We have been exploring surveillance technology for assessment of training of U.S. Marines (a kind of soldier). We report on the visualization methods we have used in two prototype systems we have built, one involving nonimaging sensors for the task of searching a building, and one involving multi-camera tracking for the task of patrolling an urban area. Both tasks are important but challenging because of the variety of simultaneous concerns that Marines must handle. The task for instructors is difficult as well because of the constantly changing locations and angles of view, occlusions by walls, and the need of trainees to improvise in response to changing circumstances. Thus automated assessment (grading) could provide useful feedback to trainees. In military training, feedback is most naturally given in the standard post-exercise assessment or "after-action review" (Hixson, 1995); current assessment predominantly uses checklists for instructors (Hone et al, 2008), which map poorly to their visual experience.
The first thought of instructors in using technology to assess physical-motion skills is to look at video. But video poses many difficulties. Cameras must be placed carefully to obtain a view unobstructed by the students' bodies, walls, and other obstacles. Unobstructed views at times may be impossible when subjects move around considerably, as with team sports. This suggests using multiple cameras, but that is expensive and it is easy for people to become confused in correlating one video stream with another. The main challenge of video, however, is that it provides a considerable amount of data and this is difficult to search. An instructor who wants to locate errors when they are rare (as with students who have progressed beyond the basics) must watch large amounts of uninteresting video. Even when interesting events are found, a view may be insufficiently clear to show to students, who do not have the analytical skills of an instructor and need pictures to be unambiguous.
A better approach is to summarize automatically what happened. Summarizing and indexing of video is difficult in general because of the many kinds of human activity and the many perspectives from which we may view it. However, it can be simplified with two restrictions: We can extract only the overall location and orientation of subjects, and look only for unusual events as predefined. Locating and orienting subjects in video is considerably easier than classifying what they are doing as in (Minnen et al, 2007), and if cost is a factor, can be done with relatively inexpensive non-video sensor hardware. Detection of unusual events can be done by computing statistics of normal activity and comparing observations to it; often it is easier to detect that something unusual has happened than to classify it, but often instructors can figure out what happened once an interesting time is identified to them.
In assessing what happened in a time period for a group of students, the four dimensions are time, space, student, and type of event. "Event" can mean mistakes (like getting too close to someone else) or key transitions (like different periods of a choreography). Displaying all four dimensions together is generally too much information for instructors. (Using three-dimensional displays to encode more information is expensive and hard to make work consistently, and using display time as an additional dimension provides something too similar to video.) Instead we should summarize for instructors and let them "drill down" to find further details they need. A number of similar such systems have been developed for analysis of multidimensional sensor data, e.g. (Morreale et al, 2010) for environmental monitoring. Ultimately, instructors should be able to drill down to the video.
Several kinds of summarization displays are helpful for instructors. The most direct approach is to plot two dimensions chosen from time, space, student, and type of event while encoding a third in a different way such as by color. There are four basic such displays:
· We can plot events over time for different students on parallel timelines to indicate how events tend to bunch up, color-coding by type of event.
· We can plot events over time for different types of event on parallel timelines to indicate how each event tends to occur, color-coding by student.
· We can plot events over space for each student to show the places where events occur, color-coding for type of event
· We can plot events of a particular type for all students over space to show where particular types tend to occur, color-coding by student.
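For instance, the grouping step behind the first display type can be sketched in a few lines of Python; the sample events and their field layout here are hypothetical, chosen only for illustration:

```python
from collections import defaultdict

# Hypothetical sample events: (time in seconds, student id, event type)
events = [(12.0, 'A', 'too_close'), (15.5, 'B', 'too_close'),
          (40.2, 'A', 'slow'), (41.0, 'C', 'slow'), (88.3, 'B', 'flagging')]

def timelines_by_student(events):
    """Group events into one timeline per student; each entry keeps its
    event type so a plotting routine can color-code by type."""
    rows = defaultdict(list)
    for t, student, etype in sorted(events):
        rows[student].append((t, etype))
    return dict(rows)

rows = timelines_by_student(events)
# rows['A'] == [(12.0, 'too_close'), (40.2, 'slow')]
```

Swapping the roles of student and event type in the grouping yields the second display type; grouping by location instead of time yields the spatial displays.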
Instructors will need to refer to each of these four kinds at different times, but often one kind will be primary for a particular application. For instance for Marine training, the first kind of display is primary for after-action review, since performance of individuals is usually considered in sequence.
It is also often useful to supplement these displays with plots of direct parameter measurements of students over time. For instance, in choreography it is useful to plot visibility to lines of sight over time, and in team sports it is useful to track speed of the student.
When students work as a team, another useful display focuses on the group of students as a whole. Plotting the centroid (location of their center of mass) and dispersion (deviation of locations from the centroid) can indicate coordination problems and inconsistencies. For instance in team sports, it is useful to measure how well the students cover the field of play. Since the centroid averages the effects of all students, its pattern in time will be less noisy than that of individuals, and more able to indicate fundamental problems such as a group veering too far from their target position.
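The centroid and dispersion computation is simple; the following is a minimal sketch, with a hypothetical snapshot of four student positions:

```python
import math

def centroid_and_dispersion(positions):
    """positions: list of (x, y) for each student at one instant.
    Returns the group centroid and the root-mean-square deviation
    of the students from it."""
    n = len(positions)
    cx = sum(x for x, _ in positions) / n
    cy = sum(y for _, y in positions) / n
    disp = math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2
                         for x, y in positions) / n)
    return (cx, cy), disp

# Four students at the corners of a 4-meter square
(cx, cy), d = centroid_and_dispersion([(0, 0), (4, 0), (0, 4), (4, 4)])
# centroid is (2.0, 2.0); each student is sqrt(8) meters from it
```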
It is often useful to display comparisons of student performance at corresponding times in different rehearsals or run-throughs, as with theatrical performances or crisis-procedure rehearsal with training of power-plant operators. Comparisons of locations over time or time progression through the exercise can make the problems more obvious than when considered in isolation.
Finally, it is useful to display averages over all students on a particular exercise, all students engaged in a particular event or making a particular mistake, or particular students over all exercises. This allows us to see respectively things like the difficulty of an exercise, the types of students making a mistake, and the relative accomplishments of students.
One system we have built is for automated assessment of Marine performance in searching a building for people and contraband, an urban-warfare skill. It is especially hard for cameras to see what occurs indoors because of occlusion by walls and furniture, and there are additional difficulties with the low levels of light and the lens distortions at short distances. These pose challenges for automated image processing. Unfortunately we cannot attach accelerometers to Marines' bodies during training as in (Chen and Hung, 2009) because that interferes with their ability to move naturally and requires extra setup and removal time that is not available for every training exercise.
We used instead nonimaging sensors for tracking. High accuracy of localization is not essential, as it is more important to tell whether Marines searched an area at all than to determine their exact paths. Our experience with infrared motion sensors and inexpensive sonars (Rowe et al, 2011) has shown them to be good at detecting people up to 5 meters, which provides good coverage for much indoor terrain. Sensors can also be placed on weapons to detect their orientations, a key concern of Marine instructors. So we built a prototype system using infrared, sonar, light, and audio sensors.
(U.S. Marine Corps, 1998) summarizes Marine doctrine on the searching or "clearing" of a building. Since this is often done in dangerous areas and involves interaction with the public, Marines must be well trained. There are rules for preferred methods of entry, where to point weapons, and how to coordinate with other Marines. Most of these skills are employed in teams of 2-4 Marines. Some of the most important considerations are the following "constraints" which we attempted to address:
c1. Marines should aim their weapons to jointly cover as much as possible of a dangerous area.
c2. Marines should maintain a "stacked" orientation when entering a door with one Marine behind the other.
c3. Marines should thoroughly survey an area they are asked to search.
c4. Marines should identify suspicious objects and leave them for trained personnel.
c5. Marines should communicate important information to the other Marines and civilians.
c6. Marines should avoid pointing their weapons at one another ("flagging").
We used much of the same sensor hardware and software as in our research on monitoring of public places (Rowe et al, 2011). Sensors other than microphones were commercial off-the-shelf products from Phidgets (www.phidgets.com) and used a simple hardware interface and software. We polled them around 40 times per second and downloaded values to a file. Audio was collected using Audacity software running cardioid microphones with preamplifiers attached to USB ports on computers.
Our experiments used eleven laptop computers to run the sensors. Four computers ran just microphones. Two of the other seven were loaded into backpacks and carried by students simulating Marines in a scenario. Four of the remaining five were stationed at table height in the rooms searched, and the sensors other than light sensors were oriented parallel to the floor to avoid gait effects associated with legs and heads. The remaining set of sensors was stationed at ground height and monitored two doors.
The five sets of stationary sensors all had a microphone, an infrared narrow-range sensor, a motion sensor, a light sensor, and a simple sonar sensor. Two sets also had force sensors which were placed under suspicious objects to detect if the objects were picked up. Sensors were oriented in a variety of directions parallel to the floor to provide broad coverage. The sensors on the Marines were orientation sensors for the weapons, sonar sensors for the weapons, and vibration sensors. The orientation sensors had three-axis accelerometers, rotation sensors, and magnetometers. We summed the rotation values over time to estimate orientation angles in azimuth (yaw), inclination (pitch), and roll; our primary concern is azimuth (where the weapon is aimed horizontally), secondarily inclination (the degree to which the weapon is horizontal). Use of Runge-Kutta integration improved the accuracy of these estimates. Experiments showed that yaws were accurate within ten degrees over a five-minute period. For further accuracy, the magnetometer values can correct for the accumulated error over time; although indoors we found significant deviations from compass north, these could be learned from experiments.
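The rate-summing step can be sketched as follows. We use the trapezoidal rule here as a simple stand-in for the Runge-Kutta integration described above, and a sampling interval matching our roughly 40-per-second polling:

```python
def integrate_yaw(rates, dt, yaw0=0.0):
    """Estimate yaw by integrating angular-rate samples (radians/second)
    taken every dt seconds.  Trapezoidal rule; a higher-order method such
    as Runge-Kutta reduces the accumulated drift further."""
    yaw = yaw0
    yaws = [yaw]
    for r0, r1 in zip(rates, rates[1:]):
        yaw += 0.5 * (r0 + r1) * dt   # average rate over the interval
        yaws.append(yaw)
    return yaws

# A constant rotation of 0.1 rad/s sampled at 40 Hz for one second
yaws = integrate_yaw([0.1] * 41, dt=1 / 40)
# after one second the estimated yaw is 0.1 radians
```

In practice the drift of such estimates is what the magnetometer correction described above is for.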
We devised 14 "building clearing" scenarios in which the two Marines cleared two rooms and a corridor. Each scenario was done as a separate experiment. Each involved searching a small room and a large room, taking a civilian they discovered into custody, possibly throwing a fake grenade, and possibly discovering suspicious objects. The 14 exercises were in pairs of correct execution followed by incorrect execution. Incorrect methods entailed violations of the constraints listed above. One pair of exercises was the basic procedure; one involved a grenade; one involved noting booby traps; one involved moving counterclockwise instead of clockwise; one involved more detailed actions with booby traps; one involved moving unusually slowly; and one involved repeated actions when the first action was done incorrectly. Detailed scripts for each exercise were provided to the subjects who simulated Marines.
We are only interested in sensor values that signal human activity. For most of our sensors, these are abnormal values. For the infrared and force sensors, this meant two standard deviations above the mean value; for the light and sonar sensors, two standard deviations below; and for the motion sensors, two standard deviations either above or below. Means and standard deviations were calculated separately for each sensor because they fluctuated with position; the sonar, in particular, gave a mean value in the absence of people that corresponded to the distance of the nearest large surface. In our experiments, this thresholding reduced the data to 14% of its initial volume for the non-microphone sensors, resulting in an average of about 6 points per second per sensor. The audio of 20,000 samples per second was averaged to get a compressed signal of 200 hertz, and peaks of width roughly 0.1 seconds were sought (to aid in finding footsteps while not ignoring other loud sounds). To further reduce spurious audio peaks, we excluded peaks whose heights were less than two standard deviations above the mean peak height, giving an average of one audio peak per second.
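The thresholding rule for the non-microphone sensors can be sketched as follows; the sonar trace here is hypothetical, representing a wall at 3 meters with two brief approaches by a person:

```python
import statistics

def exceptional_indices(samples, mode='above', k=2.0):
    """Return indices of samples more than k standard deviations from the
    mean, in the direction(s) that signal activity for this sensor type:
    'above' (infrared, force), 'below' (light, sonar), 'both' (motion)."""
    mu = statistics.mean(samples)
    sigma = statistics.pstdev(samples)
    out = []
    for i, v in enumerate(samples):
        if mode in ('above', 'both') and v > mu + k * sigma:
            out.append(i)
        elif mode in ('below', 'both') and v < mu - k * sigma:
            out.append(i)
    return out

# Hypothetical sonar trace in meters: mostly the wall distance, two dips
trace = [3.0] * 20 + [1.2] + [3.0] * 20 + [0.8] + [3.0] * 20
print(exceptional_indices(trace, mode='below'))  # prints [20, 41]
```

Note that computing the statistics per sensor, as the text describes, is what makes the sonar's "background" wall distance drop out automatically.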
Figure 1 shows an example plot of such "exceptional events" during exercises 11 through 14 for a set of sensors running off a single computer. Time is measured in seconds. Exercises 11 (time 150 to 300) and 13 (time 600 to 750) were done correctly, and exercises 12 (time 400 to 550) and 14 (time 750 to 900) were done deliberately incorrectly. Row 2 is the microphone sound-energy peaks, row 9 is the estimated periods of human speech, row 12 is the narrow-range infrared sensor, row 14 is the motion sensor, row 15 is the light sensor, row 17 is the sonar sensor, and row 18 is the force sensor. The poorly performed exercises clearly have a different pattern. Analogous plots are being used in applications such as healthcare monitoring, e.g. (Wang and Skubic, 2008) where changes in activity patterns of patients are clearly visible in comparisons even if they are hard to see in examining a single plot.
Assessment of the Marines can be done for each constraint except c6. Since we try to emplace sensors to cover the training area well, constraint c3 is assessed by checking whether a sufficiently significant record of activity is found at each set of sensors. For this we used the rate of occurrence of any sensor reading over its threshold at a particular location, and compared against the rate for correct performance. Rates that are significantly different trigger a flag on constraint c3. We can see an indication of loitering in the density of the indicators at time periods 400-450 and 820-860 for sensors 14 and 15.
Constraint c4 is assessed by noting whether the force sensor indicates that a suspicious object was moved or picked up (bad), and whether other sensor readings indicated that the Marine was inspecting the object (good). Trainees picked up suspicious objects at times 415 and 820 in Figure 1.
Figure 1: Example plot of exceptional events at one sensor for indoor training.
Constraint c5 can be addressed by noting speech that occurred during the exercise. We estimated this from noting when the amount of sound energy in the 100-400 hertz frequency range was at least 25% of the total sound energy in the range of 1-2000 hertz. Figure 1 plots times exceeding this threshold on the "sensor 9" line. The experimenters were talking for time periods 0-100 and 900-1000.
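This band-energy criterion can be sketched as follows. A naive discrete Fourier transform is used here for clarity where a real system would use windowed FFTs, and the 4000-hertz sampling rate is chosen only to keep the example small:

```python
import cmath, math

def band_energy(samples, rate, lo, hi):
    """Energy in the [lo, hi] hertz band, via a naive DFT (clear but
    slow; a real system would use windowed FFTs)."""
    n = len(samples)
    total = 0.0
    for k in range(1, n // 2):
        freq = k * rate / n
        if lo <= freq <= hi:
            coef = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                       for t in range(n))
            total += abs(coef) ** 2
    return total

def looks_like_speech(samples, rate):
    """The criterion above: energy in 100-400 hertz must be at least
    25% of the energy in 1-2000 hertz."""
    return band_energy(samples, rate, 100, 400) >= \
        0.25 * band_energy(samples, rate, 1, 2000)

# A pure 200-hertz tone trivially satisfies the criterion
tone = [math.sin(2 * math.pi * 200 * t / 4000) for t in range(400)]
speech = looks_like_speech(tone, 4000)  # True
```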
Constraint c2 can be measured by the amount of time the Marines take to traverse a door or entryway, which we measure with sonar sensors across the door. We compare this time to that of normal traversal of a door, and significantly shorter times trigger a flag on constraint c2. Figure 1 does not show this, but we did observe it for the microphones closer to the door.
Constraint c1 concerns weapons handling. Chiefly we are interested in the "coverage" of the area by weapons (or how wide a range of locations are pointed at by the weapons in the horizontal plane). A way to approximate this for two Marine weapons is the average difference in the yaw angle over the exercise weighted by the degree to which the weapons are horizontal:

D = (1/n) Σ (|ψ1 − ψ2| / π) cos(θ1) cos(θ2)

where the sum is over the n time samples of the exercise, ψ is the yaw and θ is the pitch angle in radians (where zero is horizontal), and the yaw difference is wrapped into the range 0 to π. This ranges from 0 to 1 and represents the fraction of azimuth range that is adequately covered by the two weapons, discounted by the angle of inclination of the weapons (so a weapon pointing at the ground does not contribute any coverage). Values of D that are 50% or less of those of normal runs trigger a flag on constraint c1. Figure 2 plots this D metric for the two weapons in the experiments in Figure 1. It can be seen that coverage was poor for the second (400 to 550) and fourth (750 to 900) experiments, the "bad" runs, excellent for the third experiment (600 to 750), and marginal for the first experiment (150 to 300).
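A minimal sketch of this coverage computation, assuming the yaw difference is wrapped into the range 0 to π so the result stays between 0 and 1:

```python
import math

def coverage_metric(yaws1, yaws2, pitches1, pitches2):
    """Average yaw separation of two weapons (wrapped to [0, pi] and
    normalized by pi), discounted by the cosines of the pitch angles so
    a weapon pointed at the ground contributes nothing.  Angles are in
    radians; zero pitch is horizontal."""
    total = 0.0
    for y1, y2, p1, p2 in zip(yaws1, yaws2, pitches1, pitches2):
        d = abs(y1 - y2) % (2 * math.pi)
        d = min(d, 2 * math.pi - d)        # wrapped yaw difference
        total += (d / math.pi) * math.cos(p1) * math.cos(p2)
    return total / len(yaws1)

# Two horizontal weapons pointed in opposite directions give D == 1
D = coverage_metric([0.0], [math.pi], [0.0], [0.0])
```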
Figure 2: Coverage metric values in the experiments of Figure 1.
A disadvantage of our noninterventionist approach to tracking is that we could not localize trainees precisely, only note when they were near sensors. We tried double integration of the accelerometer readings to estimate positions, but this was too error-prone; and localization by GPS satellites requires a significant antenna which would have added weight to the backpacks. This means we could not monitor constraint c6 (accidental pointing of weapons at fellow Marines) as we could with the second system described below. Nonetheless, we judged that we did well on automated assessment of the other constraints.
We also conducted research on automated assessment of outdoor Marine training as part of the BASE-IT project (Rowe et al, 2010). We did experiments with real Marines who followed a set of scenarios. Multi-camera fusion, GPS (Global Positioning System) devices, and motion-smoothing filters were applied to obtain relatively smooth estimates of the paths followed by the Marines, using technology from Sarnoff Laboratories in Princeton, New Jersey, US (Cheng et al, 2009); all three were necessary to obtain sufficient accuracy. (Similar tracking technology for training is in the Ubisense system of General Dynamics, www.ubisense.net.) Orientation sensors on weapons and helmets were used in response to difficulties in obtaining adequate orientation estimates from video, though our goal is to eventually rely exclusively on the latter.
BASE-IT automatically evaluates the performance of a group of 4-20 Marines (a fire team or a squad) in a realistic setting. It goes well beyond previous work in virtual environments such as (Lampton et al, 2005) since it handles real-world data. Again we used (U.S. Marine Corps, 1998) as our reference. The primary concerns of instructors relate to safety, including how well the Marines are looking in all directions ("360-security"), degree of exposure to potential snipers ("danger"), and how far apart the Marines in the group are ("dispersion"). Other concerns are collinearity (they should not form a single line), coverage with their weapons (they should cover the major threats), speed (they should be neither too fast nor too slow), mobility (they should be able to take cover), centrality of the leader, and interactions with "role players" simulating local residents (more interactions during training are better). Again, assessment is based on positions and orientations of the Marines. Also used is a preanalysis of a graphics model of the terrain to find potential sniper positions, including windows, doors, and corners of buildings (depending on the angle of view); larger areas of unobstructed terrain are also assigned a small threat level.
Data was obtained from the tracking database using C++ code, and visualized with Python routines. Figure 3 shows an example of a direct parameter visualization, for the count of the number of clusters formed by seven Marines (with a threshold of 10 meters) during an exercise. Low values represent times when Marines came unnecessarily close to one another, which is undesirable.
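The cluster count can be computed by single-linkage grouping: any two Marines within the threshold distance belong to the same cluster. A minimal sketch with a hypothetical snapshot of positions:

```python
import math

def count_clusters(positions, threshold=10.0):
    """Count clusters by single linkage with a union-find structure:
    two Marines within threshold meters are in the same cluster."""
    n = len(positions)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two clusters
    return len({find(i) for i in range(n)})

# Hypothetical snapshot: two Marines bunched up, two well separated
print(count_clusters([(0, 0), (4, 3), (50, 0), (100, 100)]))  # prints 3
```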
Figure 3: Example plot of number of clusters of Marines during an exercise.
We also plot exceptional values of important parameters, occurrences of which we call "issues" rather than mistakes since they may have legitimate excuses. Issues for individual Marines include a Marine being too close to another, a Marine failing to cover a nearby window or door with their weapon, and a Marine accidentally pointing a weapon at another Marine. Issues for the group of Marines include forming too few clusters, moving too fast, failing to cover all directions adequately, and returning to a location visited previously. Figure 4 shows a time plot of individual issues for one training experiment where colors indicate Marines. Personnel labeled "0" were two roleplayers, and three Marines did not have any issues during this exercise.
Spatial plots provide additional perspectives on the data. Figure 5 shows locations of occurrences of Marines coming too close to one another (issue type 1). Black rectangles indicate tents, and colored dots represent Marines with the same encoding as in Figure 4. Although these occurrences were flagged, Marines do have legitimate excuses to be near the tents because the tents were small and they needed to take cover behind them. Figure 6 plots all the issues (the larger circles) for one Marine during one exercise; small green circles represent normal behavior. The Marine had problems with pointing his weapon, mostly in aiming too closely to other Marines, and some in regard to covering the window of the northernmost building.
Figure 4: Example plot of issues for individuals over time during an exercise.
Figure 5: Example plot of "flagging" (pointing weapons too much towards one another).
Comparison plots for different runs were useful for this task because the Marines were being rehearsed. Figure 7 shows an example comparison plot of the centroids of the Marines in exercises 490 and 496. Clearly the Marines slowed down and responded to suspicious activity in the latter part of 496 that did not occur in 490. Figure 8 shows a different kind of comparison, of the number of issues observed in three different runs. Here redness (on a continuum of blue to purple to red) indicates the degree of seriousness of the rate for each kind of issue.
Figure 6: Example plot of all issues for Marine 3911.
Figure 7: Comparison of distance traveled and locations in exercises 490 and 496.
Figure 8: Comparison of group issue counts for three exercises.
Danger and its coverage by the trainees are important but complicated, so we have special visualizations for them. Marines are expected to pay attention to possible sniper positions at visible windows, doors, corners, and centers of large areas, and weight them by distance. We precompute threat levels at evenly spaced sample points in the terrain for each possible sniper position. Then for a particular Marine location, we look up the threats for the nearest sample point and their seriousness, and check how well the Marine is viewing and covering them with their weapon, based on their body-orientation and weapon-orientation yaws. We total these values for a group of Marines to measure how well they are doing. A coupled pair of diagrams is helpful: Figure 9 plots threat awareness over space at a particular time, and Figure 10 plots the product of the danger and unawareness (one minus awareness). In both diagrams, the degree of redness on a red-purple-blue scale indicates the degree of danger. Note that Figure 9 shows the Marines are not covering the east side of the terrain very well, but Figure 10 shows that those regions are not critical to their safety. However, they could better cover the north end of the left road (the purple-colored circles stacked vertically) as well as the south side of the terrain, as indicated by the purple dots.
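The danger-times-unawareness product can be sketched as follows. The linear falloff of awareness with angle and the 90-degree field of view are illustrative assumptions, not the values of our system:

```python
import math

def awareness(marine_yaw, threat_bearing, fov=math.pi / 2):
    """1 when the Marine faces the threat directly, falling off linearly
    to 0 at the edge of an assumed field of view (illustrative model)."""
    d = abs(marine_yaw - threat_bearing) % (2 * math.pi)
    d = min(d, 2 * math.pi - d)            # wrapped angular difference
    return max(0.0, 1.0 - d / fov)

def residual_danger(threats, marines):
    """Sum over threats of danger times the group's unawareness; the
    group's awareness of a threat is the best awareness of any Marine.
    threats: list of (x, y, danger); marines: list of (x, y, yaw)."""
    total = 0.0
    for tx, ty, danger in threats:
        best = max(awareness(yaw, math.atan2(ty - y, tx - x))
                   for x, y, yaw in marines)
        total += danger * (1.0 - best)
    return total

# One Marine facing a window threat directly leaves no residual danger
r = residual_danger([(10.0, 0.0, 1.0)], [(0.0, 0.0, 0.0)])  # r == 0.0
```

Plotting the per-threat awareness values over space gives a diagram like Figure 9, and plotting the per-threat residuals gives one like Figure 10.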
Figure 9: Example plot of threat awareness at one instant of time.
Figure 10: Danger weighted by awareness for the situation in Figure 9.
Our approach can be used to enhance a wide range of training experiences involving physical motion. Much of our hardware is inexpensive and could be reused many times for different kinds of training. Large numbers of sensors are not required when the action of students is confined to a relatively small area of 10-100 feet square. If cameras are used, there need not be many, and cameras may be unnecessary for tracking in many cases. We have shown that a set of visualizations are useful on the sensor data and straightforward to provide.
This work was sponsored by the U.S. Navy Modeling and Simulation Office, the Office of Naval Research as part of the BASE-IT project, and the National Science Foundation under grant 0729696. Opinions expressed are those of the authors and not necessarily those of the U.S. Government.
Chen, Y.-J., Hung, Y.-C., 2009. Using real-time acceleration data for exercise movement training with a decision tree approach. Proc. 8th Intl. Conf. on Machine Learning and Cybernetics, Baoding, CN, July 2009.
Cheng, H., Kumar, R., Basu, C., Han, F., Khan, S., Sawhney, H., Broaddus, C., Meng, C., Sufi, A., Germano, T., Kolsch, M., and Wachs, J., 2009. An instrumentation and computational framework of automated behavior analysis and performance evaluation for infantry training. Proc. I/ITSEC, Orlando, Florida, December.
Coker, C., 2004. Motor learning and control for practitioners. McGraw-Hill, New York.
Hay, P., 2006. Assessment for learning in physical education. In Kirk. D., MacDonald, D., O'Sullivan, M., The handbook of physical education, Sage, London.
Hixson, J., 1995. Battle command AAR methodology: a paradigm for effective training. Proc. 27th Winter Simulation Conf., Arlington, Virginia, USA, pp. 1274-1279.
Hone, G., Swift, D., Whitworth, I., and Farmilo, A., 2008. The case for coarse-grained after action review in computer aided exercises. Intl. Command and Control Research and Technology Symposium, Bellevue, WA, June.
Lampton, D., Cohn, J., Endsley, M., Freeman, J., Gately, M., and Martin, G., 2005. Measuring situation awareness for dismounted infantry squads. Interservice/Industry Training, Simulation, and Education Conference.
Minnen, D., Westeyn, T., Ashbrook, D., Presti, P., Starner, T., 2007. Recognizing soldier activities in the field. Proc. Conf. on Body Sensor Networks, Aachen, Germany, March.
Morreale, P., Qi, F., Croft, P., Suleski, R., Sinnicke, B., and Kendall, F., 2010. Real-time environmental monitoring and notification for public safety. IEEE Multimedia, 17 (2), pp. 4-11, April-June 2010.
Rowe, N., Houde, J., Kolsch, M., Darken, C., Heine, E., Sadagic, A., Basu, C., and Han, F., 2010. Automated assessment of physical-motion tasks for military integrative training. Proc. Second International Conference on Computer Supported Education, Valencia, Spain, April 2010.
Rowe, N., Reed, A., Schwamm, R., Cho, J., and Das, A., 2011. Networks of simple sensors for detecting emplacement of improvised explosive devices. Chapter 16 in Critical Infrastructure Protection, ed. F. Flammini, WIT Press.
Smith-Autard, J., 2004. Dance composition, 5th ed. A&C Black, London.
U.S. Marine Corps, 1998. Military Operations on Urbanized Terrain, Publication MCWP 3-35.3.
Wang, S., and Skubic, M., 2008. Density map visualization from motion sensors for monitoring activity level. Proc. IET 4th International Conference on Intelligent Environments, Seattle, WA, US, pp. 1-8, June 2008.