MINER mobile drive test data collection in Austria

Open Mobile Communications Drive Test Data Set for Machine Learning

Guarantees for network metrics, such as latency, data rate, and reliability, play a central role in new wireless communication technologies. To analyse the stochastic behaviour, Salzburg Research created a tool chain for measurement, collection, evaluation, and prediction of controlled mobile communications drive test data, and published the underlying test data set of measurements covering two years’ worth of highway traffic.

This article is an adapted excerpt from a more detailed paper published in the IEEE Open Journal of the Communications Society.

The capability to provide guarantees for network metrics, such as latency, data rate, and reliability, will be an important factor for the widespread adoption of next-generation mobile networks. Hence, such metrics play a central role in standards for new wireless communication technologies. However, due to the inherently stochastic nature of mobile communications, any guarantees can only be statistical and are highly dependent on the actual physical environment. To analyse the stochastic behaviour, we created a tool chain for measurement, collection, evaluation, and prediction of controlled mobile communications drive test data. We also published the underlying data set of measurements, which covers two years’ worth of highway traffic on a 25 km long section and comprises 267,198 data points. We statistically evaluated the data set and validated it with a corresponding data set from another source. An application of machine learning to the data set illustrates possible use cases:

  • Feed-forward neural networks to predict the data rate in five application scenarios,
  • LIME to explain the behaviour of the model, and
  • an autoencoder to describe the interaction of five signal strength parameters.

The data set and the tool chain show how machine learning can be applied to wireless networks and provide fellow researchers with the means to make further experiments.

Approaches to measuring and collecting quality metrics for mobile communications

Historically, there have been various disjoint approaches to measuring and collecting quality metrics for mobile communications. These range from crowdsourced data collection (private and regulatory) and dedicated (drive test) measurement campaigns to monitoring within a mobile network itself (see our earlier blog post).

The network-centered approach, however, requires full access to the communication network and is typically only feasible for the network operator. Network operators constantly monitor the state of their network to provide the targeted quality of service to the users, or to recognize network anomalies or failures. Their supervision technology is capable of monitoring the network usage and all of its parameters in full detail. This provides essential insights for network planning. The availability of such data can be an enabler for certain aspects of research, such as comprehensive user behaviour analyses or traffic type analyses. Unfortunately, due to the delicate nature of such extensive data, network operators typically do not disclose this data or disclose it only to selected researchers under strict confidentiality agreements. This level of secrecy impedes the reproducibility of scientific results and makes it impossible for other researchers to do further work.

For this reason, scientific and commercial approaches to measuring and collecting mobile communications quality metrics have evolved that do not rely on access to a network operator’s data. These approaches can be categorised as either crowdsourced measurements or controlled measurements. Commercial services typically favour crowdsourced approaches, since no expensive and personnel-intensive measurement campaigns are needed. Because crowdsourcing scales well, it can generate many measurements, albeit with the drawback of little control over measurement accuracy.

Controlled mobile communications measurements require knowledge about the measuring equipment and need careful design of the measuring methodology. Ideally, measurements cover the full geographical and temporal region of interest without gaps. This requires enormous effort and is thus rarely possible for larger regions. Therefore, depending on the focus of a survey, these measurements are usually either geographically static or moving (e.g., drive tests) and are specified as either random or systematic.

Tool chain for measurement, collection, evaluation, and prediction of controlled mobile communications drive test data

Measurement of drive test data

We use custom-built measurement equipment based on the Raspberry Pi 2 to conduct our controlled drive test measurement campaign. Each Raspberry Pi is equipped with a Huawei E3372 LTE network interface that provides the Huawei HiLink API for signal parameter monitoring. The measurement equipment constantly receives data at full load: five TCP flows, each downloading 100 MB of HTTP traffic, run simultaneously and are restarted as soon as a download completes. The data rate of each flow is monitored and averaged over 1 second. The overall data rate is the sum of the individual flow data rates.
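The aggregation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout `(timestamp_s, flow_id, bytes_received)` and the function name `overall_data_rate` are hypothetical and are not taken from the MINER platform.

```python
from collections import defaultdict

def overall_data_rate(samples):
    """Return {second: overall data rate in Mbit/s}.

    `samples` is an iterable of (timestamp_s, flow_id, bytes_received)
    tuples. This layout is hypothetical, not the MINER record format.
    """
    # Bytes received by each flow within each 1-second window.
    bytes_per_flow_second = defaultdict(int)
    for timestamp_s, flow_id, nbytes in samples:
        bytes_per_flow_second[(int(timestamp_s), flow_id)] += nbytes

    # Per-flow rate over the 1 s window, summed over all flows.
    overall = defaultdict(float)
    for (second, _flow_id), nbytes in bytes_per_flow_second.items():
        overall[second] += nbytes * 8 / 1e6  # bytes per second -> Mbit/s
    return dict(overall)
```

With five flows, each second of the result would be the sum of five per-flow averages, matching the definition of the overall data rate given above.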

Drive test data set: measurement location

In addition, a GPS module simultaneously records the geographical position and the time. Because the measurements start at the same moment as the time synchronisation procedure, the recorded time is imprecise during an initial transitory period: the time, and every measurement that depends on it, becomes reliable only once the time synchronisation has reached a stable state. The Huawei HiLink API is used to monitor the signal parameters that the LTE network interface provides at 1-second intervals.

To manage the measurements, we use our MINER software platform. The following table describes the measured and monitored data in more detail.

Drive test data set: measured and monitored data

Collection of test drive data

Our drive test measurements are derived from organised measurement drives, conducted with the explicit permission of the drivers. We chose a 25 km long Austrian highway section and an observation period of two years, from January 2018 to December 2019. This section consists of a mix of urban highway near the city of Salzburg and rural highway in the Salzkammergut region. January 2018 to December 2019 is the most recent long-term period in which mobile phone usage behaviour was not influenced by the COVID pandemic. During the pandemic, mobile communications data showed very atypical network traffic patterns due to significant changes in user movement patterns.

Preprocessing and evaluation of test drive data

We have preprocessed the raw data collected from the measurement scripts to

  1. only include data from the relevant geographic area,
  2. interpolate missing positions,
  3. exclude data not collected on the highway,
  4. remove data where our measurement hardware malfunctioned, and
  5. remove measurements from non-LTE networks.
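A minimal sketch of such a preprocessing pipeline is shown below. It rests on several assumptions that are not taken from our scripts: the rows are dictionaries sorted by strictly increasing time `t`, a missing position is stored as `None` in both `lat` and `lon`, and the keys `hw_ok` and `network` are hypothetical names. Missing positions are filled by linear interpolation in time, which is one plausible choice among several. The on-highway check (step 3) is omitted here because it needs the road geometry; the bounding box only covers step 1.

```python
def interpolate_positions(rows):
    """Fill missing lat/lon by linear interpolation in time between
    the nearest earlier and later rows that have a known position."""
    known = [i for i, r in enumerate(rows) if r["lat"] is not None]
    out = [dict(r) for r in rows]  # copy so the input stays untouched
    for prev, nxt in zip(known, known[1:]):
        t0, t1 = rows[prev]["t"], rows[nxt]["t"]
        for i in range(prev + 1, nxt):
            w = (rows[i]["t"] - t0) / (t1 - t0)
            for key in ("lat", "lon"):
                start, end = rows[prev][key], rows[nxt][key]
                out[i][key] = start + w * (end - start)
    return out

def preprocess(rows, bbox):
    """Keep rows inside `bbox` that come from working hardware and LTE.

    `bbox` is (lat_min, lat_max, lon_min, lon_max).
    """
    lat_min, lat_max, lon_min, lon_max = bbox
    cleaned = []
    for r in interpolate_positions(rows):
        if r["lat"] is None:
            continue  # no known position before or after this row
        if not (lat_min <= r["lat"] <= lat_max):
            continue
        if not (lon_min <= r["lon"] <= lon_max):
            continue
        if not r["hw_ok"] or r["network"] != "LTE":
            continue
        cleaned.append(r)
    return cleaned
```

In this sketch the interpolation runs before the geographic filter, so that rows without a position can still be kept if their interpolated position lies inside the area of interest.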

The drive tests were often conducted during rush hour, when high cell loads can be expected. However, there was no strict measurement schedule: drive tests took place at all times of day and during most of the year. The following histograms show the number of collected data points grouped by month and by time of day. Because there was no strict measurement schedule, the data points are not uniformly distributed over the year. In particular, there are fewer data points during the summer months of August and September. Nonetheless, there are at least 2000 data points for each month.

Drive test data set: histogram (months)
Drive test data set: histogram (hours of day)
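The counts behind such histograms can be computed with a short sketch like the following. It assumes that each data point carries a Unix timestamp in seconds and interprets it as UTC; both the input format and the function name `count_by_month_and_hour` are assumptions made for illustration, and the published data set may store time differently.

```python
from collections import Counter
from datetime import datetime, timezone

def count_by_month_and_hour(timestamps):
    """Count data points per calendar month (1-12) and hour of day (0-23).

    `timestamps` are Unix timestamps in seconds, interpreted as UTC
    (an assumption; adjust the time zone to match the actual data).
    """
    by_month, by_hour = Counter(), Counter()
    for ts in timestamps:
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
        by_month[dt.month] += 1
        by_hour[dt.hour] += 1
    return by_month, by_hour
```

A check such as `min(by_month.values())` then gives the smallest number of data points collected in any month that appears in the data.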

Identical measurement hardware was used for the entire data collection period. However, during such long periods the network infrastructure is not static. For example, new cells may be installed or others removed by the network operator. Due to this fact – and possible behavioural changes of the network users over the two-year period – some network metrics change significantly as well.

The test drive data set

Our data set is based on the travel patterns of a typical commuter. It therefore may neither be representative of the cellular network in question, nor be generalizable to other networks or other car usage patterns, which differ among genders.

The data set contains 267,198 data points collected over two years of driving a car on a 25 km long highway section. We actively measure the maximum achievable data rate in the LTE network of a major Austrian cellular network operator. In addition, we monitor all important general parameters, such as GPS position, time, and signal parameters.

As our earlier work showed that open mobile data sets for machine learning are rare, we made our data set publicly available. For more details as well as the applications of machine learning to the data, see the full open access paper or have a look at the data set.

Open access data set: drive test data set

We would be pleased if you use our drive test data set and would be interested in your findings!

Full open access paper: Stefan Farthofer, Matthias Herlich, Christian Maier, Sabrina Pochaba, Julia Lackner, and Peter Dorfinger (2022): An Open Mobile Communications Drive Test Data Set and Its Use for Machine Learning. In: IEEE Open Journal of the Communications Society, vol. 3, pp. 1688-1701, 2022.

If you would like to conduct custom measurements, please don’t hesitate to contact us for more information about our MINER software platform.

The MINER platform was also used for:


Broadband Monitoring: Comprehensive Performance Analysis for Communication Networks

Mobile phone providers advertise the “up to” bandwidths of their networks, but are these actually achieved? Salzburg Research systematically and continuously recorded the broadband availability in a federal state and provided a neutral, provider-independent database for evaluating mobile broadband availability.

Matthias Herlich

Matthias Herlich is a researcher in the advanced networking center at Salzburg Research. His technical expertise includes radio access networks, information-centric networking, software-defined networking, peer-to-peer networks, wireless sensor networks, and communication networks for smart grids.

Published on 26 Jan 2023