Benchmarking computational fluid dynamics models of lava flow simulation for hazard assessment, forecasting, and risk management

Numerical simulations of lava flow emplacement are valuable for assessing lava flow hazards, forecasting active flows, designing flow mitigation measures, interpreting past eruptions, and understanding the controls on lava flow behavior. Existing lava flow models vary in simplifying assumptions, physics, dimensionality, and the degree to which they have been validated against analytical solutions, experiments, and natural observations. In order to assess existing models and guide the development of new codes, we conduct a benchmarking study of computational fluid dynamics (CFD) models for lava flow emplacement, including VolcFlow, OpenFOAM, FLOW-3D, COMSOL, and MOLASSES. We model viscous, cooling, and solidifying flows over horizontal planes, sloping surfaces, and into topographic obstacles. We compare model results to physical observations made during well-controlled analogue and molten basalt experiments, and to analytical theory when available. Overall, the models accurately simulate viscous flow with some variability in flow thickness where flows intersect obstacles. OpenFOAM, COMSOL, and FLOW-3D can each reproduce experimental measurements of cooling viscous flows, and OpenFOAM and FLOW-3D simulations with temperature-dependent rheology match results from molten basalt experiments. We assess the goodness-of-fit of the simulation results and the computational cost. Our results guide the selection of numerical simulation codes for different applications, including inferring emplacement conditions of past lava flows, modeling the temporal evolution of ongoing flows during eruption, and probabilistic assessment of lava flow hazard prior to eruption. Finally, we outline potential experiments and desired key observational data from future flows that would extend existing benchmarking data sets.


Introduction
Lava flows form slow-moving but destructive hazards at volcanoes around the world. To anticipate and mitigate the consequences of these effusive eruptions, scientists use numerical lava flow models for hazard and risk assessment, real-time flow forecasting, hazard communication, and evaluating mitigation measures. A variety of lava flow models have been developed for these applications, and their accuracy and speed of computation is crucial to researchers studying lava flow behavior, as well as to volcano observatories and emergency managers responding to a crisis. For this reason, it is essential to assess the accuracy and efficiency of tools available for constructing hazard maps, estimating flow advance rates during eruptions, and designing diversion barriers. We also seek to develop new codes that increase our lava flow simulation capabilities.
The physical and thermal complexity of lava flows presents a challenge for accurately forecasting their advance or reconstructing past emplacement. Current lava flow models range in simplifying assumptions, physical complexity, and dimensionality. They also vary in the extent to which they have been tested for accuracy. Physically accurate models would allow study of the fundamental controls on lava flow behavior that have previously been identified through empirical observations or simplified experiments (e.g., Walker, 1973;Kerr et al., 2006;Castruccio et al., 2010). Accurate and fast numerical models are also crucial for hazard assessment and flow forecasting, as well as designing and testing flow mitigation strategies such as diversion barriers (Fujita et al., 2009;Scifoni et al., 2010). To evaluate existing models and new codes in development, we conduct a benchmarking study of computational fluid dynamics (CFD) models for lava flow emplacement.
Lava flows are fundamentally gravity currents, where the gravitationally driven flow down slope is resisted by the flow viscosity, cooling and crust formation, and interaction with topography. Effusion rate is a crucial control on final flow length and advance rate, with higher effusion rates producing flows that travel farther and faster (Walker, 1973;Kauahikaua et al., 2003). Near the vent, lava flows have been considered isoviscous Newtonian fluids (e.g., Takagi and Huppert, 2010). Cooling, however, leads to the growth of a surface crust (Griffiths and Fink, 1993) and to lava crystallization, which introduces non-Newtonian rheology (Hulme, 1974;Lyman et al., 2005;Soule and Cashman, 2005;Castruccio et al. 2013). Topography also exerts a major control on flow behavior, as the underlying slope drives advance rates, while topographic features may split and slow flows or confine and lengthen them (Dietterich and Cashman, 2014;Dietterich et al., 2015).
Numerical modeling is essential to accommodate these internal and external factors for simulating flow emplacement. Existing lava flow models can be deterministic, producing one solution from a given set of inputs, or probabilistic, producing a distribution of solutions from a distribution of inputs. We focus on deterministic codes, but all can be applied probabilistically with Monte Carlo methods. Existing deterministic models range from onedimensional (1D; e.g., FLOWGO, Harris and Rowland, 2001) to 3D (e.g., LavaSIM, Hidaka et al., 2005; GPUSPH, Bilotta et al., 2015) and incorporate a range of physical complexity, including thermal and rheological evolution. Most models for lava flow emplacement are 2D (SCIARA, Crisci et al., 2004;MAGFLOW, Vicari et al., 2007;LavaPL, Connor et al., 2012;VOLCFLOW, Kelfoun and Vargas, 2015;COMSOL, Chen and Rempel, 2015) applying either cellular automata or a shallow-water approximation (depth-averaging) to simulate unconfined flow in plan view, but without vertical variability in parameters. Models that are isothermal or isoviscous (e.g., MOLASSES, Volc-Flow) are limited to simulating Newtonian or Bingham flows, and no plan-view description can fully simulate cooling or crust formation. Finally, many lava flow models have never been validated for scenarios with known solutions from analytical theory, experiments, or observed eruptions. We therefore seek to expand the development of fully 3D physical models to improve our numerical tools, and test existing codes for accuracy and speed through model benchmarking.
Model benchmarking is an exercise to compare the performance of many models when simulating a specific case with a known solution. This framework has been applied to many geologic cases, including mantle convection and climate models (e.g., Blankenbach et al., 1989;Schmeling, et al., 2008;Charbonnier and Gertisser, 2009;Harrison et al. 2014;Costa et al., 2016), but only recently introduced for lava flow models (Cordonnier et al., 2015). Cordonnier et al. (2015) define a set of benchmarks based on analytical theory, experiments, and wellobserved natural lava flows that we use for our study. We also extend their study by (1) including additional experimental data for benchmarking, (2) testing multiple codes for multiple benchmarks, (3) evaluating both model accuracy and CPU efficiency, and (4) interpreting experimentally based benchmarking results in the context of natural lava flows. By running all benchmarks for all of the modeling tools, we can directly compare their accuracy and efficiency. This allows us to identify model strengths and weaknesses and the most and least important parameters, controls, and physical or thermal properties that must be included in lava simulations for a variety of applications. Our results inform code selection for different purposes; our assessment of model uncertainty and efficiency has implications for choosing codes that are appropriate for applications ranging from hazard map construction and flow forecasting, to studies of fundamental lava flow behavior and impact on the built environment.

Experimental data
Experiments offer a way of studying parameters that influence lava flow behavior in a controlled environment, and they therefore provide ideal data for benchmarks, with well-defined inputs and results. There has been extensive experimental work on viscous and cooling flows that capture lava flow behavior. The advance behavior of isothermal Newtonian flows is well quantified by analytical theory and experiments for conditions of either axisymmetrical spreading or travel down a slope (Huppert, 1982;Lister, 1992). Both theory and experiments have also been used to examine non-Newtonian viscous flows that simulate the behavior of flows with crystals (Hulme, 1974;Osmond and Griffiths, 2001;Balmforth et al., 2006;Lyman, et al., 2005;Castruccio et al., 2010). Introducing cooling in theory and analogue fluids has captured the thermal signature of flows and the impacts of temperature-dependent rheology and a surface crust on flow emplacement (Griffiths and Fink, 1993;Griffiths et al., 2003;Kerr et al., 2006;Garel et al., 2012). Most recently, the use of molten basalt in experiments has extended this work, although still at the laboratory scale (Lev et al., 2012;Edwards et al., 2013;Dietterich et al., 2015).
From the perspective of lava flow hazards, we are particularly interested in simulating the interaction of flows with obstacles, including both natural topography and human-made infrastructure and flow diversion structures. Lava flows are ultimately controlled by topography, thus accurately incorporating ways in which flows interact with that topography is critical for modeling lava flow advance (Dietterich and Cashman, 2014). Additionally, lava flows are one of the few volcanic hazards that may be mitigated through engineering (e.g., Barberi and Carapezza, 2004). For this reason, it is important to develop model codes that can be used to test the effectiveness of different types of lava diversion geometries (Fujita et al., 2009). We show here how recent experimental work on flows into obstacles (Dietterich et al., 2015) can be used to test the accuracy of model results for these scenarios.

Benchmark definitions
We use the benchmarks defined by Cordonnier et al. (2015) as a guide to define our set of benchmarks. These cover a range of physics and complexity, from simple scenarios with analytical solutions, to the complexity of cooling molten basalt. Benchmark parameters and observations are summarized in Table 1.

Benchmark A (BM-A): Isothermal, isoviscous sloping flow
Our first benchmark is an isothermal, isoviscous flow of a Newtonian fluid erupting from a point source onto a sloping plane. This is based on a benchmark previously defined for lava flow models by Cordonnier et al. (2015;BM2). It is a scenario that was explored analytically and experimentally by Lister (1992), and is well understood, capturing the dominant behavior of a lava flow as a viscous gravity current. We compare our numerical simulations to the analytical theory (Lister, 1992), as well as results from experiments performed with golden syrup at the University of Bristol (Dietterich et al., 2015). Using identical experimental geometry, parameters, and fluid properties, we can directly compare the propagation of the numerical and experimental gravity currents.
Benchmark B (BM-B): Isothermal, isoviscous sloping flow into obstacles Our next two benchmarks use the same setup as the first, with an isothermal, Newtonian isoviscous, sloping flow, but place an obstacle in its path. These benchmarks are derived directly from experiments investigating the response of viscous flows to topographic barriers and the scale of geometry that may divert flows (Dietterich et al., 2015). The obstacles in the experiments were V-shaped triangles with 4-cm-long sides and internal angles ranging from a narrow 30°to 180°(a wall orthogonal to the flow direction; Fig. 1). We select the 90°and 180°obstacle experiments as benchmarks to test the numerical simulation of flow thickening upslope of an obstacle, which was observed in the experiments but is not generally accounted for in designing flow diversion structures (Dietterich et al., 2015). Importantly, positioning of the obstacle at the flow centerline forces the flow to split around the obstacle, which both widens the flow and slows its advance.

Benchmark C (BM-C): Cooling, isoviscous axisymmetric flow
The fourth benchmark adds complexity by introducing temperature to the viscous flow. In this benchmark, previously defined as BM3 by Cordonnier et al. (2015) and based on experiments by Garel et al. (2012), a hot Newtonian fluid is extruded from a point source onto a horizontal plane and allowed to cool to the ambient air temperature. The propagation of isoviscous axisymmetric flow has an analytical solution (Huppert, 1982), and data capturing the radial propagation and flow surface temperatures are provided by experiments with hot silicone oil (Garel et al., 2012). Surface temperature is presented as normalized temperature, ranging between the ambient temperature (0) and the eruption temperature (1).

Benchmark D (BM-D): Cooling, solidifying, sloping flow
The final benchmark is the most complex and closely approximates the behavior of many lava flows, with the advantage of using an experimental setup where the input parameters and temporal evolution of the 3-D flow morphology and temperature are well documented. This benchmark is based on an experiment performed at the Syracuse University Lava Project that involved pouring molten basalt at a constant supply rate onto a sloping plane (Dietterich et al., 2015). Similar to BM4 in Cordonnier et al. (2015), this benchmark captures both thermal effects and their impacts on flow rheology. This experiment also allows us to compare the flow propagation and surface temperature between the numerical simulations and the experimental data.

Participating codes
We use these benchmarks to compare the performance of five codes: VolcFlow, OpenFOAM, FLOW-3D, COMSOL, and MOLASSES. Each code and its implementation is briefly described below. The CFD models solve for the conservation of mass and momentum, while OpenFOAM, FLOW-3D, and COMSOL also solve for conservation of energy. MOLASSES uses a cellular automata approach that guarantees conservation of mass. These models are all deterministic, producing one output for a given set of inputs. We run all models with the same spatial grid resolution and extent for comparison purposes. All input parameters and model setup details used in this paper are provided in Additional file 1.

VolcFlow
The VolcFlow model is a 2D Eulerian model written in MATLAB that simulates isothermal flow of Newtonian or Bingham fluids over a digital elevation model (DEM; Kelfoun and Druitt, 2005). VolcFlow solves for mass and momentum conservation using the depth-averaged approximation, with isothermal viscosity and yield strength as rheological parameters. The code was originally developed for simulation of debris avalanches, but has been applied to lava flows as well (Kelfoun and Vargas, 2015). The model is implemented within the MATLAB user interface and is available at http://lmv.univ-bpclermont.fr/volcflow/.

OpenFOAM
OpenFOAM (Open Field Operation And Manipulation; http://www.openfoam.org; Jasak et al., 2007) is a numerical modeling toolbox that is based on the finite-volume method. OpenFOAM is a C++ open-source software package produced by OpenCFD Ltd. It has existing solvers to handle complex fluids, chemical reactions, turbulence, heat transfer, solid mechanics, and electromagnetics. Users may also add new equations, solvers and applications. The code is fully parallelized using OpenMPI, and has interfaces with external meshing that can incorporate a DEM, as well as pre-and post-processing tools. For this study, we have used OpenFOAM to develop a fully 3D lava flow model. We incorporate viscous flow with cooling and changing rheology by modifying the standard multiphase flow solver in OpenFOAM to additionally solve for the temperature field, thermal interaction with the substrate, and a temperature-dependent viscosity. Non-Newtonian temperature-dependent rheology and integration of solidification through a phase change (formation of a crust with its own physical and thermal properties) are possible within OpenFOAM (e.g., Vakhrushev et al., 2014), but not yet implemented.

FLOW-3D
FLOW-3D is a computational fluid dynamics software package based on Finite Volume and Volume-of-Fluid algorithms, combined with interface tracking tools. FLOW-3D, produced and distributed by Flow Science Inc., is a commercial code aimed largely at engineers. All types of heat transfer can be simulated by FLOW-3D, as well as porous, two-phase and viscous flows. FLOW-3D can model a range of rheologies, including those that depend on temperature and/or strain rate. It is also possible to bring in a DEM to simulate flow over real topography. The two main downsides for FLOW-3D are its slow speed and high price. In exchange, the users get

COMSOL
COMSOL® is a commercial multiphysics finite element modeling software which excels in solving problems with several partial differential equations (PDE) representing different physical fields. In this study, we use the PDE solving module of the software to simulate the idealized viscous flow driven by gravity using the depth-averaged approach following Lister (1992). The advantage of this approach is that it simplifies a 3D problem to 2D, which drastically reduces the computational requirement. The main disadvantages with this approach are that the temperature and viscous variation in the vertical direction is ignored and that the model lacks the ability to simulate the behavior of lava overtopping obstacles. The temperature field can only be implemented via heat loss due to convection and radiation at the surface, and the averaged viscosity is derived from the temperaturedependent rheology model. However, if the temperature variation is limited to a very thin boundary layer compared with the lava flow thickness (e.g., Takagi and Huppert, 2010), the temperature variation between the hot central layer of the lava flow and the surface can be neglected, and the averaged flow flux is very close to the real lava flux. Natural lava flows usually have very large Peclet numbers (Lev et al., 2012), which corresponds to thin thermal boundary and very thick molten core where the temperature barely varies. In this case, the depth-averaged flow flux only deviates slightly from reality.

MOLASSES (MOdular LAva Simulation Software in Earth
Science) is a Cellular Automata lava flow simulator that is developed using a modular framework (Richardson, 2016), based on the LavaPL algorithm introduced by Connor et al. (2012). Lava advects from source locations over a DEM to more distant grid cells following universal rules that govern how each location spreads lava, including the minimum thickness needed to spread and how to apportion "spreadable lava" to different neighbors. By modifying the source code of modules, different flow algorithms can be realized (e.g. whether grid cell locations are 4-or 8connected or whether lava is spread proportional to slope), which affect the simulated flow behavior. In this paper, the MOLASSES algorithm incorporates a Moore Neighborhood, where grid cells interact with 8 adjacent neighbors to avoid mesh-based anisotropy (e.g., Vicari et al., 2007). Lava spreading among neighbors is proportional to the relative cell-to-cell slope. Lava can only spread between cells if flow thickness is above a critical thickness set by the user, in essentially a fill-and-spill manner that mimics the behavior of Bingham flows.
MOLASSES only outputs the thickness and extent of a lava flow at the end of the simulation and as such does not provide a temporal evolution of flow inundation. In the following benchmark exercises, the critical thickness is defined using the observed thickness given in experimental results (e.g., for Benchmarks A, B, and D, this is the measured final steady-state thickness in the center of the lava flow 24.5 cm from the vent, while for Benchmark C, the mean thickness of the flow at 1800 s is used).
This specific algorithm has been validated by Richardson (2016) to show that the model is not dependent on slope direction, forms a circle on a horizontal surface, and reproduces a natural lava flow with a fitness >80%. The MOLASSES algorithm has been used to model the 2012-3 Tolbachik lava flows (Kamchatka, Russia) (Kubanek et al., 2015) and the long-term hazard of lava flows on infrastructure within the East Snake River Plain (Idaho, USA) (Gallant, 2016). This code is available at http://github.com/ usfvolcanology/molasses.

Results
We extract measurements of modeled flows to compare to benchmark experimental observations. The use of experiment benchmarks allows comparison of numerous time-dependent flow properties, including down slope (X) and cross slope (Y) propagation, flow thickness (H) measured along the flow centerline at a given location downslope from the vent, and surface temperature (T); these observations are lacking for most natural lava flows. For each benchmark, we report the predictions of the numerical flow models, plotted alongside the experimental observations or predictions from analytical theory. Since MOLASSES does not output flow behavior through time, we report only the final state (final X, Y and H) corresponding to the total volume erupted by the last time step shown in each plot. These values are put onto their own axis on the right side of each plot. All model results are provided in Additional file 1.
We evaluate the overall goodness of fit for each model for each benchmark measurement using the root-meansquare error statistic normalized to the mean experiment value and expressed as a percent (NRMSE; Eq. 1), where n is the number of measurements, t is the time step, ŷ is the model prediction value, y is the observed experiment value, and ȳ is the mean value of the experimental data over all of the time steps that used to scale the result. We assess NRMSE for the growth of the flows in X, Y, H, and R through time, the final thickness of the flows against obstacles, as well as the surface temperature profile across the axisymmetric flow at a constant time of 1230 s in experiments with hot silicone oil (Table 2).
Benchmark A: Isothermal, isoviscous sloping flow The along-slope and cross-slope advance of the numerical simulations from the participating models are shown in Fig. 2a. Experimental measurements are shown in the black dots. To assess flow thickness, we measure the flow thickness along the flow centerline at a location 24.5 cm down-slope of the vent through time (Fig. 2b). This position matches with the location of measurement 0.5 cm upslope of the obstacles in the benchmarks with barriers (BM-B, below). Inspection of Fig. 2 shows that all models capture the general form of the experimental data. All modeled values match experimental values with NRMSEs of <10% for X, while only OpenFOAM and MOLASSES produce this degree of fit for Y. For the growth in flow thickness, H(t), all models except Open-FOAM have NRMSEs of <10% (Table 2). In detail, OpenFOAM most closely approximates both X-and Ypropagation over long times, as well as achieving the best fit the final flow thickness. VolcFlow, in contrast, best captures the early stages of flow evolution.

Benchmark B: Isothermal, isoviscous sloping flow into obstacles
Here we examine flow thickening during interaction with an obstacle, and compare the effects of barriers on local flow thickness in each simulation code. The steadystate thicknesses are plotted against internal angle of the obstacle in Fig. 3. An obstacle internal angle of 0°refers to the control case with no obstacle (BM-A). This comparison shows that OpenFOAM, VolcFlow and COMSOL reproduce the experimental results for the 90°obstacle but underestimate, to varying degrees, the flow accumulation behind the 180°barrier. MOLASSES, in contrast, reproduces the lava thickness behind the 180°barrier but overestimates the thickness behind the 90°barrier. FLOW-3D simulations are thinner in the control and 180°obstacle cases, but thicker against the 90°barrier. Overall, OpenFOAM has the lowest NRMSE (6.62%), followed by MOLASSES (11.60%; Table 2).

Benchmark C: Cooling, isoviscous axisymmetric flow
The cooling, radially spreading benchmark provides a test of combined axisymmetric flow and thermal effects. Axisymmetric dome extrusion has an isoviscous analytical solution (Huppert, 1982), which we plot alongside the experimental measurements (Garel et al., 2012) and results from all models that support temperature calculation in Fig. 4a. All models replicate the observed axisymmetric flow spreading (NRMSE <8%) and capture the cooling of the flow with temperature profiles along radius at 1230 s (NRMSE <0.3%; Fig. 4b). In detail, OpenFOAM and COMSOL bracket the experimental data for spreading, which is offset slightly from the analytical solution of Huppert (1982). MOLASSES matches the experimental radial growth well (NRMSE 0.56%), and for this horizontal, low effusion rate example, MOLASSES volume steps can relate well to time steps with constant effusion rate.

Benchmark D: Cooling, solidifying sloping flow
The final benchmark is based on an experiment in which molten basalt was poured at a steady flux onto a sloping surface of sand. Here we compare the downslope and cross-slope propagation of the flow and the thickness 50 cm downslope of the vent through time (Fig. 5). Solidification is approximated in the OpenFOAM and FLOW-3D models using a temperature-dependent rheology (Lev et al., 2012;Cordonnier et al., 2015). For comparison, we plot the results from isothermal Volc-Flow and COMSOL simulations, which use the initial viscosity (corresponding to the eruption temperature) of the temperature-dependent flows. The molten basalt experiment shows an initial linear advance that contrasts with the modeled versions. This is likely related to the lava supply, which is provided through a channel with a small initial downslope velocity in the experiment, rather than a point source as modeled (Dietterich et al., 2015). OpenFOAM generally overestimates the downslope propagation (NRMSE 6.34%) and lateral spreading behavior of the flow (NRMSE 7.27%). FLOW-3D, along with the isoviscous VolcFlow and COMSOL simulations do a good job in simulating the downslope flow propagation (NRMSE <5%). The isoviscous models then overestimate the flow width, while the FLOW-3D model is narrower (NRMSE 9-13%). MOLASSES produces a simulation that is both significantly shorter and narrower than the experiment (NRMSE 21-39%). In the third dimension, MOLASSES and FLOW-3D capture the flow thickness downslope of the vent (NRMSE 6.15 and 14.29%, respectively). OpenFOAM, VolcFlow, and COMSOL are all thinner than the observed flow (NRMSE >20%). Modeled surface temperatures correspond to the range of surface temperatures observed in the molten basalt experiment (Fig. 6). However, in the molten basalt experiment, the surface temperature is heterogeneous because of surface folding, rupture, and bubbles, which are not simulated in the models. This is apparent in mottled texture of the experimental flow in Fig. 6a and b.

Discussion
The most useful numerical lava flow models are those that are both accurate and efficient, and have the physical complexity necessary for applications ranging from simple flow path prediction to real-time flow forecasting to thermal interactions with the environment. Here we quantify both model validity and computational cost, and then discuss the appropriate applications of each code and the relevance of our benchmark experiments to natural lava flows.

Assessment of model accuracy
The benchmarking process allows direct comparison of the strengths and weaknesses of each code. Overall, all models perform well, but they vary in accuracy and especially computational cost. To test the goodness of fit between model and benchmark, we compare the scaling of the X and Y propagation of flows through time to analytical solutions where they are known (unconfined horizontal and sloping flow). We also measure the offset between the experiment and model extents and thicknesses to calculate the difference in the absolute values.
All the codes perform well in simulating the lateral flow propagation of isoviscous flows, as measured by comparison with the temporal evolution of X(t) and Y(t) predicted by theory (Fig. 7). Although the simulations scale appropriately with time, the flow extents diverge, as reflected in the NRMSE (Table 2). Acceptable values of NRMSE will vary by potential model application, but here we consider an NRMSE <10% to be a good fit.
OpenFOAM best matches the unconfined isoviscous flow advance in X, Y, and R (average NRMSE 3.69%; BM-A and C), but the increase in flow thickness with time is low and unsteady relative to the experiments (Fig. 2). VolcFlow and COMSOL capture the evolution in flow For the cooling benchmarks, FLOW-3D and Open-FOAM provide the best fits to the experimental data. All of the model results for the dome extrusion benchmark (BM-C) exhibit the tightest agreement between each other and the experimental observations (<16% spread between all models in radius and normalized temperature with respect to time). OpenFOAM, FLOW-3D, VolcFlow, and COMSOL also capture the advance of molten basalt (NRMSE <7%; BM-D). In this final benchmark, FLOW-3D performs best overall, largely capturing the downslope and thickness growth (average NRMSE 9.15%). Open-FOAM models flow width the best (NRMSE 7.27%), but the thickness values fall~20% below the observations. VolcFlow and COMSOL capture the flow advance overall, but do not include cooling.
In assessing the performance of the MOLASSES model, we can only compare flow extent at the end of an experiment. The model produces variable results, with good fits to the length, width, and thickness of Benchmark A, the radius of Benchmark C, and the thickness of the molten basalt Benchmark D (NRMSE <10% for all). The modeled downslope and cross-slope dimensions of Benchmark D differ significantly from the experimental observations (NRMSE 21-38%).
The accuracy of model predictions can largely be explained by model assumptions. For example, the 2D CFD codes (VolcFlow and COMSOL) underestimate the thicknesses against obstacles relative to the experiments, but perform similarly to the other benchmarks when no obstacle is present. This is likely because the obstacle represents very steep topography (a vertical step), which can produce errors in the shallow water approximation solution (Kurganov and Petrova, 2007). Unsteady and low thicknesses seen in the 3D model results (OpenFOAM and FLOW-3D), may be related to the limited vertical resolution defined by the mesh. The MOLASSES model mostly matches the observed thicknesses because the input "critical thickness" parameter of the simulation is taken from the experimental thickness. However, the cellular automata approach does not allow for any dynamic effects, which may limit its accuracy where flows encounter obstacles and velocity of the flow can affect its thickness (Soule et al., 2004;Dietterich et al., 2015).
Different approaches to rheology can also explain some of the misfits between models and experiments. Poor MOLASSES results for length and width of the molten basalt flow may be related to the model assumptions of a Bingham-like fluid with a residual thickness that limits spreading (Kubanek et al., 2015). The chosen "critical thickness" parameter also may not be representative of the modal thickness of the entire flow. Modifying the critical thickness parameter would likely have a strong influence on the result. For example, a lower critical thickness in the molten basalt Benchmark D (Fig. 5) might increase the extent of the MOLASSES simulation to match the experimental observations, but it would produce a thinner flow. Importantly, the 3D cooling models (OpenFOAM, FLOW-3D) do not perform better overall for the flow propagation and thickness than the 2D isothermal models (VolcFlow, COMSOL) in the molten basalt Benchmark D. This demonstrates that the flow behavior was not coolinglimited over this short distance and duration, and that a longer molten basalt experiment, an analogue experiment with a growing surface crust, or a well-documented natural flow would be necessary to fully test models with cooling and solidification.

Assessment of computational efficiency
We assess model computation cost by comparing the total CPU hours required for the model runs for each benchmark (Fig. 8). Because the processing speed of all the processors used was relatively similar (2.5-3.6 GHz), CPU hours is a proxy for computational steps or the total floating point operations performed. The results demonstrate the significantly greater computation time for the CFD models relative to the cellular automata code, and the 3D models relative to the 2D ones. Overall, MOLASSES runs 2-3 orders of magnitude faster than VolcFlow and COMSOL, and 4-5 orders of magnitude faster than OpenFOAM and FLOW-3D. However, it does not produce the same detailed output from physical inputs, only the final extent of a flow for a given volume and minimum thickness parameter. Because differences in computation times are in order of magnitude, smaller differences between machine speeds  Fig. 7 Comparison of scaling of X and Y propagation with time for the isoviscous horizontal and sloping benchmarks without obstacles (BM-A and C). For horizontal unconfined flow, theory predicts X = Y~t 1/2 (Huppert, 1982), while for sloping flow, theory predicts X~t 7/9 and Y~t 1/3 (Lister, 1992). The best-fit exponents determined by linear regression in log-log space are plotted and memory resources are not considered. For axisymmetric cases, such as the dome extrusion or modeling of lava lakes, OpenFOAM, FLOW-3D, and COMSOL can all run faster simulations as radial 2D slices.
The models presented here also vary in how difficult or time-consuming they are to set up. FLOW-3D and VolcFlow have user interfaces and do not require extensive coding, while OpenFOAM, being a library and not a pre-built software product, requires a more extensive preparation and setup. COMSOL provides a user interface as well as a scripting portal, so it is easy for initial setup and further customization and batch calculation. MOLASSES has no user interface, but only requires a DEM and a single text configuration file of 6-10 parameters that end-users modify.

Implications for model applications
The applications of lava flow modeling to assessing flow hazard, forecasting, impact, and mitigation require varying degrees of accuracy, speed, and complexity. Despite a range in computational cost, there are limited differences in the performances of each code. The largest variations are found in the benchmark comparisons of flow thicknesses against obstacles and molten basalt experiments. These differences in accuracy have implications for assessing the model complexity needed to simulate lava; when combined with the computation cost, they also inform code selection for different applications.
In this benchmarking exercise, the participating codes all perform well for isoviscous flow, both sloping and on a horizontal plane B,and C;Figs. 2,3 and 4). In the molten basalt case (BM-D), the similarity in results for cooling, temperature dependent viscous flow (Open-FOAM and FLOW-3D) and isoviscous flow (VolcFlow and COMSOL; Fig. 5) suggests that the initial viscosity dominates the flow behavior over the short duration of the experiment, with cooling exerting limited influence.
Importantly, this will not be true in natural basaltic lava flows, which cool and crystallize extensively during flow emplacement (e.g., Cashman et al., 1999;Soule et al., 2004;Riker et al., 2009). The main differences between the isoviscous and cooling models are in the flow width, which decreases, and flow thickness, which increases, with cooling. For Benchmark D, the incorporation of thermal effects in the models plays a less important role than the effusion rate, initial viscosity, and gravity. However, over the timescales and lengths of real lava flows, especially where cooling-limited (e.g., Guest et al., 1987;Harris and Rowland, 2009), the thermal effects are crucial to include, correctly, in simulations.
Our results can be used to choose the best code for a given application. For studying or forecasting short duration and volume-limited flows (e.g., Etna 1981or Kīlauea December 1974Guest et al., 1987;Soule et al., 2004), models with simple rheologies (VolcFlow and MOLAS-SES) are reasonably accurate and significantly faster than more complex models. However, if a flow is longer-lived and cooling-limited (e.g., Etna 1983or Mauna Loa 1984Guest et al., 1987;Lipman and Banks, 1987), fully thermorheological models (OpenFOAM and FLOW-3D) may better forecast its advance. Models that simulate the thermal interaction of the flow with its surroundings (e.g., OpenFOAM, FLOW-3D, and COMSOL) are necessary to assess thermal lava flow impacts on vegetation and the built environment.
Lava flow interventions for hazard mitigation commonly use diversion barriers, although the past success of this approach has been mixed (e.g., Barberi and Carapezza, 2004). Critically, we suggest that numerical modeling could inform future designs. The benchmarks with obstacles (BM-B) demonstrate that obstacle geometry and orientation influence the extent of thickening of the flow upslope of the obstacle that can lead to overtopping (Fig. 3;Dietterich et al., 2015). For simulations of flow into topographic obstacles, including artificial diversion structures (Scifoni et al., 2010;Fujita et al., 2009) or abrupt confining topography (Dietterich and Cashman, 2014), our results suggest that the 2D VolcFlow and COMSOL models consistently underestimate flow thickening at obstacles. It may therefore be more appropriate to use fully 3D models such as OpenFOAM or FLOW-3D to determine if a flow may overtop an obstacle. Even the best performer, OpenFOAM (Table 2), does not capture the observed evolution in flow thickness through time (Fig. 2b). Further experiments and model runs measuring thickening against gradually rising obstacles, such as cones and earth berms, as well as different planform obstacle geometries, could reveal the true extent of this problem for more realistic topographic obstacles. Furthermore, the design of lava flow interventions based on cooling with water (e.g., Williams and Moore, 1983), requires fully 3D Fig. 8 Computational cost as measured in CPU hours for each code and benchmark models that can simulate crust formation (e.g., Open-FOAM and FLOW-3D).
To construct a hazard map, lava flow models are often applied probabilistically. All of the codes that we tested could be combined with Monte Carlo methods to estimate lava flow hazard from either a single vent or an entire volcanic edifice (e.g., Del Negro et al., 2013). This type of application would require running many thousands of simulations to both capture the range of possible behavior and achieve robust statistics (e.g., Tarquini and Favalli, 2015). With this in mind, our computational cost analysis shows that MOLASSES produces very quick and often accurate simulations and would thus be an appropriate choice for probabilistic hazard analysis.
For studying controls on flow emplacement, the slower but more accurate 3D CFD models provide more flexibility, as they can include parameters such as substrate properties and rheological complexity. Simulations to reconstruct the emplacement of past flows (i.e., to determine effusion rates) are similarly not time-sensitive and can take advantage of sophisticated models that are too slow for probabilistic applications and rapid disaster response. Alternatively, for robust flow forecasting and uncertainty analysis, an ensemble modeling approach could be developed that combines models with different strengths and weaknesses following practices from weather and storm forecasting (e.g., Bonadonna et al., 2012).

Applicability of experiment benchmarks to natural flows
We have tested codes for lava flow simulation on benchmarks derived from laboratory experiments. While the scale of these experiments is small compared with natural flows, they share fundamental physics, such as low Reynolds numbers, large Bond numbers, and high Peclet numbers (Garel et al., 2012;Lev et al., 2012;Dietterich et al., 2015), which indicate laminar flow, negligible surface tension, and a very thin thermal boundary. However, natural lava flows contain much more complexity than is captured in these benchmarks, including multiphase rheology (bubbles, melt and crystals), growth and strength of a surface crust, and changing effusion rates (e.g., Cashman and Mangan, 2014). Further validation is necessary to test models that include these features.
As lava flows travel away from the vent, their multiphase properties change dramatically, impacting their rheology and behavior (e.g., Cashman et al., 1999;Riker et al., 2009). Loss of bubbles and growth of crystals combine to alter the bulk rheology of the flow (e.g., Soule and Cashman, 2005;Castruccio et al., 2010) and the heat budget through latent heat of crystallization (e.g., Harris and Rowland, 2001). The surface crust also plays a role in resisting the flow, controlling heat loss, and the transition between pāhoehoe and 'a'ā lavas (Griffiths et al., 2003;Lyman et al., 2005;Cashman et al., 2006). Critically, the growth of the crust, and its strength and thickness, determines whether flows form tubes or open channels, which dramatically affects how far and fast they travel. It is therefore necessary to fully incorporate phase changes for models designed to span this range of behavior. Two-dimensional models like the COMSOL-based depth-averaged model presented here can approximate solidification on the flow margins, but cannot grow a crust, which is fundamental for governing pāhoehoe emplacement (Chen and Rempel, 2015). Three-dimensional models like OpenFOAM and FLOW-3D have the potential to form a crust and simulate an evolution in bulk rheology (e.g., Vakhrushev et al., 2014), but the current implementation of our lava flow OpenFOAM solver includes only a cooling-induced increase in viscosity, and we have not tested FLOW-3D's solidification solver for a lava flow scenario. Employing analogue experiments with surface crust growth (e.g., Griffiths et al., 2003;Garel et al., 2014) as benchmarks could help guide development of appropriate models, although most existing analogue materials are not scaled appropriately to quantify crustal growth effects (e.g., Soule and Cashman, 2005).
Lava flows also change the local topography during emplacement. Construction of cones and depositing of tephra near the vent area, flow levees along channel margins, and elongated tumuli formed by flow inflation will all affect the routing of subsequent flows (e.g., Mattox et al., 1993;Wolfe, 1988;Dietterich et al., 2012;Elissondo et al., 2016). Models should therefore be time-dependent and include syn-eruptive alteration of the topography, or they will lose accuracy as the eruption continues (e.g., Hidaka et al., 2005).
Variations in effusion rate during eruption are also common but are not included in our benchmarking study. Surges in effusion rate often lead to breakouts or overflows that can spawn new flow branches that ultimately widen the flow field (e.g., Guest et al., 1987;Hon et al., 1994;Tarquini and de' Michieli Vitturi, 2014). New flow branches can rob downslope flow fronts of lava supply (Dietterich and Cashman, 2014) and thus exert a fundamental control on flow behavior. Experiments that alter effusion rate while incorporating crust growth and levee formation would be valuable to test model implementations of this behavior (e.g., Rader et al., 2015).
Finally, we note the paucity of detailed documentation of natural lava flows that could serve as the preferred datasets for benchmarks, since they inherently capture the complexities of real lava flows. Although experiments offer the advantage of well-constrained input parameters and complete documentation of the resulting flow, they are limited in both extent and kinetic energy, and thus cannot simulate the dynamical and rheological range of real lava flows. Technological advances (including lidar, InSAR, and structure-from-motion photogrammetry) now permit accurate measurements of pre-eruptive topography, as well as repeat measurements of flow geometry that could provide near-real-time data on effusion rate, flow extent, and thickness during emplacement (e.g., Favalli et al., 2010;Poland, 2014;Slatcher et al., 2015). Ideally these measurements would be supplemented by corresponding measurements of thermorheological properties using a combination of field and remotely sensed measurements of temperature (thermocouples and thermal imaging; Lipman and Banks, 1987;Patrick et al., 2017) and rheology (samples and video analysis; Cashman et al., 1999;Lev et al., 2012;Soldati et al., 2016). We urge the community to come together to collect such data on future lava flows.

Conclusions
Model validation and comparison is a necessary community exercise to test the codes used for studying lava flow behavior, constructing hazard assessments, and forecasting flow behavior during crises (Harris et al., 2016). In our study, we employ experimental data that capture the fundamental physics of lava flows as benchmarks for assessing the accuracy and speed of several different lava flow modeling codes. The exercise provides valuable results for model validation, highlights directions for future model development, and informs code selection for a variety of applications. Through this effort, we have also gained a better understanding of model uncertainty. Continued studies involving more models and even more complex benchmarks are essential as the community seeks to build better hazard maps, develop risk assessments, design future diversion interventions, and improve lava flow forecasts.
Although we have performed our analysis with selected 2D, 3D, and cellular automata codes, there are numerous other lava flow models that could be tested with these benchmarks. Models could be further refined by including benchmarks that employ multiphase materials (bubbles and/or crystals), crustal growth, and changing effusion rates; such experiments would help to capture a greater range of the thermal and physical complexity of lava flows. Future well-documented natural lava flows will also contribute to this process.

Additional file
Additional file 1: Excel spreadsheet containing all benchmark parameters and data, model setup information, and model results. (XLSX 180 kb)