Remotely assessing tephra fall building damage and vulnerability: Kelud Volcano, Indonesia

Tephra from large explosive eruptions can cause damage to buildings over wide geographical areas, creating a variety of issues for post-eruption recovery. This means that evaluating the extent and nature of likely building damage from future eruptions is an important aspect of volcanic risk assessment. However, our ability to make accurate assessments is currently limited by poor characterisation of how buildings perform under varying tephra loads. This study presents a method to remotely assess building damage to increase the quantity of data available for developing new tephra fall building vulnerability models. Given the large number of damaged buildings and the high potential for loss in future eruptions, we use the Kelud 2014 eruption as a case study. A total of 1154 buildings affected by falls 1–10 cm thick were assessed, with 790 showing signs that they sustained damage in the time between pre- and post-eruption satellite image acquisitions. Only 27 of the buildings surveyed appear to have experienced severe roof or building collapse. Damage was more commonly characterised by collapse of roof overhangs and verandas or damage that required roof cladding replacement. To estimate tephra loads received by each building we used Tephra2 inversion and interpolation of hand-contoured isopachs on the same set of deposit measurements. Combining tephra loads from both methods with our damage assessment, we develop the first sets of tephra fall fragility curves that consider damage severities lower than severe roof collapse. Weighted prediction accuracies are calculated for the curves using K-fold cross validation, with scores between 0.68 and 0.75 comparable to those for fragility curves developed for other natural hazards. Remote assessment of tephra fall building damage is highly complementary to traditional field-based surveying and both approaches should ideally be adopted to improve our understanding of tephra fall impacts following future damaging eruptions.


Introduction
With populations surrounding volcanoes growing faster than the average global rate, eruptions will increasingly impact human settlements and livelihoods (Barclay et al., 2019;Freire et al., 2019). Tephra hazards can affect large areas and cause impacts to buildings, ranging from nuisance damage to non-structural features through to potentially lethal collapse (Blong, 1984;Blong et al., b). Two measures that can be taken to reduce the impact of future eruptions are i) pre-event recovery planning, and ii) promoting the construction of buildings that have proven resilient in past eruptions (Spence et al., 2005;Jenkins et al., 2014). Both of these risk mitigation measures typically rely on observations of damage from past eruptions. However, comprehensive documentation of previous tephra impacts to buildings is rare, making accurate damage forecasts challenging Wilson et al., 2017). To date, only three post-eruption surveys that quantitatively assess the relationship between tephra fall hazard intensity and building damage severity have been published. These surveys were carried out after the 1991 eruption of Pinatubo, Philippines (Spence et al., 1996), the 1994 eruption of Rabaul, Papua New Guinea (Blong, 2003) and the 2015 eruption of Calbuco, Chile (Hayes et al., 2019).
We develop a new method to remotely assess building damage so that this sparse, global data set of posteruption tephra impacts can be expanded. Satellite images taken before and after the 2014 eruption of Kelud volcano in Java, Indonesia are compared to identify buildings damaged to different degrees across seven case study villages receiving various amounts of tephra. Then, comparing tephra fall hazard intensity with the damage assessment results, we develop vulnerability models (known as fragility curves) that translate hazard into likely damage, allowing for the impacts of potential future tephra falls to be forecast. These tephra fall fragility curves are the first developed specifically for Indonesian building types with curves fit directly from observations of damage. They are also the first set of tephra fall fragility curves to consider tephra loads causing building damage less than severe roof collapse.
Whilst remote sensing has been used to assess building damage caused by natural hazards, (e.g. Spence et al., 2003;Gamba et al., 2007) it has rarely been applied to posteruption building damage assessment. Post-eruption field studies combined with manual inspection of pre-and posteruption satellite imagery has been used to quantify damage caused by pyroclastic flows from the 2010 Merapi eruption (Jenkins et al., 2013a;Solikhin et al., 2015) and from the Fogo 2014-2015 lava flows (Jenkins et al., 2017). Magill et al. (2013) assessed tephra fall impacts from the 2011 Shinmoedake eruption in Japan using geospatial infrastructure and land cover data combined with semi-structured interviews regarding the impacts. Recently, Biass et al. (in press) used interferometric synthetic-aperture radar (InSAR) to assess building damage from the Kelud 2014 eruption, comparing the intensity of coherence loss between pre-and post-eruption InSAR scenes with the damage observations presented in the current study. This study differs from previous ones by extending remote damage assessment directly into the development of physical vulnerability models.

Case study eruption: Kelud 2014
Geological setting and eruption history Kelud volcano is regarded as one of the most active and deadly volcanoes in Indonesia (Brown et al., 2015;Maeno et al., 2019b). Located in East Java, Kelud is a basaltic-andesite stratovolcano that forms part of the Sunda Arc subduction system (Fig. 1). Kelud has a complex morphology with two large landslide scars from previous sector collapses and multiple peaks made up of large remnant lava domes with the highest at an elevation of 1731 m asl (Wirakusumah, 1991;Jeffery et al., 2013). Kelud has had more than 30 eruptions over the past 1000 years and in the past century has produced four eruptions with a volcanic explosivity index (VEI) of 4 (GVP, 2014). Mass casualties, including the 10,000 fatalities in Kelud's 1586 VEI 5 eruption are attributed to large extensive lahars associated with breakouts from the summit crater lake (Bourdier et al., 1997). A series of drainage tunnels were constructed starting in 1919, dramatically reducing the lake's volume and potential for lahars to catastrophically affect large populations on the flanks of the volcano (Hizbaron et al., 2018). Activity over the past 100 years has been characterised by a cyclic pattern alternating between periods of effusive lava dome growth and subsequent dome destruction during explosive, Plinian eruptions that are typically short-lived and high intensity, and occur with relatively little precursory activity (Hidayati et al., 2019).

The Kelud 2014 eruption
This VEI 4 eruption began at 22:50 local time on 13 February 2014 with the main explosive phase beginning at 23:30 and lasting for about four hours (GVP, 2014). Around 166, 000 people were evacuated from a 10 km radius exclusion zone before the eruption began, following the volcanic alert being raised to its highest level (4), "Awas" at 21:15 on the night of the eruption (Andreastuti et al., 2017). This was a short-duration, high intensity eruption, typical of Kelud, with an estimated eruption magnitude of 4.3 to 4.5 and intensity of 10.8 to 11.0 placing it between Merapi, 2010 andPinatubo, 1991 in terms of eruption intensity (Caudron et al., 2015). A total of seven fatalities were recorded in this eruption, attributed to collapsing walls, ash inhalation and 'shortness of breath' (GVP, 2014). All these reported fatalities were from the Malang regency to the east of Kelud with at least four occurring within 7 km of the vent. The eruption produced pyroclastic density currents running out to 6 km and rain-triggered lahars that damaged buildings up to 35 km away from the vent. Wind shear at an altitude of~5 km asl caused a bi-lobate tephra deposit (Maeno et al., 2019a). Tephra was dominantly dispersed westwards and caused an accumulation of 2 cm of ash on the major city of Yogyakarta more than 200 km from the vent. A secondary lobe produced trace amounts of ash (< 1 mm thickness) on Surabaya,~80 km northeast of Kelud. Clasts 9 cm in diameter were dispersed to over 12 km and most reported building damage was constrained to within 40 km of the vent (Maeno et al., 2019a;Goode et al., 2018;Blake et al., 2015). The International Red Cross reported that > 11,000 buildings were 'completely damaged' with > 15,000 buildings experiencing 'light' to 'moderate' damage in the three regencies surrounding the volcano (IFRC, 2014). Interestingly, despite this widespread damage, posteruption field surveys carried out soon after by Paripurno et al. (2015) and six months later by Blake et al. (2015) found that few buildings experienced severe damage (where 'severe damage' is defined by Spence et al. (1996) as complete failure of any principal roof support structure, such as trusses or columns or deformation/collapse to over half of the internal/external walls). This allowed building repairs to be completed swiftly, with reports of over 99% of damaged houses being repaired within less than a month of the eruption ending (Jakarta Post, 2014). These conflicting assessments of damage, the remarkably swift recovery and the high likelihood of damaging eruptions occurring again in the future (Maeno et al., 2019b), make the Kelud 2014 eruption a useful case study for remote surveying of tephra fall building damage.

Building characteristics and exposure
In November 2019, there were > 400,000 buildings within 30 km of Kelud's vent recorded in Open Street Map (OSM, 2019) with LandScan 2018 estimating that 2.6 million people lived within that same area (Rose et al., 2019). The majority of buildings in villages around Kelud can be described using the three key typologies identified in the field by Blake et al. (2015), which are also similar to those around Merapi volcano (Jenkins et al., 2013b). These typologies are differentiated by their external wall framing (reinforced masonry, brick or timber) but all share a claytiled roof, supported by timber or bamboo framing. Choice of roof design has implications for building vulnerability to tephra hazards and is also a key aspect of traditional Javanese architecture, with designs of increasing steepness reflecting higher social status of the owner (Idham, 2018). To support ventilation, a common design feature of clay-tiled roofs in Kelud is that they are steepest over the centre of the building, usually between 25 and 60°(e.g. Fig. 2a and b), with more shallowly pitched edges and eaves, usually ≤25° (Paripurno et al. 2015). Additionally, to shade buildings more efficiently, eaves often extend to overhang relatively far beyond a building's walls. Changes in roof slope promote tephra shedding from the centre towards the edges of roofs making these areas more prone to collapse, especially considering the timber or bamboo roof supports used for the overhangs of buildings are often thinner and therefore weaker than those in the main part of the roof (Prihatmaji et al., 2014). Roofs made using other covering materials such as corrugated asbestos or metal sheets are relatively uncommon around Kelud, especially for residential buildings. However, this is a popular form of construction for livestock shelters and small shops (e.g. Fig. 2c). As sheet roofs are lighter than tiled roofs and do not require timber battens, the spacing between framing members is typically wider for sheet roofs and therefore, they are potentially more prone to collapse under tephra fall loading (Spence et al., 1996;Blake et al., 2015).

Tephra fall hazard characterisation
Although tephra fall loading, typically measured in kilopascals (kPa), is considered the most appropriate metric to quantify tephra fall hazard towards buildings, loading is rarely measured directly in the field. Instead, loading is estimated using thickness measurements that can later be combined with one or more laboratory measurements of deposit density, if samples are available. Otherwise, density is assumed, typically between the 600-1600 kg m − 3 range for naturally occurring, dry deposit densities (Macedonio and Costa, 2012). Here, we use the tephra field teams within two-three days of the eruption in 81 locations within 2 to 60 km of the vent (Anggorowati and Harijoko, 2015). Thicknesses are converted to loads using a deposit bulk density of 1400 kg m − 3 measured by Maeno et al. (2019a). Maximum pumice diameters were also measured at 32 of the 81 tephra thickness measurement locations and these were used to identify areas where projectile impacts may have contributed to observed building damage. Individual thickness measurements were interpolated into a continuous deposit using two methods: i) inversion modelling using the Tephra2 model, and ii) the interpolation of hand drawn isopachs. Resulting deposits were used to estimate hazard intensity metrics over the entire study area.

Inversion modelling method
Thickness measurements were inverted using the Tephra2 algorithm of Connor and Connor (2006) to estimate the eruption source parameters (ESP; e.g. plume height, tephra mass, total grain-size distribution) and empirical parameters (e.g. fall-time threshold and diffusion coefficient, see Bonadonna et al. (2005) and Biass et al. (2016)) that best reproduce observed measurements. Wind conditions were fixed and inferred from the wind profile for midnight of 13 February 2014 obtained from the European Centre for Medium-Range Weather Forecasts (ECMWF) Era-Interim Reanalysis dataset, which is available at six-hourly intervals (Dee et al., 2011). As most tephra was erupted over four hours in one main phase that began half an hour before midnight, interpolation between two time periods (18:00 and 00:00) was deemed unnecessary, so dispersion modelling was carried out using the single wind profile of 00:00 (Fig. 3).
Inversion initially made use of our 81 field measurements combined with 56 measurements from Nakada et al. (2016) and Maeno et al. (2019a), taken over an area up to 200 km from the vent. Inverting multiple sets of measurements simultaneously resulted in an increased discrepancy between measured and modelled tephra thicknesses at the most important sites, close to villages where visible damage occurred. To reduce discrepancies in these key areas, inversion was repeated using only the UGM data as these were taken sooner after the eruption than others and prior to heavy rainfall on 18 February, which may have disturbed the deposits (Dibyosaputro et al., 2015;Blong et al., 2017a). Ranges of ESPs derived from literature on the 2014 eruption (Table 1) were used to inform our initial ESP ranges to which inversion modelling was applied. The optimised set of eruption source parameters that best reproduced tephra deposit measurements during Tephra2 forward modelling are provided in Table 1. In areas where some of the heaviest building damage was reported, the inversion optimised tephra dispersal underestimates the tephra thicknesses measured by UGM field teams by 25-45% (2-2.7 cm). Underestimation in proximal areas (within 10 km of the vent) has likely occurred as inversion optimisation does not account for additional sedimentation of tephra from the plume margins (Bonadonna et al., 2005).

Isopach interpolation method
To provide an alternate dispersal footprint that would not be subject to the same proximal underestimation, isopachs were manually contoured using the same 81 UGM measurements ( Fig. 4) and interpolated using a multiple exponential segments method in TephraFits (Fierstein and Nathenson, 1992;Biass et al., 2019). The thickness intercept of the most proximal segment was used to estimate the theoretical maximum accumulation. This value along with the isopachs were interpolated using cubic splines in Matlab (i.e cubicinterp interpolant of the fit function of the Curve Fitting Toolbox). The resulting surface was exported at a resolution of 500 m. Interpolation of our manually contoured isopachs slightly but consistently overestimated tephra thicknesses measured in the field (Fig. 4D).

Remotely assessing building damage
Where available, post-eruption impact assessments conducted in the field provide critical information, much of Note change in wind direction~5 km asl and high-speed westerly winds~15 km asl. 'Wind direction' refers to the direction wind is blowing towards which cannot be obtained via remote sensing alone (Jenkins et al., 2013b). However, observing impacts in the field is complicated by a variety of logistical factors, deposit preservation issues as well as safety and ethical concerns (Wilson et al., 2012;Jenkins et al., 2014;Hayes et al., 2019). Following the Kelud 2014 eruption, carrying out a comprehensive field-based building damage assessment was complicated by the large numbers of buildings damaged across a widespread area, and by rapid building repairs that began less than a week after eruption onset. Despite this, at least two reports were made soon after the eruption documenting some details of the damage. One report from the International Red Cross, released 21 days after eruption onset, stated that in the three regencies surrounding Kelud, 11,903 buildings were 'completely damaged' by tephra and 11 buildings were destroyed by lahars (IFRC, 2014). Unfortunately, this report did not present their survey data or provide descriptions of what 'completely damaged' specifically refers to. Paripurno et al. (2015) also conducted a study of damage from the 2014 eruption, stating that four buildings were destroyed by lahars and that 8719 buildings were damaged by tephra to varying degrees. This study identified three tephra fall thickness range zones (2.5-5, 5-7.5 and 7.5-10 cm) and used these zones to categorise all buildings within that zone into 'light', 'medium' or 'heavy' damage categories respectively, based on the location of the village that a building was from. This included 6647 buildings from three villages that sustained 'heavy damage' (kerusakan berat). Considering nearly all buildings in Kediri regency (to the west of Kelud) were reportedly repaired within a month (Jakarta Post, 2014), it is unlikely that thousands of buildings experienced severe roof or building collapse. Also, media images and videos from some of the most heavily impacted villages do not feature any buildings that have experienced complete roof collapse (e.g. Berita Satu, 2014;Kreer, 2014). Inconsistencies between these reports, rapid building repairs and the wide area over which buildings were damaged all suggest remote sensing as an ideal method for assessing building damage from this eruption.

Damage assessment method
To remotely assess building damage, we manually assessed changes to 1154 structures in seven distinct areas (Fig. 5) using freely available pre-and post-eruption satellite images from Google Earth. Locations in which to assess building damage were selected based on the availability of high-resolution satellite imagery (30-70 cm pixels), the desire to record damage in areas where the most severe damage was reported to have occurred as well as across a wide range of tephra fall hazard intensities, as this is advised for accurate vulnerability assessment (Rossetto et al., 2014;Wilson et al., 2017). For the pre-eruption imagery, the date closest to eruption with freely available, cloud free imagery spanning the majority of the study area was taken on 26 December 2013, 49 days prior to the eruption. The appearance of each building's roof in these images was compared with its post-eruption appearance using images acquired either five, seven or eight days after the eruption (based on varying availability of cloud-free images soon after the eruption in different areas). These images were then compared with ones taken 98 days after the eruption on 19 May 2014 as this is the first freely available, cloud free acquisition covering the entire study area, taken after the majority of building repairs were completed. To facilitate rapid building repairs and minimise rain damage to building interiors, new clay tiles and blue tarpaulins were widely issued after the eruption. When old dark tiles were replaced with new ones this produced a stark colour change visible in both satellite images and those taken on the ground (e.g. Fig. 2A and Fig. 6). Tiles were occasionally replaced with a light grey roof cover which, based on field surveys, is likely to be asbestos fibre sheeting (Blake et al.,2015). Similarly, roofs that were grey to dark grey Table 1 Ranges of eruption source parameters required for Tephra2 were derived from literature on the 2014 eruption and used to inform ranges for optimisation during inversion. Parameters with 'n.a' (not applicable), are those whose values were determined from inversion optimisation rather than from available literature on the Tephra mass (10 10 kg) 38 -66 b 10-100 68.8 Median particle size (phi) pre-eruption would often have a slight but noticeably lighter appearance after the eruption (e.g. Fig. 2C). Again, we interpret this as replacement of old, damaged or collapsed asbestos fibre sheets based on limited groundtruthing using media images and observations from Blake et al. (2015). These colour changes were used to infer the extent of repairs that were carried out after the eruption, as a proxy for damage. Commonly observed changes in roof appearance were used to develop a damage state scale and a single damage state was assigned to each assessed building (Fig. 7). Damage state descriptions and the observable changes used to assign damage states to buildings are given in Table 2. The four-tiered scale here differs from the six-tiered scales used in previous studies because the available satellite images could not be used to distinguish so many different levels of damage. Especially 'light' DS1 damage, which for example could include damage to water tanks or roof guttering, was not observable in satellite images.

Damage assessment results
The heaviest estimated tephra loads experienced by buildings we assessed were 144 kg m − 2 , equivalent to 10.3 cm of dry tephra from this eruption. In line with these relatively modest tephra loads, only 27 of the 1154 buildings we assessed (2.3%) showed signs that they had experienced severe roof or building collapse, despite deliberately assessing damage in villages reported to have sustained the heaviest damage. Grey roofed buildings, likely made of asbestos fibre roof sheets make up 11% (n = 127) of all those surveyed but a disproportionate 26% (n = 7) of the DS4/5 observations, implying that buildings with such roofs fail at lower loads than those with tile roofs. As the number of buildings observed as DS4/5 is relatively small, the loads leading to collapse may not be representative of the true collapse load for buildings in this region and may explain why the median tephra load for DS4/5 buildings is not higher than those with DS3 (Fig. 8). The majority of buildings we assessed appeared to have tiled roofs (89%, n = 1027). Two surprising damage patterns were displayed by buildings with tiled roofs and it is likely that these are applicable to many such buildings throughout Java. Firstly, of the 790 buildings displaying signs of damage, a large proportion (56%, n = 464) appear to have had their entire roof covering replaced with new clay tiles. Secondly, repairs for many buildings were concentrated along the edges of roofs and, in 172 cases, these were the only parts of the roof which appear to have received any repairs. The edges of these roofs are likely to be verandas or eaves overhanging the building's external walls (Idham, 2018), both of which have been identified as particularly vulnerable in previous eruptions in other countries (e.g Spence et al., 1996;Blong, 2003) and in this eruption (Blake et al., 2015). After these sections of roofs were repaired, they often occupied a larger area than they had prior to the eruption. Measuring the building footprints of 30 buildings randomly sampled from the 172 that had repairs along their edges, we found that 20 of these had increased their footprint size by an average of about 20% compared to pre-eruption. The median tephra fall thickness modelled for roof overhang and veranda damage in this eruption was 6.2 or 8.2 cm thick for the inverted and interpolated tephra hazard layers respectively.

Vulnerability assessment
Assigning damage states to buildings and quantifying hazard intensities that caused the damage constitutes the raw data required to develop fragility curves. This section outlines fragility curve fitting and crossvalidation procedures and includes a description of how curves can be used to estimate or forecast damage. Fragility curves were fit for the two main groups of buildings identified, those with tiled roofs and those with grey roofs. Grey roofs could be made of various different materials including reinforced concrete and sheet metal but based on field surveys and media photos we assume that the majority of these roofs are made of corrugated asbestos fibre sheets with both these and tiled roofs typically supported on timber framing.

Fragility curve fitting and prediction accuracy
For both recognised building types, two sets of fragility curves were fit, one for each hazard characterisation approach (Fig. 9). The curves were fit to data using a cumulative link model (CLM) and take the typical form of a log-normal cumulative density function used in the vast majority of published parametric fragility curves (Rossetto and Ioannou, 2018). A CLM is a type of generalised linear model that makes use of the ordinality of damage states (i.e. light damage < moderate damage < heavy damage). There are several benefits to using CLMs compared to other statistical approaches commonly used in the past (Lallemant et al., 2015;Williams et al., 2019).
One key advantage is that individual damage state curves can be fit simultaneously using observations from the entire data set. This becomes important in the commonly arising situation where there are relatively few observations for a particular damage state (as is the case in this study for DS4/5). A second advantage is that when curves are fit using a CLM, curves for successive damage states cannot cross each other. This undesirable characteristic needs to be avoided if curves are to be used to forecast damage or if the prediction accuracy of the curves is to be assessed. The equation for fitting fragility curves using a CLM and the best fit curve parameters for the 12 new fragility curves from this study are given in the appendix. We calculate prediction accuracies by conducting Kfold cross validation. K-fold cross validation requires partitioning damage data into K randomly sampled, equally sized groups, using one group as the test set (k) and the remaining groups as the training set to fit fragility curves. Damage states are then predicted using the hazard intensities from the test set and accuracy is calculated by comparing the predicted damage states of all buildings to their actual observed damage states. This process is repeated K times using a different group as the test set each time and the average model accuracy across all K validation tests is obtained. K-fold cross Fig. 6 Images of Pandansari Village (village number 4) taken A) within the village by Kiran Kreer and B) from a Maxar Technologies satellite, available on Google Earth. Photos taken 9 and 14 days after the eruption respectively validation typically uses 5 or 10 folds. In this study we used a K value of 5 as splitting data into 10 groups would have a higher probability of producing some groups containing no DS4/5 observations. We iterated this 5-fold cross validation 50 times to produce a stable overall average accuracy score. To predict a discrete damage state using tephra load and fragility curves, a random number between 0 and 1 is generated for each building and compared to the exceedance probability at that building's tephra load. Starting with the highest damage state to the lowest, the predicted damage state for a building is the first one whose probability is higher than the randomly generated number. If the randomly generated number is higher than all three damage state probabilities, DS0/1 is assigned. For example, taking an inversion tephra load of 200 kg m − 2 on a tiled roof, the probabilities of reaching or exceeding DS4/5, DS3 and DS2 are 0.1, 0.75 and 0.9 respectively. Randomly The roof over the central part of the building has collapsed or the building has collapsed The roof framing structure has clearly changed (e.g. the apex of the roof has been rotated or relocated) or building has visibly collapsed/disappeared entirely generated numbers of 0.05, 0.5 or 0.85 would assign such a building as DS4/5, DS3 and DS2, respectively. In this way, with a sufficient number of buildings, the distribution of damage between the different damage states will be appropriately represented at any hazard intensity. Using the approach above, accuracy can be calculated following Eq. 1.
Where K is the number of groups the data is split into for K-fold cross validation, N test set is the number of buildings in the test set whose damage state is being predicted (roughly the total number of buildings divided by K) and n correct predictions is the number of buildings whose predicted damage state matches the observed damage state. This measure of accuracy has the advantage of being simple to calculate and interpret. However, when this measure of accuracy is used on ordinal models, as is the case here, its main shortcoming is that it does not make use of the ordered nature of damage states. For example, if our model misclassifies a building as being DS2 when it was observed to be DS0/1, this error is not as large as if the model had classified it as DS4/5. Unfortunately, this information is lost during simple accuracy calculations. To take the size of discrepancies between observed and predicted damage states into account, we adopt the approach proposed by Rennie and Srebro (2005) and Charvet et al. (2015) to calculate a weighted prediction accuracy. This requires calculating the level of misclassification (i.e. the absolute difference between predicted and observed damage levels) for each data point then dividing this value by the maximum possible difference (N DS − 1).
Where DS i and c DS i are the observed and predicted damage states for the ith observation and N DS refers the total number of different damage states, which in this study is four (DS0/1 to DS4/5). Once the penalised accuracy has been calculated for each building using Eq. 2, the overall penalised accuracy of the model can be calculated during cross validation following Eq. 3.

Penalised model accuracy
Exact and penalised accuracy scores for all four sets of fragility curves are given in Table 3. The various sets of fragility curves predict the exact observed damage state 40-44% of the time. Considering there are four possible damage state categories, a model that predicts damage states at random should make an exactly accurate Fig. 8 Distribution of tephra loads for all surveyed buildings in each damage state for both hazard models. Number of observations for each group indicated above the plot. 89% of buildings have tiled roofs with the other 11% classified as grey roofs, which are assumed to be made of asbestos fibre sheeting. The boxes reflect the central 50% of values and the horizontal black line is the median. The edges of the whiskers extend up to 1.5 times the interquartile range. Any observations beyond these points are considered outliers that are shown as dots prediction around 25% of the time. Comparatively, for the weighted prediction (penalised model) accuracy, a perfectly random predictor would have a score of 0.58 following Eq. 3, while our curves provide scores from 0.68 to 0.75. Figure 10 illustrates how the set of fragility curves with the highest accuracy perform compared to random prediction, with nearly 45% of predictions exactly matching the observed DS, a further 40% being within 1 damage state and approximately 15% being misclassified by > 1 damage state.
At relatively low hazard intensities, the curves fit using hazard intensities derived from inversion model buildings as more vulnerable compared to curves fit using interpolation. Conversely, at relatively high hazard intensities > 150 kg m − 2 , the inversion curves model buildings as less vulnerable and they appear to unrealistically underestimate the likelihood of roof or building collapse (DS4/ 5). When comparing both DS4/5 curves to published fragility curves, the interpolated DS4/5 curves more closely approximate the two roof collapse curves from Jenkins and Spence (2009) that are likely to be most representative of buildings surrounding Kelud. These include tiled and asbestos roofs in 'average to good condition' (Fig. 9).

Lessons learnt from Kelud
Communities surrounding Kelud displayed a great capacity to recover following the 2014 eruption, repairing  . 9 Fragility curves fit using tephra loads from both hazard models for A) tiled roofs and B) grey roofs. Density plots show differing distributions of tephra loads across the 1154 buildings used to fit curves, and that all data are from hazard intensities below 200 kg m − 2 . For comparison with published curves, the black dotted lines in A) and B) are the roof collapse curves for tiled and asbestos sheet roofs from Jenkins and Spence (2009). Their annotations, D tf and A tf , reflect the curve labels from that study. R script, Microsoft Excel spreadsheet and raw data to fit curves are available on Github (https://github.com/flying-rock/kelud14). Curve parameters given in the Appendix thousands of buildings within less than a month of evacuation orders being lifted. Rapid repairs were facilitated by the almost immediate provision of aid in the form of readily available roof cladding materials and military personnel. While it is important for communities to return home and quickly repair their buildings, two aspects of the recovery may have increased building vulnerability, going against one of the key priorities of the 2015-2030 Sendai Framework Disaster Risk Reduction to "Build Back Better" (United Nations, 2015). Firstly, the many buildings in Pandansari village with new tile roofs may now exhibit a marginally reduced vulnerability to tephra loading but an increased vulnerability to energetic impacts from large clasts. The clay tiles that were widely distributed after the eruption were 3 mm thinner than the typical, 15 mm thick tiles in place prior to the eruption. Reducing the thickness of tiles by 3 mm slightly decreases the load they place on a roof, presumably increasing the load of tephra the roof frame can support by an equal amount. However, 15 mm thick clay tiles are already exceptionally vulnerable to shattering under impact from large clasts (with a threshold of 20 joules), and making tiles thinner will have reduced the impact energy and associated minimum clast size required to exceed damage thresholds (Osman et al., 2018;Williams et al., 2019). Secondly, vulnerability is also likely to have been increased in the many cases where repairs have markedly enlarged overhanging and veranda sections of roofs. These sections of roofs have been identified as vulnerable in previous studies but there are multiple reasons why the buildings in this region might be particularly susceptible to roof overhang collapse. Firstly, the flared multi-pitch design of many roofs surrounding Kelud promotes tephra shedding onto the more shallowly pitched roof overhangs. The heavy rain that fell four days after the eruption would likely have further increased tephra shedding onto roof overhangs (Hampton et al., 2015;Jones et al., 2017) and increased deposit bulk density (Macedonio and Costa, 2012). This, combined with accounts of some roofs not collapsing until after the heavy rainfall supports the hypothesis that many overhangs may have been damaged only after receiving additional water saturated tephra mobilised from steep upper sections of the roof catchment. Increasing the size of overhangs allows them to shade and cool the building more effectively while also providing additional living space. However, this is often an area where visitors are given a place to sleep (Idham, 2018), meaning the relatively high vulnerability of these sections of roofs likely has life-safety implications in future eruptions. Specifically, people who have evacuated to seek shelter in villages farther downwind, may find themselves staying beneath a section of roof that is highly prone to collapse.

Implications for damage and vulnerability assessment
We show that free media and satellite images can be used to assess the degree of repair buildings have Fig. 10 Average number of buildings in each level of misclassification (subtracting predicted DS level from observed DS level) for the best performing set of cross-validated fragility curves (tiled roof, inversion hazard layer) compared to a perfectly random prediction. Labels above bars give percentage of buildings within each level of misclassification. This plot represents one test set, i.e. one-fifth (n = 206) of all the tiled roof buildings surveyed received following a tephra fall. By assuming that the degree of visible repairs are a proxy for damage severity, damage can be compared with tephra fall hazard intensity to develop new building vulnerability models. Testing prediction accuracy is important for any vulnerability model and may be even more important when models are developed using unconventional methods such as these. The fragility curves we developed using remote surveys have accuracies comparable to those developed using fieldbased building damage surveys from other hazards. For example, curves developed by Macabuag et al. (2016) using data from the 2011 Tō hoku tsunami and the same cumulative link model curve fitting approach had penalised accuracy rates between 0.71-0.81, marginally higher but comparable to our 0.68-0.75. It is important to note that our fragility curve accuracies cannot be compared to those from any volcanic vulnerability study as no previously published studies have conducted fragility curve accuracy testing. This is likely due to a general perception that insufficient data are available to warrant the use of such tests (e.g. Wilson et al., 2017) and because there are few fragility curves published within volcanology to begin with (compared to other natural hazards). Remote sensing can greatly increase the amount of damage data available for vulnerability assessment, particularly in a post-disaster context where time to conduct field surveys can be highly constrained (e.g. Mas et al., 2020;Williams et al., 2020). Remote sensing enables rapid data collection for large numbers of buildings over relatively wide areas and allows surveys to be carried out years after the damage occurred, if adequate satellite imagery is available. Surveying a wide area remotely can also help to focus field missions, identifying areas that are important to survey in more detail. Damage surveys conducted remotely should be considered highly complementary to field-based surveys, which are capable of providing highly detailed information but usually for a relatively small number of buildings.
Past research on forecasting tephra fall impacts to buildings and all previously developed fragility curves, placed a focus on identifying loads likely to cause severe roof or building collapse, driven by life-safety concerns (Spence et al., 2005;Zuccaro et al., 2008;Jenkins and Spence, 2009). In any given tephra fall however, exponential thinning with distance means that light tephra falls cover a relatively large area and therefore buildings receiving relatively light damage are likely to far outnumber collapses (Blong et al., 2017b), as was the case in the Kelud 2014 eruption. In cases such as this, repair costs associated with non-collapse damage might contribute substantially to the total cost of recovery, so it is important for future studies of tephra fall impacts to buildings to determine under what hazard intensities relatively lightmoderate damage occurs.

Limitations and future work
A major limitation of this study's remote damage survey is that, for most buildings, damage severity has not been directly observed but rather inferred based on the extent of visible repairs. This is problematic because aid for repairs is unlikely to have been evenly distributed amongst all regencies and is unlikely to be distributed in the same way in the future if a much larger number of buildings are damaged. One example of unequal distribution comes from, Tanggung Mulyo 10 km north north-west of Kelud. This village was initially not given free materials by the local government for repairs because the buildings there were deemed to be in poor condition prior to the eruption (Sutriyanto, 2014). Had this village not received building materials from other institutions two months after the eruption, the relative lack of visible roof cladding replacement might give the impression that these buildings were in good condition and highly resilient to DS2 and DS3 damage when in fact the opposite could be true.
If the fragility curves developed from this eruption are to be used in forecasting damage in future eruptions, either at Kelud or at other volcanoes, several issues need to be considered. Firstly, with only 27 DS4/5 roof or building collapses observed, comprising 20 tiled roofs and just 7 grey roofs, fragility curves produced for this damage state in particular should be used with caution. Secondly, considering the fragility curves derived using the interpolated and the inverted tephra loads differ from each other, either both sets will need to be used to give a range for the number of damaged buildings or a decision will need to be made on which set is more appropriate for use in the given study. The accuracy scores for the inversion derived curves were slightly higher than those from interpolation. However, at hazard intensities higher than those observed in this eruption, the inversion DS4/5 curves appear to more strongly underestimate roof collapse probabilities than the interpolation DS4/5 curves, when both are compared to previously published roof collapse curves for similar building types. Lastly, these fragility curves have assumed loads from dry tephra deposits, but some buildings were reported to have only sustained damage after heavy rainfall, which was likely to have increased tephra loads by up to 30% using the saturation assumption method of Macedonio and Costa (2012). Also, as previously noted, the style of roofs in this region may be unique to Java, with flatter roofs perhaps being more prone to severe collapse (DS4/ 5) and relatively less prone to overhang collapse (DS2) triggered by tephra shedding onto the overhangs. Future work could also focus on better constraining collapse loads for these buildings given the larger individual cost to replace collapsed roofs relative to just the overhang, and the associated life-safety concerns of total roof collapse.
In characterising tephra hazard, uncertainties are associated with the initial measurement of deposit thickness (Engwell et al., 2013), as well as the dispersal modelling and isopach drawing process (Scollo et al., 2008;Engwell et al., 2015;Yang and Bursik, 2016). The discrepancies between the two sets of fragility curves are solely driven by the differences between two methods of characterising the tephra dispersal for the same set of field measurements. Hayes et al. (2019) faced a similar issue in determining the tephra thicknesses that had fallen on buildings surrounding Calbuco using two separate sets of isopachs. The authors noted that in the future, uncertainty could be substantially reduced by taking hazard intensity measurements at the site of each damage observation. This measurement would make estimation of hazard intensity from dispersal modelling or isopach maps unnecessary. In addition to taking a tephra thickness measurement at each site, deposit bulk density should also be measured as density is likely to vary from site to site based on the deposit's grainsize distribution, compactness, and degree of saturation. In areas with steep or multi-pitched roofs, observations of any tephra shedding should be made and ideally measurements should be taken from the roof itself. Taking such measurements would be time consuming and/or potentially dangerous and may therefore be unrealistic unless field teams are well trained and have sufficient personnel. Also, this approach requires a field team whose primary aim is to assess building damage, which is often not the case. If remote damage surveys are instead using measurements taken by teams focused on physical volcanology research, vulnerability assessment could be improved by attempting to only fit fragility curves using damage observations made within a set distance of robust tephra deposit measurements.

Conclusions
The February 2014 eruption of Kelud produced tephra falls that damaged thousands of buildings around the volcano. A total of 1154 buildings were remotely surveyed and damage was categorised into one of three damage states. Relatively few buildings experienced severe roof or building collapse (DS4/5), likely because nearby villages were not exposed to tephra deposits > 10.3 cm thick, with an equivalent dry deposit load of 144 kg m − 2 . We found that DS4/5 damage occurred at a minimum tephra thickness of 3.3 and 6.2 cm for the inverted and interpolated hazard layers, respectively. Data from the damage survey were used to produce new fragility curves. Their prediction accuracy was assessed and found to be only slightly lower than that of comparable fragility curves produced using field-based damage surveys for tsunami hazards. Our study highlights that the choice of interpolation method for tephra thickness field measurements influences the results of vulnerability assessment, which can then propagate into subsequent impact and risk assessment. This uncertainty can be reduced in future studies by taking a higher number of tephra deposit measurements and samples, ideally at the site of each damage observation as well as on the roof of the building if it is safe to do so. Of course, detailed field measurements such as these may be difficult to take in the immediate aftermath of an eruption and cannot be taken when damage surveys are conducted remotely months to years later. The opposing strengths and weaknesses of remote damage assessment and traditional, field-based damage surveys make these two approaches highly complementary to each other in efforts to deepen our understanding of tephra fall building vulnerability.

Appendix
All fragility curves in this study were fit using a cumulative link model (CLM) with the following form: where the probability (P) to equal or exceed a given damage state (ds j ) is expressed in terms of a single hazard intensity metric (HIM), which in this study is tephra loading, measured in kg m − 2 . Fragility curves in this study use the probit link function, whose inverse is the Table 4 Curve parameters for fragility curves fit using CLMs b ðβ j andβ 2 ) and the parameters required to produce the same fragility curves using Microsoft Excel's NORM.DIST function