SOC579 ePortfolio: A regression model for Iraq mortality

Published: 2021 July 15. Modified: 2021 December 08, 23:20:45

There are nineteen governorates in Iraq, clustered into four zones: the Shia Arab south; the Sunni Arab center; the semi-autonomous Kurdistan Region; and the capital city of Baghdad. In addition, the claims of the Kurdistan Region and the federal Iraqi government collide in what may be considered a fifth zone: the disputed territories in Nineveh, Kirkuk, and Diyala governorates.

Although the Iraqi government’s Central Statistical Organization has released an array of statistical reports since the overthrow of the Saddam regime in 2003, this data has seldom been used in published academic research. As a result, many basic questions remain about how various categorical and continuous variables relate to each other.

This report reviewed research on Iraq that has used demographic approaches, especially where fundamental principles of demography have been adapted to the Iraq context. Building on these approaches and frameworks, data on mortality rates, household income, and urbanization was explored on a per-governorate basis, using tables from Iraq's Annual Abstract of Statistics for 2012. As part of this exploratory analysis, a tool was developed to create choropleth maps of Iraq at the governorate level. Finally, a regression analysis was conducted to examine which explanatory variables and regression models have a statistically significant relationship with mortality rates.

Materials and Methods

The methodology included steps for data acquisition, data processing, mapping, and regression modeling. Each of these steps posed unique challenges. Data acquisition required combing through a deeply disorganized governmental website and constructing a categorical indicator variable for the religious and ethnic identities of governorates. Mapping involved developing a visualization tool, because no suitable tool existed. Regression modeling was in many ways the easiest step, but it still required careful interpretation, including of how individual governorates may influence the fitted model.

Data Acquisition

The first step was to access the data and conduct some minor processing. The data was accessed from the website of the Iraqi government’s Central Statistical Organization (CSO), where its Annual Abstract of Statistics for 2012 is available (albeit split across many different pages). Three tables were used. The first table provided deaths by governorate and gender in 2009 and 2010 (CSO, 2012, Section 10, Table 11); the figures of interest were total deaths per governorate. The second table provided population by governorate and social origin in 2009 (Section 2, Table 6). Social origin was divided into two categories: rural and urban. These figures did not come from a full population census, but from multiple sources, including local population lists. The third table provided household income by source for 2007 (Section 15, Table 47). In Iraq, each household is headed by a patriarch, who is accompanied by a wife (or wives) and unmarried children, and there may be other dependents such as elderly or widowed relatives. Generally, a child who marries establishes a separate household, although married children sometimes remain in their parents’ household. The following categories of household income were reported: public employment; private employment; property income; social salaries, e.g. disability and retirement; and transfers, e.g. remittances.
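Because all three tables are indexed by governorate, combining them amounts to a join on the governorate name. The Python sketch below is illustrative only (the report worked from a CSV in R, and its processing code is not reproduced here); the field names and numbers are hypothetical placeholders rather than CSO figures. An inner join keeps only governorates present in every table, which has the same effect as removing governorates with incomplete columns.

```python
# Hypothetical stand-ins for the three CSO tables, keyed by governorate.
# Field names and values are placeholders, not figures from the Abstract.
deaths = {
    "Basrah": {"deaths_2009": 100, "deaths_2010": 110},
    "Erbil": {"deaths_2009": 80, "deaths_2010": 90},
}
population = {
    "Basrah": {"urban": 800, "rural": 200},
    "Erbil": {"urban": 700, "rural": 300},
}
income = {
    "Basrah": {"public": 40, "private": 60},
    # No "Erbil" row: mimics a governorate with incomplete columns.
}


def merge_tables(*tables):
    """Inner-join dicts keyed by governorate; keep only keys present in every table."""
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)
    merged = {}
    for governorate in sorted(common):
        row = {}
        for table in tables:
            row.update(table[governorate])
        merged[governorate] = row
    return merged


merged = merge_tables(deaths, population, income)
print(merged)
```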

Data Processing

To control for population size, deaths were converted into a rate of deaths per thousand residents. To smooth out year-to-year fluctuations, the death rates for 2009 and 2010 were averaged together. Urban populations were also converted into percentages of the overall population. Because the rural percentage is simply the complement of the urban percentage (100 minus the urban figure), it would add no new information and would be perfectly collinear with urbanization, so it was omitted from the regression model.
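The two conversions can be sketched in a few lines of Python. The counts are hypothetical, and the sketch assumes a single population denominator for both years, since the population table covers 2009 only. The regression tables label the variable "Prop. urban", so the sketch keeps urbanization as a 0-1 proportion; that choice of scale is an assumption.

```python
# Hypothetical figures for one governorate (placeholders, not CSO data).
deaths_2009, deaths_2010 = 4000, 4400
urban, rural = 600_000, 400_000
population = urban + rural  # assumption: one denominator for both years


def rate_per_thousand(deaths, population):
    """Deaths per 1,000 residents."""
    return deaths / population * 1000


# Average the two single-year rates to smooth year-to-year noise.
death_rate = (rate_per_thousand(deaths_2009, population)
              + rate_per_thousand(deaths_2010, population)) / 2

# Urban share as a proportion (0-1). The rural share is its complement,
# so adding it to a model that has an intercept would be perfectly collinear.
urban_share = urban / population
rural_share = 1 - urban_share

print(round(death_rate, 2), urban_share, round(rural_share, 2))
```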
Household income was already reported on an average-household basis and did not require adjustment for population size. However, including every category of income risked obscuring the statistical messages in the regression model. The focus rested on public employment as a proportion of overall income, because this may indicate the strength of the state apparatus, as well as the impact of a stable salary, especially in contrast to income from private employment or self-employment. Other types of income may have carried interesting statistical messages, but were not included. Social salaries (i.e. disability and pension payments) were excluded, in part because they perhaps reflected the age of the population more than anything else, and also because they were a small proportion of overall household income. Property income was excluded because it was fairly consistent across the governorates and was not a large proportion of income anywhere. Transfer income was also very small and was not included in the modeling.
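As a minimal sketch, assuming the five income categories are reported as average amounts per household, the public-employment share is that category divided by the sum of all five. The values below are hypothetical placeholders.

```python
# Hypothetical average household income by source (placeholder units and values).
household_income = {
    "public_employment": 300.0,
    "private_employment": 500.0,
    "property": 50.0,
    "social_salaries": 100.0,
    "transfers": 50.0,
}


def public_share(income_by_source):
    """Share of total household income that comes from public employment."""
    total = sum(income_by_source.values())
    if total == 0:
        raise ValueError("total income is zero")
    return income_by_source["public_employment"] / total


print(public_share(household_income))
```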

The three Kurdistan Region governorates of Erbil, Sulaymaniyah, and Duhok were removed because not all columns were available for them. The remaining governorates were labeled as Sunni Arab, Shia Arab, or Baghdad, reflecting the major zones of Arab Iraq. Because these zones have had disparate experiences of conflict and development, it was worth testing whether they were statistically significant as categorical indicators in a regression model. In the figure above, the zones are shown with the Kurdistan Region in gray, the Sunni Arab areas (anchored by Anbar, the largest governorate) in dark blue, the Shia Arab areas (anchored by the large Muthanna and pointy Basrah governorates in the far south) in light blue, and Baghdad in the darkest shade. Governorates with large areas bordering the Kurdistan Region (Nineveh, Kirkuk, and Diyala) also include some disputed territories.
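Entering the zones as categorical indicators means creating one 0/1 column per zone except a baseline. The following Python sketch is a hypothetical illustration (the helper `dummy_code` is not from the report); it uses only the zone assignments named in the text and drops the Kurdistan Region, leaving Baghdad as the reference category.

```python
# Zone labels for a few governorates, as described in the text; the
# full mapping used in the report is not reproduced here.
zones = {
    "Baghdad": "Baghdad",
    "Anbar": "Sunni Arab",
    "Basrah": "Shia Arab",
    "Muthanna": "Shia Arab",
    "Erbil": "Kurdistan Region",
}


def dummy_code(zones, levels=("Shia Arab", "Sunni Arab"),
               exclude=("Kurdistan Region",)):
    """One 0/1 column per listed level; a zone not listed is the baseline.

    With the default levels, Baghdad gets all zeros and so serves as the
    reference category; excluded zones are dropped entirely.
    """
    rows = {}
    for governorate, zone in zones.items():
        if zone in exclude:
            continue
        rows[governorate] = {level: int(zone == level) for level in levels}
    return rows


print(dummy_code(zones))
```

For the models that exclude Baghdad, calling `dummy_code` on the remaining governorates with `levels=("Sunni Arab",)` makes Shia Arab the baseline, which matches a table reporting only a Sunni Arab coefficient.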

Mapping with Ugiyot

In the literature review, it became clear that maps of Iraq seem to be generated on an ad hoc basis; no single major resource emerged for consistent, quality maps. This contrasts with the plethora of online tools for mapping data by American state, including within Google Sheets. To resolve this, I developed a minimum viable product that can generate Iraq-oriented maps, and called it Ugiyot, after the Hebrew word ‘עוגיות ugiyot’, which means ‘cookies’ (Clancy, 2021). The name comes from the fact that map boundaries are stored on the server, while the color coding and other information the user enters are stored as client-side cookies. It was developed using PHP for the programmatic steps and HTML for visual output. Within the HTML, a map of Iraq was rendered as an SVG based on coordinates.
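Ugiyot itself is written in PHP, and its code is not reproduced here. As a rough Python sketch of two core steps a choropleth renderer needs, the block below maps a value onto a color by linear interpolation between two blues (the specific hex colors are arbitrary choices) and emits a standard SVG `<polygon>` element from absolute vertices. Storing user choices in cookies is not shown.

```python
def hex_to_rgb(hex_color):
    """'#rrggbb' -> (r, g, b) integers."""
    hex_color = hex_color.lstrip("#")
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))


def choropleth_color(value, vmin, vmax, low="#deebf7", high="#08306b"):
    """Linearly interpolate between two colors; values outside [vmin, vmax] are clamped."""
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)
    lo, hi = hex_to_rgb(low), hex_to_rgb(high)
    r, g, b = (round(a + (c - a) * t) for a, c in zip(lo, hi))
    return f"#{r:02x}{g:02x}{b:02x}"


def svg_polygon(points, fill):
    """Render absolute (x, y) vertices as an SVG polygon element."""
    coords = " ".join(f"{x},{y}" for x, y in points)
    return f'<polygon points="{coords}" fill="{fill}" />'


square = [(6, 6), (10, 6), (10, 10), (6, 10)]
print(svg_polygon(square, choropleth_color(4.2, 3.0, 6.0)))
```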

The drawings of administrative boundaries were sourced from MAPSVG (2021) and its repertoire of publicly available illustrations released under a CC 4.0 license. It is difficult to call them true maps: rather than plotting boundaries using independent GPS coordinates, MAPSVG described each governorate (polygon) with a height- and width-dependent starting point, followed by a path in which each point was drawn relative to the previous one. For example, if a 4x4 square were drawn in the center of a 16x16 plot, the MAPSVG coordinates would be (6, 6), (4, 0), (0, 4), (-4, 0). This may work visually, but it severely limited interoperability and made any revision to boundaries dependent on complicated recalculations. However, since this report was prepared within a limited time frame, it did not make sense to invest hours into extracting GPS coordinates from other sources for the first iteration of a minimum viable product. Instead, the clean lists of coordinates in the MAPSVG file were simply converted to absolute points, i.e. (6, 6), (10, 6), (10, 10), (6, 10). This inelegant but effective conversion means that, from the beginning, Ugiyot can accept any set of absolute coordinates, including (in the future) GPS coordinates. Because scaling and centering are handled programmatically, Ugiyot can also illustrate alternative geographies, such as older administrative maps that are visually clear in what they depict but contain enough distortions and inaccuracies that they cannot really be translated into a GPS system. The final code, corresponding geographic boundary data, and basic instructions are available on GitHub (Clancy, 2021). Development into a more robust product may be an area of focus in capstone research.
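The conversion in the square example is a running sum of the relative offsets. The Python sketch below (hypothetical helper names, not Ugiyot's PHP code) performs that conversion and then scales and centers the resulting polygons inside a viewport, using one scale factor on both axes so shapes are not stretched.

```python
def relative_to_absolute(path):
    """Turn [start, (dx1, dy1), (dx2, dy2), ...] into absolute vertices."""
    x, y = path[0]
    points = [(x, y)]
    for dx, dy in path[1:]:
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def fit_to_viewport(polygons, width, height):
    """Uniformly scale and center a set of polygons inside a width x height box."""
    xs = [x for poly in polygons for x, _ in poly]
    ys = [y for poly in polygons for _, y in poly]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1  # avoid dividing by zero for degenerate shapes
    span_y = (max(ys) - min_y) or 1
    scale = min(width / span_x, height / span_y)  # same factor on both axes
    off_x = (width - span_x * scale) / 2
    off_y = (height - span_y * scale) / 2
    return [[((x - min_x) * scale + off_x, (y - min_y) * scale + off_y)
             for x, y in poly] for poly in polygons]


square = relative_to_absolute([(6, 6), (4, 0), (0, 4), (-4, 0)])
print(square)
print(fit_to_viewport([square], 100, 100))
```

If GPS coordinates were used instead, latitude increases northward while the SVG y axis increases downward, so the y values would need to be flipped before rendering; this is an assumption beyond what the text describes.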

Regression Modeling

With processing of the population data complete, and development of a map visualization tool complete, the data was saved as a CSV file and loaded into RStudio for regression modeling. The R code and the outputs have been included as appendices. For each regression model, attention was paid to the statistical significance of the coefficients and the magnitude of the R-squared values. For the response variable of mortality rate, the following explanatory variables were examined: the proportion of the population in urban settings; the proportion of average household income from public employment; total average household income; and region (i.e. Shia Arab, Sunni Arab, or Baghdad).
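The report's R code is in its appendices and is not reproduced here; in R, such models are typically fitted with `lm()` and inspected with `summary()`, which reports estimates, standard errors, t-values, p-values, and R-squared. To show what the fit computes, the standard-library Python sketch below solves the least-squares normal equations and returns coefficients, residuals, and R-squared. Standard errors and p-values are omitted, and the data are synthetic, constructed so the correct answer is known.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(columns, y):
    """Least squares with an intercept; returns (coefficients, residuals, R-squared)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    mean_y = sum(y) / n
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, residuals, 1 - ss_res / ss_tot


# Synthetic data built as y = 1 + 2*x1 + 3*x2, so the fit should recover it exactly.
x1 = [0, 1, 2, 3, 4]
x2 = [1, 0, 2, 1, 3]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
beta, residuals, r_squared = ols([x1, x2], y)
print([round(b, 6) for b in beta], round(r_squared, 6))
```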

Results

Results included choropleth maps, regression model summaries, and the accompanying interpretations. The initial results showed that Baghdad was difficult to include, so several steps were repeated without it. As a geographic unit, Baghdad governorate barely extends beyond the city itself, resulting in an extraordinarily high urbanization rate. It also has an extremely high number of deaths. This sometimes exerted problematic influence on the shading of choropleth maps and on the overall statistical significance of regression models judged by R-squared values alone: Baghdad lies so far from the other governorates that a model fitted without it leaves an enormous residual for Baghdad, and a model that includes it bends toward this single point. Also, as mentioned in the data processing step, the Kurdistan Region was removed due to incomplete data. Removed governorates are displayed in gray.

Exploratory Analysis

The exploratory analysis began with population totals. However, the very high population totals in Baghdad (center) and Nineveh (northwest) compressed the color spread for the remaining governorates, so these two were removed to make variation among the rest visible. Population totals per governorate are in many ways a result of how governorates are drawn rather than organic ethnoreligious variation, but it was immediately apparent that the Sunni Arab center and the Shia Arab south were distinct, whether due to genuine population distributions or to administrative manipulation of the boundaries.
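Why two large governorates flatten a color scale can be shown with a min-max normalization, a common (but here assumed) way to place values on a 0-1 color ramp. With hypothetical populations, the ordinary governorates occupy only a sixth of the scale when the two large values are included, and the whole scale when they are removed.

```python
def min_max_positions(values):
    """Position of each value on a 0-1 color scale spanning min..max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical populations (millions) for four ordinary governorates,
# plus two much larger ones; all figures are placeholders.
ordinary = [1.0, 1.2, 1.5, 2.0]
large = [7.0, 3.5]

with_large = min_max_positions(ordinary + large)[:len(ordinary)]
without_large = min_max_positions(ordinary)

# Share of the color scale actually used by the ordinary governorates.
print(max(with_large) - min(with_large))        # about 1/6
print(max(without_large) - min(without_large))  # 1.0
```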

When looking at total income, the map seems to follow Sunni and Shia distinctions. Notably, Diyala (in the east) stands out as low. This is important in relation to the mortality map, where Diyala was fairly high. Najaf, by contrast, has a very high average household income compared with the other governorates of Iraq. It might have been expected that Baghdad would be particularly high, as a very urban city at the economic heart of the country. However, income appears instead to be concentrated in Najaf, where a prominent political and religious class is based in the Shia heartland. This may reflect corruption in the country. Interestingly, Najaf had a fairly high mortality rate despite this high concentration of income.

Next, the governorates were explored according to the proportion of average household income from government jobs. Interestingly, Najaf and Diyala seem to have switched places: Najaf has high income but a lower proportion from public employment, while Diyala has low income but a higher proportion from public employment. In general, there seems to be a negative relationship between public employment and income. This may be due to the unique circumstances in Iraq, where the public sector has a very large role in the economy. According to Al-Mawlawi (2018), “government expenditure as a percentage of GDP averaged at 52 percent between 2005 and 2012, making it amongst the highest in the region” with almost 3 million public employees (para. 4). In this context, households may face a fairly binary choice between low but stable public-sector income and high but unstable private-sector income. Choosing one or the other does not necessarily mean a family is considered richer or poorer, as stability may carry a premium in social standing.

Regression Analysis

Six different regression models were explored: three combinations of explanatory variables, each examined with and without Baghdad. As seen in the exploratory analysis, Baghdad is a very small governorate of a very different type. Although Baghdad is a first-level administrative division, including it may cause problems for the same reason that Washington, DC does in the United States: DC may rank as an extraordinarily dense first-level division when, more meaningfully, its density should be compared with that of cities rather than states. By extension, it may make more sense to compare Baghdad with other governorate centers (e.g. Mosul city in Nineveh governorate, or Basrah city in Basrah governorate), since Baghdad governorate essentially consists of its own governorate center.

Across the three models examined, the strongest statistical relationship with mortality rates came from the model using urbanization and region as explanatory variables. It was also clear that Baghdad was a very influential data point. Because the focus here is on regression models as tools to examine connections, rather than on predictive power, it is advisable to omit Baghdad in general. There is no need to predict future mortality rates in a particular governorate such as Baghdad; rather, the goal is to see what factors are associated with mortality in general. Notably, Maysan seemed to defy trends and had an unusually high residual in some models. Rather than prompting the overall statistical trends to be discarded, it should be noted as an exception to the rule and studied further.
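The report gauged Baghdad's influence by refitting each model without it. A standard single-number alternative, not used in the report, is Cook's distance, which combines a point's residual with its leverage (how far its predictor value lies from the others). The Python sketch below handles only a one-predictor model, where leverage has a closed form; the data are made up, with the last point playing an extreme, Baghdad-like role. Values above 1 (or, by a looser rule of thumb, above 4/n) are often flagged for closer inspection.

```python
def cooks_distance(x, y):
    """Cook's distance for each point in a simple linear regression y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    p = 2  # fitted parameters: intercept and slope
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance
    leverage = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    return [e * e / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, leverage)]


# Made-up urban shares and mortality rates; the last point is deliberately extreme.
urban = [0.3, 0.4, 0.5, 0.6, 0.7, 0.99]
mortality = [4.0, 4.3, 4.1, 4.5, 4.4, 7.0]
distances = cooks_distance(urban, mortality)
print([round(d, 2) for d in distances])
```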

Death Rates Based on Urbanization and Public Employment

This regression model found that urbanization was a statistically significant predictor, but public sector employment was not. This was somewhat reflected in the maps, where mortality and urbanization both seemed to neatly follow regional divides but public sector employment was relatively scattered.

Term	Estimate	Std. error	t-value	Pr(>|t|)
Intercept	1.730	1.286	1.345	20.35%
Prop. urban	4.291	1.541	2.783	1.66%
Prop. public	0.822	2.850	0.289	77.79%
Table 1: Regression model including Baghdad. The model had a 95% level of statistical significance.

Without Baghdad, the statistical significance of the share of household income from public employment increased slightly, but it remained statistically insignificant. However, the statistical significance of urbanization decreased markedly. This is likely because Baghdad has a very high rate of urbanization and also a very high rate of mortality, so it may overshadow the other governorates. Overall, the model without Baghdad was not particularly compelling.

Term	Estimate	Std. error	t-value	Pr(>|t|)
Intercept	2.226	1.487	1.496	16.27%
Prop. urban	3.695	1.784	2.71	6.27%
Prop. public	0.224	2.965	0.413	68.78%
Table 2: Regression model excluding Baghdad. The model had an 80% level of statistical significance.

All in all, this regression model indicates that urbanization is a meaningful predictor of mortality rates, and with Baghdad included this holds at a 98% level of statistical confidence (p = 1.66%). In other words, governorates with higher urbanization tend to have higher mortality rates.
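The t-value and the "level of statistical confidence" used in this report can be checked by hand from the "Prop. urban" row of Table 1. The t-value is the estimate divided by its standard error, and the report's confidence level corresponds to 1 minus the p-value. Computing the p-value itself requires the t distribution with the model's residual degrees of freedom (in R, `2 * pt(-abs(t), df)`), which is not shown.

```python
# Values from the "Prop. urban" row of Table 1 (model including Baghdad).
estimate = 4.291
std_error = 1.541
p_value = 0.0166  # reported as 1.66%

# The t-value is the coefficient divided by its standard error.
t_value = estimate / std_error

# The report's "level of statistical confidence" corresponds to 1 - p.
confidence = 1 - p_value

print(f"t = {t_value:.2f}, confidence = {confidence:.1%}")
```

The computed t of about 2.785 matches the reported 2.783 up to rounding in the published estimate and standard error.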

Death Rates Based on Urbanization and Average Household Income

Looking at urbanization and average household income, there appears to be a statistically significant relationship between both variables and mortality to a 90% degree of statistical certainty. The relationship to urbanization exceeds 99% certainty.

Term	Estimate	Std. error	t-value	Pr(>|t|)
Intercept	0.673	1.108	0.6	99.53%
Prop. urban	4.623	1.271	3.195	0.77%
Total income	0.180	0.943	1.910	8.04%
Table 3: Regression model including Baghdad. The model had a 99% level of statistical significance.

Without Baghdad, the statistical relationships weaken for urbanization (as in the previous regression model) but improve slightly for total income. Notably, the estimated effect of urbanization falls considerably, from 4.623 to 3.277, meaning that each incremental increase in urbanization is associated with a smaller increase in the mortality rate.

Term	Estimate	Std. error	t-value	Pr(>|t|)
Intercept	0.334	1.165	0.287	77.96%
Prop. urban	3.277	1.510	2.170	5.28%
Total income	0.190	0.952	2.2	7.06%
Table 4: Regression model excluding Baghdad. The model had a 95% level of statistical significance.

Although Baghdad is an unusual governorate, and excluding it from the exploratory analysis helped bring out patterns in other governorates, removing Baghdad from the regression models does not appear to result in any major improvement to the models. However, the fact that including Baghdad has such a large influence on the models is concerning.

Death Rates Based on Urbanization and Region

Using Baghdad as the baseline, a regression model was fitted with categorical variables for whether a governorate falls into a Shia Arab or Sunni Arab region. Apparently, being located outside of Baghdad is associated with an improvement (decrease) in mortality rates. There does appear to be some statistical significance with regard to Sunni Arab regions, but no compelling statistical significance with regard to Shia Arab regions, at least with Baghdad included.

Term	Estimate	Std. error	t-value	Pr(>|t|)
Intercept	3.272	1.500	2.180	5.18%
Prop. urban	2.516	1.559	1.614	13.49%
Shia Arab	0.591	0.775	0.762	46.19%
Sunni Arab	0.377	0.861	0.600	13.79%
Table 5: Regression model including Baghdad. The model had a 98% level of statistical significance.

Looking at the residuals, we see that Baghdad has a very low residual, while the residuals for the other governorates are several orders of magnitude larger. The figure to the right shows that not only is the residual for Baghdad very low, but the residual for Maysan is very high. Interestingly, the residuals otherwise seem to follow sectarian divisions, with the Sunni Arab center generally a shade darker than the Shia Arab south.

Without Baghdad, the model improves greatly. Its overall statistical significance dips slightly to 97%, but a statistically meaningful relationship between mortality and region becomes apparent. Sunni Arab regions show a slight improvement (decrease) in mortality, as noted above, at a 94% level of statistical certainty. The statistical significance of urbanization is at 85%, which is modest but still worth noting.

Term	Estimate	Std. error	t-value	Pr(>|t|)
Intercept	2.680	0.990	2.708	2.04%
Prop. urban	2.516	1.559	1.614	13.49%
Sunni Arab	0.786	0.372	0.112	5.84%
Table 6: Regression model excluding Baghdad. The model had a 97% level of statistical significance.

The figure without Baghdad shows essentially the same overall choropleth map. Importantly, the sectarian divisions still seem apparent, and Maysan still appears exceptional as a relatively low-mortality and highly urban governorate (it is also low-income). Further inquiry into Maysan’s governorate data is needed to understand its surprising results, which seem to defy the trends.

Conclusion

Rather than one conclusive finding, the analysis produced a range of findings with a reasonable degree of certainty. First of all, the choropleth maps used in the exploratory analysis repeatedly showed that sectarian divisions in Iraq are also reflected in the demographic reality of the country’s governorates. This is notable because it suggests that sectarian divisions really do shape demographic reality. This raises important questions about the future of the country. Peacebuilding messages about unity may be hindered by everyday people’s strong recognition (or painful ignorance) of the disparities between life in Sunni and Shia zones. Further research should explore whether these disparities between the governorates are influenced by gerrymandering of governorate boundaries, especially since the Sunni governorates seem to be larger than the Shia governorates.

In addition, a significant finding was that Baghdad is an influential governorate and that it may not be appropriate to compare it with other governorates, for the same reason that it may be faulty to compare rates between states and Washington, DC, even though both are technically administrative divisions at the same level. Considering that analyses generally treat Baghdad as just another governorate, establishing its separation as a best practice may be a major improvement in modeling and research. Further research should examine whether the very high mortality rate in Baghdad is accurate or a bookkeeping issue. It is possible that people live outside the city, perhaps in a nearby suburb, but have their death certificates registered in the city proper, within its tiny governorate boundaries. In the regression models, the influence of Baghdad was large enough that removing this one data point had a significant impact on overall statistical certainty and R-squared values. Removing Baghdad meant that the models hinged more on the residuals of less influential data points, rather than conveying an inflated statistical certainty by minimizing the residual of one very influential point.

Another major conclusion was simply that many questions remain. The high household income in Najaf was surprising and should be investigated further. It is possible that this has to do with Shia political parties being based in Najaf, and the flow of wealth into the governorate as a sort of secondary capital. The role of religious tourism may also be influential, although Karbala did not seem to benefit in the same way. Another surprise was Maysan, which seemed highly urbanized but low in mortality, defying overall trends. The next step of this research would be to examine the composition of its urbanization: is the governorate in fact very rural, with one city that inflates the overall rate of urbanization? There are many other avenues for exploration.

Lastly, a significant conclusion was that better research tools are needed for Iraq. When searching for ways to create choropleth maps of Iraq, it was discovered that such tools either do not exist or are not easily accessible. Also, the conventionally available static maps do not reflect the latest boundaries between the Kurdistan Region and federally administered Iraq. Having maps with up-to-date boundaries and GPS-based coordinates, along with user-configurable choropleth tools, could be of tremendous importance for mapping and other research. For example, coordinates of government hospitals could be processed as point-in-polygon queries (where the polygons are governorates) for quick tallies of hospitals within each governorate, district, or other spatial boundary. In addition, GPS-based maps could be more easily updated based on fieldwork, for instance to identify the locations of recent checkpoints between federally administered areas and the autonomous Kurdish areas. Further development of Ugiyot, which was created to support this report but is now available to other researchers, may make a significant contribution to the humanitarian sector in particular, where mapping is of critical importance.
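The hospital tally described above could rest on the even-odd ray-casting test: a point is inside a polygon if a ray cast from it crosses the boundary an odd number of times. The Python sketch below is a planar illustration with hypothetical square "governorates" and hospital coordinates. For GPS data, longitude and latitude could be treated as x and y over small areas, and points lying exactly on a border are not handled specially, so they may be assigned to either neighbor.

```python
def point_in_polygon(point, polygon):
    """Even-odd ray casting: count edge crossings of a ray going right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def tally(points, polygons):
    """Count points per named polygon; a point is credited to the first match."""
    counts = {name: 0 for name in polygons}
    for point in points:
        for name, polygon in polygons.items():
            if point_in_polygon(point, polygon):
                counts[name] += 1
                break
    return counts


# Hypothetical square "governorates" and hospital locations.
governorates = {
    "A": [(6, 6), (10, 6), (10, 10), (6, 10)],
    "B": [(10, 6), (14, 6), (14, 10), (10, 10)],
}
hospitals = [(8, 8), (7, 9), (12, 7), (20, 20)]
print(tally(hospitals, governorates))
```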

All in all, as shown in the literature review, Iraq seems to be at the fringes of demographic research due to a focus on specific conflict issues, a paucity of data, unreliable data, and frequent changes that impede predictive models or the identification of statistical relationships. However, there seem to be durable clusters of regions with different demographic characteristics. The many changes in the country highlight, rather than diminish, the significance of these clusters, because they seem to endure. By better understanding the real, lived inequalities of everyday Iraqis, true peacebuilding and reconciliation can be premised on authentic understandings of how two people may share the same citizenship, and even the same socioeconomic status, yet live (and struggle) in very different environments.

Reflection

The final status of the project met expectations: mapping several important variables by governorate, and exploring the statistical relationship between mortality rates and a few potential indicators. However, even a fairly modest and generic approach may be overly ambitious when dealing with a country wracked by tremendous turmoil and instability. Along the way, several setbacks emerged. First of all, there was very limited data available. Also, the typical visualization tools available for research into the United States were not available for Iraq. Lastly, there were few demographic frameworks which scholars had adapted to the Iraq context.

Thankfully, I found these challenges exciting. For more than a year, I have been systematically collecting data from online and offline sources, meaning that despite some shortcomings (for example, the data was over a decade old), I was able to piece together something suitable for basic exploratory analysis and regression modeling. Also, with some development expertise, I was able to code a tool that generates choropleth maps based on user input. This tool, nicknamed Ugiyot, is now available for other researchers to use. In addition, I was able to take the most basic demographic framework that has been established (i.e., that sectarian divisions exist in Iraq) and find it compellingly displayed in the analysis.

What I learned most is that good products do not necessarily need particularly creative questions. I approached the data and literature with simple questions, but emerged with a great visualization tool and several strong findings. As someone who has struggled with the lack of mapping resources, I think that developing these resources and making them freely available may be a significant help for people who are composing reports. I look forward to publishing the ePortfolio online and posting the types of results I was able to obtain, to encourage people to think of me as a potential consultant and to encourage adoption of Ugiyot as it becomes a more mature web application.

Endnotes

Abu-Sittah, G. S. (2019). The Political Capital of War Wounds. In C. Lutz & A. Mazzarino (Eds.), War and Health: The Medical Consequences of the Wars in Iraq and Afghanistan (Vol. 4, pp. 137–151). NYU Press. http://www.jstor.org/stable/j.ctv1jhvns4.9

Al-Khalisi, N. (2013). The Iraqi Medical Brain Drain: A Cross-Sectional Study. International Journal of Health Services, 43(2), 363–378. http://www.jstor.org/stable/45131937

Al-Mawlawi, A. (2018). Analysing Growth Trends in Public Sector Employment in Iraq. London School of Economics: Middle East Centre Blog. https://blogs.lse.ac.uk/mec/2018/07/31/analysing-growth-trends-in-public-sector-employment-in-iraq/

Ali, M. M., Blacker, J., & Jones, G. (2003). Annual Mortality Rates and Excess Deaths of Children under Five in Iraq, 1991-98. Population Studies, 57(2), 217–226. http://www.jstor.org/stable/3595749

Alshamary, M., & Al-Amin, S. (2018, September 14). Who to blame for the protests in Basra, Iraq? The Washington Post. https://www.washingtonpost.com/news/monkey-cage/wp/2018/09/14/who-to-blame-for-the-protests-in-basra-iraq/

Barringer, F. (2003, August 8). After the War: Population Puzzle; Fewer Iraqi Men: Dead or Undercounted? The New York Times. https://www.nytimes.com/2003/08/08/world/after-the-war-populatin-puzzle-fewer-iraqi-men-dead-or-undercounted.html

Central Statistical Organization of Iraq. (2012). Annual Abstract of Statistics for 2012. http://cosit.gov.iq/AAS/AAS2012/

Cetorelli, V. (2014). The Effect on Fertility of the 2003–2011 War in Iraq. Population and Development Review, 40(4), 581–604. http://www.jstor.org/stable/24638492

Clancy, L. M. (2021). Ugiyot (Version 1.0.0) [Computer software]. https://github.com/levimeirclancy/ugiyot-blue

Cordesman, A. H., & Khazai, S. (2014). Violence in Iraq: The Growing Risk of Serious Civil Conflict. In Iraq in Crisis (pp. 11–34). Center for Strategic and International Studies (CSIS). http://www.jstor.org/stable/resrep36981.6

Fabian, K. P. (2002). America’s Policy Towards Iraq. India Quarterly, 58(2), 1–14. http://www.jstor.org/stable/45073490

Fayyad, H. N. (2012). Fertility in Iraq: Trends, Evolution and Influential Factors. Arab Center for Research & Policy Studies. http://www.jstor.org/stable/resrep12660

Field, J. O. (1993). From Food Security to Food Insecurity: The Case of Iraq, 1990-91. GeoJournal, 30(2), 185–194. http://www.jstor.org/stable/41145741

Garfield, R. (2000). The Public Health Impact of Sanctions: Contrasting Responses of Iraq and Cuba. Middle East Report, 215, 16–19. https://doi.org/10.2307/1520149

Harding, S., & Libal, K. (2019). War and the Public Health Disaster in Iraq. In C. Lutz & A. Mazzarino (Eds.), War and Health: The Medical Consequences of the Wars in Iraq and Afghanistan (Vol. 4, pp. 111–136). NYU Press. http://www.jstor.org/stable/j.ctv1jhvns4.8

Kaiser, J. (2007). Iraq Mortality Study Authors Release Data, but Only to Some. Science, 316(5823), 355–355. http://www.jstor.org/stable/20036032

Lal, V. (2006). The Dead in Iraq and the War of Numbers. Economic and Political Weekly, 41(49), 5028–5029. http://www.jstor.org/stable/4418993

MAPSVG. (2021). SVG vector map of Iraq [SVG]. http://mapsvg.com/maps/iraq

Sim, M. R. (2009). Mortality and Cancer from Chemical Weapons Testing. BMJ: British Medical Journal, 338(7697), 725–726. http://www.jstor.org/stable/20512449

Taback, N. (2007). Mortality of journalists in Iraq. Medicine, Conflict and Survival, 23(2), 147–148. https://www.jstor.org/stable/27017361