The Role of Smoking in Shaping Data Science: A Historical Perspective
Chapter 1: The Influence of Smoking on Medical Research
The popular TV series Mad Men serves as a reminder of how deeply intertwined smoking was with American culture, particularly in the 1960s. During this time, the medical community was gradually acknowledging the detrimental health effects of smoking, notably its link to cancer, despite many professionals being habitual smokers themselves.
Fast forward to today, and the number of habitual smokers has decreased significantly. While we still face challenges related to smoking, the past five decades have brought about notable changes. This raises the question: why does progress take so long?
One critical factor is the addictive nature of smoking, which makes the behavior particularly difficult to change. Another is that establishing a causal relationship between smoking and adverse health outcomes, such as cancer, requires extensive research.
For those of us in data science, the extensive research on smoking and its health repercussions has been immensely beneficial. This body of work spurred the development of epidemiological analytics, which remain invaluable today. The battle between the medical field and tobacco companies from the 1960s to the 1980s led to significant advancements in survival analysis.
Section 1.1: Understanding Survival Analysis
Survival analysis has emerged as a crucial aspect of medical research, primarily focused on understanding factors that influence survival rates. Prior to the mid-20th century, most studies centered on acute diseases that posed immediate threats to health, such as bacterial or viral infections. The devastating Spanish Flu pandemic of 1918-1920, which claimed millions of lives, was a pivotal moment that propelled early epidemiological investigations.
As antibiotics became widely available in the 1940s and 1950s, the focus shifted towards chronic diseases like cancer, which posed long-term health risks. This transition necessitated longitudinal studies that monitor individuals over extended periods, allowing researchers to identify lifestyle factors that might contribute to severe diseases. This marked a significant methodological shift, resulting in many of the large-scale studies we now see reported in the media.
Section 1.2: The Mechanics of Survival Curves
Consider a scenario where you hypothesize that a specific aspect of an individual’s experience within an organization influences their likelihood of remaining with that organization over time. For instance, you might suspect that a positive work environment fosters long-term employee retention, while a negative atmosphere leads to higher turnover.
Attrition can be analyzed in the same way as survival in health studies: leaving the organization plays the role of the event, and you track the proportion of people who remain over time for each type of experience. Kaplan-Meier survival curves provide a clear visual representation of these findings.
A Kaplan-Meier graph comparing smokers and non-smokers exemplifies this method: the x-axis represents time in months, the y-axis indicates the proportion of individuals still alive at each time point, and each group is drawn as its own stepped curve.
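To make the mechanics concrete, here is a minimal sketch of the Kaplan-Meier estimate written in plain Python. It is an illustration under stated assumptions, not the method behind any particular study: the function name `kaplan_meier` and the two small groups of tenure data are hypothetical, invented only for this example. At each time where an event occurs, the running survival probability is multiplied by the fraction of at-risk individuals who did not experience the event. People who are censored (still present when observation ends) leave the at-risk group without counting as events.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] for each time with an event.

    durations: how long each person was followed (e.g. months)
    observed:  True if the event (death, resignation) occurred at that time,
               False if the person was censored (still present at last contact)
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # everyone leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the share of at-risk people who survived time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical tenure data in months, invented for illustration only.
positive_env = kaplan_meier(
    [6, 14, 20, 24, 24, 24], [True, False, True, False, False, False]
)
negative_env = kaplan_meier(
    [2, 3, 3, 5, 8, 8, 12, 12],
    [True, True, False, True, True, False, False, False],
)

for label, curve in [("positive", positive_env), ("negative", negative_env)]:
    for month, prob in curve:
        print(f"{label:8s} month {month:2d}: {prob:.3f} still with the organization")
```

Plotting each list of points as a step function gives the familiar stepped curve. In practice a maintained library is usually the better choice, since it also provides confidence intervals.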
Chapter 2: Practical Applications of Survival Analysis
In practice, survival analysis offers numerous applications that can enhance organizational practices:
- Survey Validation: By demonstrating that survey responses correlate with attrition rates, management can better appreciate the significance of these surveys.
- Predictive Analytics: Survival analysis can validate specific indicators of attrition, enhancing broader predictive models. Research has shown that language used in emails can predict an employee’s cultural fit within an organization.
- Promoting Diversity: Survival analysis can also surface organizational patterns, such as whether employees from different backgrounds are assigned different kinds of tasks, providing a statistical basis for promoting inclusivity.
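The survey-validation idea in the list above can be sketched with a log-rank test, a standard way to check whether two survival curves differ by more than chance. The sketch below is a plain-Python illustration under stated assumptions: the function name `logrank_test` and the survey groups are hypothetical, and the p-value relies on the usual chi-square approximation with one degree of freedom, which is unreliable for samples as small as these.

```python
import math


def logrank_test(durations_a, observed_a, durations_b, observed_b):
    """Two-group log-rank test. Returns (chi_square_statistic, p_value)."""
    all_durations = list(durations_a) + list(durations_b)
    all_observed = list(observed_a) + list(observed_b)
    event_times = sorted({t for t, e in zip(all_durations, all_observed) if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # People still under observation just before time t.
        n_a = sum(1 for d in durations_a if d >= t)
        n_b = sum(1 for d in durations_b if d >= t)
        # Events occurring exactly at time t.
        d_a = sum(1 for d, e in zip(durations_a, observed_a) if e and d == t)
        d_b = sum(1 for d, e in zip(durations_b, observed_b) if e and d == t)
        n, d = n_a + n_b, d_a + d_b
        # Expected events in group A if both groups shared one hazard.
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical tenure data (months) split by survey response; illustration only.
unfavourable = ([2, 3, 3, 5, 8, 8, 12, 12],
                [True, True, False, True, True, False, False, False])
favourable = ([6, 14, 20, 24, 24, 24],
              [True, False, True, False, False, False])

chi2, p = logrank_test(*unfavourable, *favourable)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```

A small p-value would suggest that attrition differs between the two survey groups, which is the kind of evidence that helps management take the survey seriously.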
Survival analysis serves as a powerful tool for examining human outcomes, often requiring only basic data like survey responses or participation records. Many organizations could greatly benefit from utilizing survival analysis to gain insights into the true drivers of their people-related outcomes.