Vem Ă€r Vem? is a Swedish biographical encyclopedia that was published in two editions of five volumes each in 1945â1950 and 1962â1968 by Bokförlaget Vem Ă€r Vem.
The intention was, according to the publishers, to draw attention to people who were at the height of their activities, even if they were younger, in influential or otherwise noted positions in different areas.
Biographies and career trajectories of ~ 75,000 individuals!
8 of the 10 volumes are digitized by librarians in Uppsala â thank you <3
Vem Àr Vem? Volumes
Vem Àr Vem? Example page
Vem Àr Karl Lund?
Lund, Karl Gustaf, chief engineer, born on July 22, 1893, in Hille, GĂ€vle County, Sweden, son of clerk Ferdinand L. and Maria Andersson. Married in 1936 to Sigrid Johansson. Children: Ingvar (born 1938), Lennart (born 1942). â Graduated from Bergsskolan in Filipstad in 1917, specialized studies at the Royal Institute of Technology (KTH) from 1920 to 1922, studied at the Institute of Metallurgy and Stockholm University in 1921-1922. Chemist at StrömsnĂ€s JĂ€rnverks A-B in Degerfors from 1918 to 1920, metallurgist and chemist at Westinghouse Electric & Manufacturing Co. in East Pittsburgh, PA, USA, from 1923 to 1926 and 1928 to 1929, chief metallurgist at Laclede Steel Co. in Alton, Illinois, USA, in 1927, furnace and steelworks engineer at A-B Iggesunds Bruk from 1929 to 1931, site manager at Gunnebo Bruks Nya A-B, Varbergsverket, since 1931. Member of the municipal executive committee, deputy chairman of the economic department, deputy member of the board of the power plant, chairman of Varbergs Sparbank, employer representative in the district council of the county labor board, member of the board of Varbergs Luftskyddsförening (Varberg Air Protection Association), secretary of Varbergs Högerförening (Varberg Conservative Association), chairman of the railway sick fund, and Plant Society for Small Bird Friends.
Vem Àr Karl Lund?
Traveled to Germany in 1921, 1923, 1930, and 1936, Denmark, Czechoslovakia in 1921, 1922, and 1923, Austria in 1921, and the USA from 1923 to 1929. Writings: âSome fundamental factors for obtaining sharp thermal curvesâ (Transactions of the American Society for Steel Treating, co-authored with C. Benedicks and W. H. Dearden, 1925), âContemporary production of saw blades, circular saws, and machine knivesâ (Timber Industry, 1931). Hobbies: hunting and fishing.
How can we structure the data?
NLP challenges
Many abbreviations and contractions
DOB: âf.\s*(\d{2})\/(\d{2})/(\d{2})â
GĂ€vleborg County: âGĂ€vleb. l.â
Similar structure for each entry but not exactly the same information in the same order
Data pipeline
Step
Process
1
Scrape book data from website
đâĄïžđ
2
Split records on each page of a book
đâĄïžđ
3
Structure biographies using schema
đđđ
4
Classify record into group
đđđđđ
5
Store data for analysis
đđđ
20,000 biographies in 9 volumes of Vem Àr Vem? and 2,400 firms in Svensk Industrikalender in a machine readable format for ~ 80 USD.
Output
{"original": "Lund, Karl Gustaf, överingenjör, Varberg, f. 22/7/93 i Hille, GĂ€vleb. 1., av brukstj :m. Ferdinand L. o. Maria Andersson. G. 36 m. Sigrid Johansson. Barn: Ingvar f. 38, Lennart 42. â Ex. v. bergssk. i Filipstad 17, spec:-stud. v. KTH (B) 20-22, stud. v. me-tallogr. inst. o. Sthlms högsk. 21-22. Kemist v. StrömsnĂ€s JĂ€rnverks A-B, Degerfors, 18-20, metallurg o. kemist v. Westinghouse Electric & Manuf. Co., East Pittsburgh, Pa, USA, 23-26 o. 28-29, chefsmetallurg v. Laclede Steel^Co., Ahon, 111., USA, 27, hytt-o. stĂ„lv.ing. v. A-B Iggesunds Bruk 29-31, platschef v. Gunnebo Bruks Nya A-B, Varbergsverket, sed. 31. Led. av drĂ€tselkamm., v. ordf. v. ekonomi-avd., suppl. i styr. f. elverket, huv:-man i Varbergs Sparbank, arb :giv. repr. i lĂ€nsarb:ndns kretsrĂ„d, led. av styr. f. Varbergs luftsk ifören., sekr. i Varbergs högerfören., ordfs i jĂ€rnv. sjukkassa o. Plant :sĂ€llsk. SmĂ„fĂ„gl. VĂ€nner. Res. t. Tyskl. 21, 2^ 23, 30 o. 36, Danm., Tjeckoslov. 21, 22, 23, Ăsterr. 21, USA 23-29. Skr.: Some fundamental factors for obtaining sharp thermal curves (TrĂ€ns. Am. Soc. for Steel Treating, tills. m. C. Benedicks o. W. H. Dearden 25), Nutida fabrikation av sĂ„gblad, sĂ„gklingor o. maskinknivar (TrĂ€varuind. 31). Hob-bies: jakt o. fiske.","translated": "Lund, Karl Gustaf, chief engineer, Varberg, born on July 22, 1893, in Hille, GĂ€vle County, Sweden, son of factory worker Ferdinand L. and Maria Andersson. Married in 1936 to Sigrid Johansson. Children: Ingvar (born 1938), Lennart (born 1942). â Graduated from Bergsskolan in Filipstad in 1917, specialized studies at the Royal Institute of Technology (KTH) from 1920 to 1922, studied at the Institute of Metallurgy and Stockholm University in 1921-1922. Chemist at StrömsnĂ€s JĂ€rnverks A-B in Degerfors from 1918 to 1920.","structured": {"@context": "https://schema.org","@type": "Person","name": "Karl Gustaf Lund","birthDate": "1893-07-22","birthPlace": {"name": "Hille, GĂ€vle County, Sweden","latitude": 59.916667,"longitude": 15.0 },"jobTitle": "Chief Engineer","memberOf": {"@type": "Organization","name": "Swedish Technical Association" },"children": [ {"@type": "Person","name": "Anita","birthDate": "1937" }, {"@type": "Person","name": "Peter","birthDate": "1942" } ],"spouse": {"@type": "Person","name": "Marianne Hammarberg","marriageDate": "1936" },"education": [ {"@type": "EducationalOccupationalCredential","credentialCategory": "Vocational","issuer": {"@type": "Organization","name": "Technical Gymnasium in Ărebro" },"endDate": "1928" }, {"@type": "EducationalOccupationalCredential","credentialCategory": "Degree","issuer": {"@type": "Organization","name": "KTH" },"endDate": "1934" } ],"worksFor": {"@type": "Organization","name": "Telegraph station in Norrköping","startDate": "1946" },"hasOccupation": [ {"@type": "Occupation","name": "Engineering assistant","employer": {"@type": "Organization","name": "Telegraph station in NĂ€ssjö" },"startDate": "1935","endDate": "1935" }, {"@type": "Occupation","name": "Line engineer","employer": {"@type": "Organization","name": "Telegraph station in Norrköping" },"startDate": "1946" } ] }}
How can we classify occupations according to a schema?
Two-part classification process for engineers
Makes use of occupational title, e.g. âCivilingenjör, teknisk fysikâ and workplace information, e.g. âStrömsnĂ€s JĂ€rnverks A-Bâ to classify individuals into groups.
Are engineers immigrants, or from very rural areas more than others?
Just 3.6% of engineers were born outside of Sweden, not significantly different from other occupations.
Of Swedish born, priests come from places with low population density.
No statistical differences in population density between the birthplaces of doctors, general managers, dentists, and engineers.
Engineers are born in parishes with early access to electricity
Where did engineers study?
Did engineers get advanced degrees?
Engineers in electrical appliances and machinery move the furthest from birthplace to study
Engineers in electrical appliances and machinery live further from their birthplace than other occupations
Engineers live further from their birthplace than other occupations
Conclusion
Bifurcated labour market in Sweden; middle-skill workers stay put, high-skill workers move to opportunity both for education and work.
What to do next
Intergenerational mobility: did engineersâ parents have similar jobs to doctors, dentists, businessmen?
References
Bandiera, Oriana, Andrea Prat, Stephen Hansen, and Raffaella Sadun. 2020. âCEO Behavior and Firm Performance.âJournal of Political Economy 128 (4): 1325â69. https://doi.org/10.1086/705331.
Dahl, Christian MĂžller, and Christian Vedel. 2023. âBreaking the HISCO Barrier: AI and Occupational Data Standardization.â Odense, Denmark: University of Southern Denmark.
Ford, Nicholas Martin, Kristin Ranestad, and Paul Sharp. 2023. âNot the Best Fillers in of Forms? The Danish and Norwegian Graduate Biographies and âUpper Tail Knowledgeâ.â Working Papers 0242. European Historical Economics Society (EHES). https://ideas.repec.org/p/hes/wpaper/0242.html.
Goldin, Claudia. 1994. âLabor Markets in the Twentieth Century.âNational Bureau of Economic Research. https://doi.org/10.3386/h0058.
Merouani, Youssouf. 2023. âInnovation and Gender During Industrialization: The Case of Patenting Activity in France 1791-1913.â Midway Seminar, Economic History Department, LUSEM, Lund University. https://youssoufmerouani.com/talks/midway_seminar.html#/title-slide.
Mokyr, Joel. 2017. âA Culture of Growth: The Origins of the Modern Economy.âPrinceton University Press.
Moretti, Enrico. 2012. The New Geography of Jobs. Houghton Mifflin Harcourt.
Nekoei, Arash, and Fabian Sinn. 2020. âHERSTORY the Rise of Self-Made Women.âSSRN Electronic Journal, December. https://doi.org/10.2139/ssrn.3741332.
Appendix
How to get a text embedding?
# Define a function to get the text embeddingdef get_embedding(text, model="text-embedding-ada-002"): text = text.replace("\n", " ")return openai.Embedding.create(input=[text], model=model)['data'][0]['embedding']# Apply the get_embedding function to your 'hisco_text' columndf['ada_embedding'] = df['hisco_text'].apply(lambda x: get_embedding(x, model='text-embedding-ada-002'))
Assign HISCOs: example
026 Metallurgists (HISCO description)
Workers in this unit group advise on metallurgical problems and develop and control processes for the extraction of metals from their ores, study the properties and engineering characteristics of metals and develop new alloys, and develop and supervise metal manufacturing processes for making finished metal products.
Karl Lund in 1923 (Input)
Metallurgist and Chemist at Westinghouse Electric & Manufacturing Co.
Assign HISCOs: codes projected from 1536 dimensions to 2 with UMAP
Assign HISCOs: Watchmaker Apprentice
Closest HISCO codes to Watchmaker Apprentice
Rank
HISCO CODE
Occupational Description
1
842
Watch, Clock and Precision Instrument Makers
2
830
Blacksmiths, Toolmakers and Machine-Tool Operators
3
810
Woodworkers
4
811
Cabinetmakers
5
839
Blacksmiths, Toolmakers and Machine-Tool Operators
Assign HISCOs: Shipyard Assistant Engineer
Closest HISCO Codes to âShipyard Assistant Engineerâ
Rank
HISCO CODE
Occupational Description
1
43
Ships' Engineers
2
24
Mechanical Engineer
3
982
Ships' Engine-Room Workers
4
42
Ships' Deck Officers and Pilots
5
959
Construction Workers
Keep titles in Swedish?
Instead of using the HISCO schema in English, I am using the SSYK classification from 2012.
Instead of using the multilingual OpenAI ADA model, I now use The National Library of Sweden Swedish BERT model. It is trained on 200M sentences, 3000M tokens from books, news, government publications, swedish wikipedia and internet forums.
The input is from the Job Tech Data so we have a ground truth that they have labelled.
SSYK and Swedish BERT examples:
Closest SSYK Codes to âAmbulanssjukvĂ„rdareâ (Code: 5326)
Rank
Closest Match Title and Code
1
5326 - AmbulanssjukvÄrdare
2
2226 - Ambulanssjuksköterskor m.fl.
3
2227 - Geriatriksjuksköterskor
4
5112 - TÄgvÀrdar och ombordansvariga m.fl.
5
2231 - Operationssjuksköterskor
SSYK and Swedish BERT examples:
Closest Matches to âProcessingenjör, kemiteknik (Code: 2512)â
Rank
Closest Match Title and Code
1
3115 - Ingenjörer och tekniker inom kemi och kemiteknik
2
2145 - Civilingenjörsyrken inom kemi och kemiteknik
3
3113 - Ingenjörer och tekniker inom elektroteknik
4
7321 - Prepresstekniker
5
2163 - Planeringsarkitekter m.fl.
Can we score how successful this is?
We can use a crosswalk from one schema to another as a test set, e.g. O * Net to ISCO:
Vem Àr Karl Lund?
Lund, Karl Gustaf, chief engineer, born on July 22, 1893, in Hille, GĂ€vleborgs lĂ€n, Sweden, son of clerk Ferdinand L. and Maria Andersson. Married in 1936 to Sigrid Johansson. Children: Ingvar (born 1938), Lennart (born 1942). â Graduated from Bergsskolan in Filipstad in 1917, specialized studies at the Royal Institute of Technology (KTH) from 1920 to 1922, studied at the Institute of Metallurgy and Stockholm University in 1921-1922. Chemist at StrömsnĂ€s JĂ€rnverks A-B in Degerfors from 1918 to 1920, metallurgist and chemist at Westinghouse Electric & Manufacturing Co. in East Pittsburgh, PA, USA, from 1923 to 1926 and 1928 to 1929, chief metallurgist at Laclede Steel Co. in Alton, Illinois, USA, in 1927, furnace and steelworks engineer at A-B Iggesunds Bruk from 1929 to 1931, site manager at Gunnebo Bruks Nya A-B, Varbergsverket, since 1931. Member of the municipal executive committee, deputy chairman of the economic department, deputy member of the board of the power plant, chairman of Varbergs Sparbank, employer representative in the district council of the county labor board, member of the board of Varbergs Luftskyddsförening (Varberg Air Protection Association), secretary of Varbergs Högerförening (Varberg Conservative Association), chairman of the railway sick fund, and Plant Society for Small Bird Friends.
NLP challenges
Many abbreviations and contractions
DOB: âf.\s*(\d{2})\/(\d{2})/(\d{2})â
GĂ€vleborg County: âGĂ€vleb. l.â
Similar structure for each entry but not exactly the same information in the same order
Structure using OpenAI API đ€
Using OpenAIâs GPT-3.5 chat API:
We provide the system a schema to structure the record in, e.g. keys.
def structure_biography_info(page_text):try:# Create a prompt for the system structure_prompt =f"Task: read the schema and return RFC compliant JSON information about the Swedish individuals from the 1950 biographical dictionary 'Vem Àr Vem' that is provided below. Use a numeric index for each biography in your JSON output and return information about all of them, including all career information available. Keep the biographic descriptions in Swedish and remove any abbreviations based on your knowledge, e.g. 'fil. kand.' is 'filosofie kandidat', and 'Skarab. l.' is 'Skaraborgs LÀn'. Put years in full based on context. Put dates in dd/mm/yyyy format where possible. If there is no information for a key, leave it out. If there is no information for a required key, put NULL as the value.\nHere is the schema: {schema}.\nHere is the text: {page_text}. Go!" structure_response = client.chat.completions.create( model="gpt-3.5-turbo-1106", response_format={"type": "json_object"}, messages=[ {"role": "system","content": "You are an expert on Swedish biographies.", }, {"role": "user", "content": f"{structure_prompt}"}, ], ) structured_biography_info = json.loads( structure_response.choices[0].message.content )return structured_biography_infoexceptExceptionas e:print(f"Error in structure_biography_info: {e}")returnNone
Add coordinates to firms and individuals
Birthplaces of individuals who studied at Royal Tehcnical Institute (KTH) in Who is Who in Industry and Business
Conductors of Progress: Upper Tail Human Capital in Electrifying Sweden Presentation at CAS - Senter for grunnforskning Jonathan Jayes - Lund University 2024-04-23