Using a chatbot to "deep research" genealogy

lupaglupa

Lt. Colonel
Forum Host
Joined
Apr 18, 2019
Location
Upstate New York
I came across this article yesterday that left me with less information than I had when I started, which is a nifty trick for a website that says it's there to 'guide' readers. The piece describes how the author decided to use ChatGPT, OpenAI's chatbot, to research her family tree. Her stated goal was to do "deep research," which is the term ChatGPT uses for its search and analysis tool. Supposedly this tool will take your prompts, search the web, analyze the material it gathers, and provide you with a synthesized report that gives you results without your doing the work.

The author then walks us through her prompts to the bot, which, no lie, start with "Provide information on the Smith family from Lancaster, Pennsylvania, between 1800 and 1900." This is her specific example after she helpfully points out that "Tell me about the Smith family." is not specific enough. Ya think?

15 minutes later, we are told, the author has her report. But she never tells us anything substantial about her results. Almost as a throwaway she writes "I really enjoyed knowing that my ancestors came over on the Mayflower." Huh? Setting aside the grammar (Does she now no longer enjoy knowing that? Or has she stopped knowing it?), the leap from 'Smith family in Lancaster' to "came over on the Mayflower" is a heck of a jump. Now, honestly, the Mayflower passenger family lines are incredibly well documented so I suppose if you were descended from one of those families it would be easier to find information on your genealogy than if your ancestors were potato famine era Irish immigrants. But anybody with a Smith line in their family knows you have many, many ways to get bad info in your research.

The article includes a number of caveats. I loved this one - "While AI offers powerful tools for genealogical research, it's important to be aware of its limitations, for instance, accuracy concerns. AI-generated content can sometimes include fabricated information. Always cross-reference AI findings with primary sources or established databases." She ends with "Be sure to critically evaluate the AI-generated information and ask more questions. As with all research tools, combining AI capabilities with traditional methods and personal insights will yield the most comprehensive and accurate results."

In other words, use our tool, get a report which may be bogus, and then do the work of checking existing sources and research. Okay then.

I honestly think that tools like this, especially if they can scan the deep web, do have some value in them. An experienced researcher could use a tool like this to look and see if there are sources he or she might have overlooked. That researcher would use the results as a prompt to check new sources. But, as always happens with tools that promise shortcuts, folks who aren't experienced are going to put in some simple prompts, get back an unreliable report, and believe it. I can only imagine how much of this junk will end up on Ancestry and Find a Grave and then get copied onto multiple trees.

<sigh>

 
I've used AI in my project to determine birthplaces of German-American soldiers whose entries aren't entirely legible, are misspelled, or are otherwise unclear. I can prompt it to give me the top five potential birthplaces for a town that, for example, I know begins with S, ends in -gen, and is located in Prussia. This method has been hugely helpful, responsible for probably 10-15 percent of my findings.

The problem with AI like in the OP is when people treat it as the solution rather than a tool to help find the solution.
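
For anyone curious what that kind of narrowing looks like mechanically, here is a minimal sketch. It assumes you already have a list of candidate place names to filter; the short list and the function name below are made up for illustration, and a real project would load a proper historical gazetteer. Either way, the output is only a shortlist to check against the original record.

```python
import re

# Hypothetical candidate list for illustration only; a real project
# would load place names from a historical gazetteer.
CANDIDATE_TOWNS = ["Solingen", "Siegen", "Stettin", "Sigmaringen", "Spandau", "Hagen"]


def match_partial_name(candidates, starts_with, ends_with):
    """Return the candidates that begin and end with the legible fragments."""
    pattern = re.compile(
        rf"^{re.escape(starts_with)}.*{re.escape(ends_with)}$",
        re.IGNORECASE,
    )
    return [name for name in candidates if pattern.match(name)]


# A town that begins with S and ends in -gen.
matches = match_partial_name(CANDIDATE_TOWNS, "S", "gen")
print(matches)
```

The filter only uses the letters you can actually read, so it cannot tell you which of the surviving names is the right one.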
 
That seems like the kind of focused search that would work. And I assume you then confirmed the findings.
 
A potentially useful tool, although it's going to struggle with common names. If your ancestors are Smith, Jones, and Williams, good luck getting AI to find the right people.

This tool reminds me of the Hints on Ancestry.com: sometimes it finds great stuff where I wouldn't expect to look, and sometimes it returns outright false positives.

If the records from every census were put into a single database, AI could be useful at finding the people who seemingly appear only on one census and matching them to other records.
 
I have stumbled across amazing stuff using basic search tools. But I knew it was amazing stuff because I have put in hours building a framework of known and proven facts that I can fit the oddball stuff into. Once I had a record return that put my husband's great grandmother in Texas as a lodger. Totally weird as she was a lifelong Michigander. But I checked it out and found the household she was lodging in was that of her sister. I didn't reject it but I didn't just accept it either. And because I had her family tree I could easily identify what was going on.
 
If you're going to use AI, ChatGPT is most definitely the worst one to use. There are two types of AI as used at present: generative AI and analytical AI.
- Generative AI, as the name suggests, is for generating new stuff, such as fictional stories, new ideas, etc. ChatGPT is an example of generative AI. It has been shown that much of what ChatGPT comes up with is either made up entirely or riddled with factual inaccuracies.
- Analytical AI, on the other hand, is used for analysis. Many important fields, such as medicine, use analytical AI to help solve problems, and it would be the kind of AI better suited for our purposes on CWT. I don't know any analytical AIs by name, but I do know that these are far superior to ChatGPT for 100% factual research purposes.
 
I see so many advances being made and so many new products out there. I don't doubt that AI will be revolutionary. But I think it's just too early to know what tools will work well. I'm going to stay on the sidelines until more is known.
 
If you know a lot about a particular ancestor a tool like ChatGPT might be useful for finding sources and information that you may have never seen before. If you are just searching for anything on a person and know nothing about their background, something like this might lead you down a lot of rabbit holes.
 
OK, intrigued, I entered my great-grandfather into an AI chat; he was "blokmaker" in Kristiansand, Norway to the king--that is, he made those pulleys used on ships in the 1800s (they were called bloks). Back came the information that AI had never heard of him (which isn't surprising) along with the recommendation that I get myself a book on Norwegian concrete block making.

Surely this was one of @nc native's rabbit holes: while AI correctly translated "blokmaker" as "maker of blocks," it pointed me in an irrelevant direction which would be a complete waste of time. Although I'm no expert in concrete-making, I question that concrete blocks date back to great-grandfather's day--and common sense tells me that a sailing ship would be taking a risk by using bloks made of concrete so the market would be so small that my great-grandparents might have starved. I'm going to stick with my own research.
 
I tried AI again, this time with a purely historical problem: who was responsible for the murders on Smuttynose in 1873? This is a subject that I have researched thoroughly for both professional and personal reasons.

AI came back with a synthesis of traditional narratives. It also gave me "facts" which I knew were incorrect (simple, provable things like who was married to whom). It missed (or declined to consider) sources both here and in Norway (the victims were all Norwegian). Frustrated, I challenged AI, feeling like a friend who gets into heated arguments with her car's GPS; each time, AI backed down, saying "sorry for the confusion" and accepted my corrections.

This worried me most of all. AI seemed to be accepting my corrections without challenge--was it adding my comments to its "intelligence"? In this case, I know that I was correct BUT what if I had thrown in something about my being governor of Maine or NH at the time? Would AI have accepted that--and passed it on to others?

In a corollary to the simple misinformation that I found on AI, a computer-nerd newsletter to which I subscribe (because I am so incompetent with a computer) warned that the Chinese DeepSeek gives incorrect answers about Asian affairs.

Misinformation, limited sources, naive acceptance. AI may be useful in some fields but I don't think that history and genealogy are among them.
 
...it would be easier to find information on your genealogy than if your ancestors were potato famine era Irish immigrants. But anybody with a Smith line in their family knows you have many, many ways to get bad info in your research.
I suspect that in the case of my Irish potato famine immigrants, it would probably spit back my own research. Most of the "hints" I get in Ancestry for them are just regurgitation of my previous work. Although, perhaps an alternate form of the Turing test for machine intelligence would be to "trace my Irish ancestry". If the machine starts smoking and gives up, it may be gaining intelligence.
 
So for laughs and grins I posed the following question to Grok. It's about my avatar, Patrick Sheren (Sheran).

"Find marriage date for Patrick Sheren born 1842 in Ireland and died in 1883 in Philadelphia, PA"

It came back with the below, which is wrong for the person I was asking about. It did, however, uncover yet another "Patrick Sheeran" whom I was previously unaware of, born about the same time, who died the same year in the same city. Interestingly, this one married a "Kelly", and there is a random "Kelly" in our family plot for some reason, but mine married a McLaughlin. I suspect mine was married in 1864, possibly in Ulster County, NY, or Philadelphia.

"To find the marriage date for Patrick Sheren, born 1842 in Ireland and died 1883 in Philadelphia, PA, I'll work with the information provided and available resources. The name "Sheren" might be a variant of "Sheeron," "Sheeran," or "Sheridan," common in Irish records, so I'll consider spelling flexibility due to historical inconsistencies.
Given Patrick's birth in 1842 and death in 1883, he likely married between the early 1860s and early 1880s, possibly in Ireland before emigrating or in Philadelphia after arriving. Irish marriage records from this period are accessible online through sites like irishgenealogy.ie, which hosts civil records starting in 1864, and earlier church records in some cases. Philadelphia records, including marriage licenses from the late 19th century, are held by the Philadelphia City Archives and partially digitized on platforms like Ancestry.com.
Searching for Patrick Sheren (or variants) in Philadelphia, one potential clue emerges from historical data: a Patrick Sheridan, born around 1842 in Ireland, appears in Philadelphia records. This Patrick, noted in Ancestry trees, married a woman named Mary (surname unclear, possibly Corbley or similar) around 1867–1870, based on the birth of their first child circa 1868. This aligns with a typical marriage age of 25–28 for men in that era. However, his death is listed as 1917, not 1883, suggesting this might be a different Patrick unless the death year you provided is off.
For a Patrick dying in 1883, he'd be about 41, so a marriage in the late 1860s to mid-1870s fits. Philadelphia's marriage records from 1860–1885 show no exact "Patrick Sheren" dying in 1883, but a "Patrick Sheeran" died in Philadelphia on March 15, 1883, aged 40 (born ~1843), per the Philadelphia Death Certificates Index. This matches your timeline closely. Marriage records tied to this Patrick are less clear—city ledgers from that period list marriages alphabetically but not always by exact date. A Patrick Sheeran married Mary Ann Kelly on November 25, 1869, at St. Patrick's Church in Philadelphia, per church records on Findmypast. This couple had children in the 1870s, and the timing aligns with an 1883 death.
Without access to a definitive record linking this Patrick's marriage to his death, I can't confirm absolutely. Irish records pre-emigration (pre-1864 church or post-1864 civil) don't yield a clear Patrick Sheren born 1842 marrying there, suggesting he wed in the U.S. after arriving. Immigration typically occurred in the 1860s for Irish men of his age, post-Famine, supporting a Philadelphia marriage.

So, the most likely marriage date, based on this analysis, is November 25, 1869, to Mary Ann Kelly in Philadelphia, assuming this Patrick Sheeran is your Patrick Sheren. To confirm, you'd need his death certificate (listing spouse or marriage details) from Philadelphia's 1883 records or church sacramental registers. Can you provide more—like a spouse's name or child's birth—to narrow it further? Otherwise, this is the best fit from available data."
 
I imagine it will generally uncover something. The question is, is that something worth having?
Interestingly enough, when I gave it the additional information it asked for, it ultimately said things to the effect of, "I find records for those names in all sorts of places, but nothing that ties it to your information." Then it stopped taking my questions.
 
One thing that bugs me about the results people are talking about here is that the bot is giving you a synthesized result, not actual documents. If the bot came back and said "Here are documents you might want to check yourself" it might be useful to me. But I've seen too many transcription errors to believe anything but the originals.

I am actually working in Ancestry right now and had an experience that demonstrated this. Ancestry grabs info from Newspapers.com and allows you to add it to your tree. If you don't pay for the newspapers access you can't see the original, just what their AI has grabbed. In this case a cousin of mine died in Albuquerque, New Mexico. Her obit tells where the interment will be, giving directions to the readers and indicating that the cemetery is on 929 Menaul NE. Ancestry's suggestion is to add the fact that my cousin is buried in Nebraska! Obviously, that NE stands for northeast. But if you couldn't see the article you might well think Ancestry was right and check the box next to Nebraska and add it to your tree. I bet if I looked at every tree which includes this cousin, at least a few will have that 'fact' included.
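
To show how little context it would take to avoid that mistake, here is a rough sketch. I have no idea how Ancestry's extraction actually works; this is only an illustration, the function name is hypothetical, and the sample sentences are made up apart from the "929 Menaul NE" address. It treats "NE" as a street quadrant when it follows a street number and street name, and otherwise flags it as only a possible state.

```python
import re

# A directional suffix that follows a street number and one or more
# capitalized street-name words, e.g. "929 Menaul NE".
STREET_QUADRANT = re.compile(r"\b\d+\s+(?:[A-Z][A-Za-z.]*\s+)+(NE|NW|SE|SW)\b")


def classify_ne(text):
    """Guess what a standalone 'NE' in the text refers to.

    Returns None when 'NE' does not appear, 'street quadrant' when an
    address with a directional suffix is present, else 'possible state'.
    """
    if not re.search(r"\bNE\b", text):
        return None
    if STREET_QUADRANT.search(text):
        return "street quadrant"
    return "possible state"


print(classify_ne("Interment at the cemetery on 929 Menaul NE"))
print(classify_ne("She later moved to Omaha, NE"))
```

Even a check like this only produces a guess, which is why the original article still has to be read.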
 
This is what really worries me. We've talked multiple times about how frustrating it is when a bad fact gets entered in a family tree and is then duplicated multiple times, which seems to increase people's assurance that the fact is correct.
What's worse is that this is happening with your medical records. It is next to impossible to get the errors corrected. They won't be removed. Usually, the best you can hope for is to have a note added stating that you think it's wrong and why. Like anybody pays attention to what patients think!
 
Grok says my 110th PA grandfather was a Physician, which I'm sure would've been welcome news to him, as that was a niftier uniform than what he'd have worn.

Then I went over to ChatGPT, which informs us he was a "notable figure" in the war. Again, no. But it does at least have his rank as Hospital Steward correct.

And, last: under a simple general query of his name & Reg't, ChatGPT displays enough good sense to link both my website and CivilWarTalk as the sole two sources of info. for him. Tells us a photo of him exists and "has been shared by his descendants" at jacksonsvalleycampaign. Nah, just.... me, who, I'm told, had "animated" "the image" "to provide a visual connection to his service." Nah brah, I just wanted to see if I could get him to smile.

Oh, also: it wouldn't give me anything beyond bare bones on the 110th's actual Physician (whom the JCCW railroaded out of the war until Lincoln let him back in). Plenty of testimony in the O.R., but GPT (if I may, as we're now on intimate terms 😐) claims it doesn't know that. I wonder how AI accesses the O.R., if much at all. Right now, it seems GPT is best used by those already familiar with what they're looking up. But then, isn't that how much of the Internet works (& life)? It helps to be able to spot lies & omissions from the get go.

 