Robert Diab

Testing the waters with Deep Research

March 4, 2025

Last month OpenAI unveiled a new tool for producing lengthy reports with citations to sources on the web. It uses one of the company’s best ‘chain of reasoning’ models to deliver output that far exceeds the quality of similar tools from Google and Perplexity AI, tools that are also called ‘Deep Research,’ as it happens.

But initially, OpenAI’s version was only available to folks with the US$200-a-month “Pro” subscription. The rest of us had to take effusive reviews on faith, like this one from Reddit:

Deep Research has completely changed how I approach research. I canceled my Perplexity Pro plan because this does everything I need. It’s fast, reliable, and actually helps cut through the noise.

For example, if you’re someone like me who constantly has a million thoughts running in the back of your mind—Is this a good research paper? How reliable is this? Is this the best model to use? Is there a better prompting technique? Has anyone else explored this idea?—this tool solves that.

It took a 24-minute reasoning process, gathered 38 sources (mostly from arXiv), and delivered a 25-page research analysis. It’s insane.

Not everyone was so enthused. One commentator noted that it can “miss key details, struggle with recent information and sometimes invents facts.”

The Verge had a piece about using Deep Research to produce a report on the judicial treatment of Section 230 of the Communications Decency Act over the last five years, concluding that “it got the facts right but the story wrong.” None of the cases it cited were made up, and its summary was generally accurate, but it had one major problem: it ended in 2023. And 2024 was “a rollicking year for Section 230,” with many important developments, as a law scholar quoted in the article pointed out.

When I first read about OpenAI’s new tool, I was keen to find out whether it could look up sources on third-party databases like CanLII or JSTOR. If it could do this, DR would be quite powerful. You could ask it to find a case about X and it would formulate the query, run the search and, most importantly, read through the cases to find the needle in the haystack.

For the moment, however, it doesn’t do this. But OpenAI says that access to databases like CanLII is on the way.

(Incidentally, I’ve been writing about how much better free or almost-free AI has become at producing brief overviews of Canadian law. I’ve got a piece on the way documenting how well OpenAI’s reasoning model o1 summarized the law of unjust enrichment in 500 words, an area I knew little about. It came up with a cogent summary, with correct citations to the three leading Supreme Court of Canada decisions on point and the main three-part test to be applied. Check it out here.)

The thrust of this post, however, is to report on my own test drive with OpenAI’s Deep Research. As a ChatGPT Plus subscriber ($30 Canadian a month), I gained access to DR only last week but couldn’t think of a good query to test it with. With “Plus” users getting only 10 queries a month, I’m inclined to be sparing with them.

Here was my ungrammatical query:

What uses are being made in Canada to use artificial intelligence in decision-making by government agencies or entities, and what policy statements or guidelines have been issued by government or agencies with respect to this?

As you may know, before it embarks on a search that can last anywhere from about 5 to 30 minutes, it asks a few questions to sharpen the scope. It asked me:

Are you looking for information on AI use by government agencies across all levels (federal, provincial, municipal), or just at the federal level? Also, would you like details on specific areas of application (e.g., public safety, healthcare, immigration, etc.)? Lastly, are you interested in ethical considerations and public consultations regarding AI in government decision-making?

I told it to be as inclusive as possible.

What it produced was thoroughly impressive. It took 7 minutes and came back with an 11,000-word document titled “AI in Government Decision-Making in Canada” that cited 27 sources. You can read it here.

It was both informative and sufficiently engaging that I read it all the way through, and I learned an enormous amount. (I then got it to produce a 1,000-word summary, which you can find at the end of the thread.)

The report was impressive both in terms of what it covered and where it pointed. It touched on the use of AI at federal, provincial, and municipal levels across Canada and in various fields: policing, healthcare, immigration, social services, transportation, even the courts.

It also struck a nice balance between useful, reliable government policy documents and reports on the one hand, and shorter news items on the other.

What struck me reading it was that it would easily take me a week of surfing, reading, note-taking, and compiling to produce something this good. Maybe more.

My own report would no doubt have been better: more selective in some ways, more discerning, maybe more probing.

But this was about 70 to 80% as good as I could do myself, in 7 minutes. It covered all the bases, all the major stories about government use of AI in recent years: Clearview AI, Chinook, interventions by the Federal Privacy Commissioner, major policy statements on the use of AI.

Quibble with this as you might, but it is not a minor development. The sources are real. The general summary is cogent and more or less accurate as far as it goes. Is it missing that great paper by so-and-so on this or that aspect of the problem? No doubt. Does it contain every relevant story, all the relevant policies, cases, and so on? No, it doesn’t.

But is it worth consulting as a starting point? Do I know much more about this topic than I did before I ran the query? Absolutely.

I come away from Deep Research feeling more optimistic about the utility of AI in research, and legal research in particular. I can see a point on the horizon when AI will produce a better first draft of an outline of argument or opinion than we could manage in anything less than a few days, even with a good grounding in the field.

It has even got me questioning my assumption that AI is no substitute for really knowing the law, on the theory that people without a solid foundation in law won’t know how to prompt effectively. I’m just not sure about this any more.
