NLP and FinTech - Introducing a free text search API


#1

My company recently opened up a free text search API service covering SEC filings, news, Federal Reserve speech transcripts and more.
We plan to add more sources down the road.

Here’s a snippet we’re using to describe/pitch it:


Iris API (http://iris.lore.ai)

Good information drives great investments, but it’s hard to find everything you need in one place. Now you can use a single API to directly search and access data from key sources, including SEC filings, Federal Reserve speeches, press releases, and over 70,000 news feeds. Save time by setting up your own alerts and screens or mine the raw data for signals. With Iris, you’ll find the facts you need quickly and efficiently.


Our core business is offering clients products and services built on top of this and other data, but we figured that by opening this up we’d learn a lot about how the community would use it. Eventually we would monetize by offering access without rate limits, premium API features based on more advanced text analytics, or premium document sources from more publishers.

I’d love to hear the FinTech Genome community’s feedback on this kind of service. Is it useful? What other public documents would be good to have on it, e.g. clinical trial results, industry reports or the Federal Reserve’s “Beige Book”? The intersection of NLP and FinTech is super exciting, and it’s early days, but I’m very interested in hearing what people are currently doing and how they think things will evolve in this space.


#2

@helshowk This is an awesome share, thanks. We’ve started using IBM’s Watson, so this is very timely. I’ve passed it on to my CTO, who I’m sure will want to have a play. We’re focussing primarily on private company data, so perhaps the news feeds would be most relevant?


#3

@riskopy_mike Thanks, and glad to hear it. News would probably be a great first source for private company data. We’ve been trying to decide what else to add to the mix. Finding good public data sources isn’t easy, but we’re on the lookout and will keep adding them as we go along. For private company data, one thing that came to mind was consumer complaint databases.


#4

Are Quandl and/or Thinknum providing the kind of fundamental data that Lore AI is offering access to?

They were mentioned in the thread “Is anybody working on a replacement for Yahoo Finance?”

I would also consider checking out 1010Data (I mentioned them in my post below because they do offer access to company filings).
@helshowk, let us know how these may differ from your offering.


#5

@Efi Good questions, so let me try to explain the differences as I see them:

Quandl is all about quantitative data and time series. I’ve used Quandl a lot, as I think many other people here probably have. The service is great, and it lets you use the same API “language” to access a wide range of data sources. The main difference with what we’re offering is that our data is text, and we let users search through the text and its associated metadata. This difference affects both how and why the data is used and how it’s organized on the back end. Iris API saves developers a lot of time because gathering and indexing all of these different text sources in one place is a challenging process (not to mention maintaining it). Technically speaking, its architecture also looks very different from Quandl’s. So I like to think of Iris API as very similar in spirit to Quandl but complementary, because the type of data is so different. I recognize that in finance there’s a need to distill everything down to something that looks like a time series, so we’ve added date ranges to the queries and metadata for easy calculation of signals (e.g. how many times the phrase “record backlog” shows up, or how often the word “lawsuit” appears in a company’s filings over time).
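To make the “text into time series” idea concrete, here is a minimal Python sketch of that kind of signal. It is not Iris API’s actual interface: it assumes you already have search results as hypothetical (date, text) pairs, and the function name `phrase_counts_by_month` is made up for illustration. It simply counts case-insensitive, whole-word matches of a phrase and buckets them by month.

```python
import re
from collections import Counter
from datetime import date


def phrase_counts_by_month(docs, phrase):
    """Count case-insensitive, whole-word matches of `phrase` per (year, month).

    `docs` is assumed to be an iterable of (date, text) pairs, for example the
    dates and bodies of documents returned by a date-ranged search.
    """
    # re.escape treats the phrase literally; \b stops "lawsuit" matching inside "lawsuits".
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    counts = Counter()
    for day, text in docs:
        # A document with no matches still creates its month's bucket (count 0).
        counts[(day.year, day.month)] += len(pattern.findall(text))
    # Sort buckets chronologically so the result reads as a time series.
    return sorted(counts.items())


# Hypothetical search results: (filing date, document text) pairs, not real Iris API output.
docs = [
    (date(2016, 1, 15), "Management cited a record backlog. The record backlog grew."),
    (date(2016, 1, 28), "No mention of it here."),
    (date(2016, 2, 3), "Another Record Backlog quarter."),
]

monthly = phrase_counts_by_month(docs, "record backlog")
print(monthly)  # [((2016, 1), 2), ((2016, 2), 1)]
```

Months that have documents but no matches appear with a count of 0, which keeps quiet periods visible when plotting; months with no documents at all are simply absent, so you may want to fill those in before charting.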

Thinknum is also a great service, and from what I can gather they focus mainly on the alt-data space, collecting granular data on things like product prices, store locations and perhaps traffic per store. I can’t say much about this offering because I haven’t used it extensively. I’m not sure how they gather their data (manually, automatically or a mix), but it’s interesting that Quandl is now also competing in this space, and I wonder how they will compete with, e.g., Orbital Insight when it comes to data gathering. Iris API, by contrast, is all about documents and text search.

There are so many reports, letters, filings, etc. published around the web by both private and public organizations. As the technology for extracting information from that text matures and becomes commoditized, people will turn to it more and more for intelligence, complementing existing methods rather than replacing them. With Iris API we’re starting a bit early, but the idea is to collect all of those sources in one place.

I don’t know 1010Data that well, but I looked at the site, and in your post you mentioned that they offer unstructured financial data as well. Do you know what data it is, how it’s sourced, etc.? I have a feeling they build solutions on top of various data sources but don’t offer the data pipe directly to anyone. I’d love to hear more if you have any other details or insights into 1010Data, but again, I think Iris API is much closer to a text version of Quandl than to something like 1010.


#6

Thanks, guys. All this information, and the APIs mentioned, are very useful, since we soon plan to build our customer knowledge base of transaction relations by combining data and text sources from the various providers and aggregators.


#7

@helshowk @Efi

I’m not sure Quandl, Thinknum or 1010Data are the right comparisons for something like this. From my understanding, they all focus more on structured data, whether it’s traditional or alternative. Specifically, Quandl requires time series data, and 1010 has a lot of interesting data sitting in a glorified spreadsheet interface. I would say better comparisons would be Ravenpack, Accern or InfoTrie, which all focus more on unstructured data, specifically on teasing out sentiment.


#8

Great conversations unfolding. I will reach out and invite 1010, Ravenpack, Accern, and InfoTrie to get their input.
Of course, if anybody has a contact there, please invite them into this conversation.


#9

Very well-known cases. Not sure if we’re accidentally building a list?


www.kensho.com


#10

Thanks for sharing these, @jordanhauer4. I hadn’t heard of Ravenpack, Accern or InfoTrie, and you’re spot on with the comparison. We did add some time series elements to the API because some folks like to convert searches into time series plots (sort of like Google Trends). Generally speaking, you can use the text data in whatever way suits your needs. Personally, I see alerts and monitoring as quite valuable, but assessing the topics and themes around a news search could also be useful.

What other data sets would be useful to add to the API? Central bank speeches are certainly relevant, since they tend to move FX and rate markets quite dramatically, but are there other sources you would personally use?

EDIT: We’re thinking of adding quarterly hedge fund manager letters. It would be interesting to read managers’ commentary on the same stock or trade simply by searching for, e.g., “oil” or “IBM”.


#11

Feel free to check out what we offer at InfoTrie: we can provide up to 15 years of daily or tick-by-tick sentiment on 50,000+ stocks, assets, commodities and topics, either directly via API or through third-party vendors. You can also visit our SaaS platform at www.finsents.com. Please email us with any queries: contact@infotrie.com