PromptCloud is a powerful cloud-computing DaaS (Data as a Service) engine involved in ‘Big’ data acquisition. PromptCloud crawls data that’s spread all across the web and converts it into meaningful insights. It was founded by Prashant Kumar. Before starting PromptCloud in late 2009, Prashant was at Yahoo! with their data team, working on Yahoo! Frontpage, which was one of its hottest products back then. He was mostly involved in data crunching using big data technologies that were still evolving. Prashant graduated with a B.Tech-M.Tech dual degree in CS from IIT Kanpur in 2007. He was joined in 2012 by Arpan Jha, a Carnegie Mellon alumnus who took over the Products & Market Strategy function. Prior to joining PromptCloud, Arpan worked as a Consultant with KPMG & Deloitte.
Let’s consider a scenario: say pn.ispirt.in decides to launch a section on the website where they rank all “Made in India” products based on popularity, usage, quality, and some other criteria. One approach is for them to go out and subscribe to the news feeds of all important news sites all over the world and try to track all the news and events about all ‘Made in India’ products. This data can then be used to rank them. However, data about popularity, usage and quality can be generated all over the web (a product review here, a customer complaint there, a Facebook mention, a tweet, a YouTube video gone viral, a buyer praising the product on his blog, you get the idea). Any such list of websites will be incomplete at best, and the volume of data will be too much for the ProductNation editors to handle.
Enter PromptCloud. PromptCloud offers its Data-as-a-Service to clients like ProductNation who need large volumes of data from all over the web for further analysis (this is just one of the use cases; PromptCloud offers many more services). Continuing with the same example, ProductNation and PromptCloud work through the following steps:
- ProductNation provides 2 pieces of information to PromptCloud: a list of websites they are interested in, and a list of keywords they are interested in
- They will also mention how frequently they want the data to be crawled, which depends on ProductNation’s estimate of how fast their data is likely to change. If they need fresh data (say every few minutes), they purchase PromptCloud’s ‘Low-latency Crawl’ service
- PromptCloud will crawl all the data, matching keywords to find relevant content, and then convert it into structured data (XML, CSV, XLS, etc.) for ProductNation’s consumption
- ProductNation can do two things with the data:
  - It can fetch all the data through API calls and download it onto its own servers for further processing. This will be done on a regular schedule, agreed with PromptCloud
  - ProductNation may not want (or may not have the capability) to host all this data. In that case they buy PromptCloud’s Hosted Indexing Service, let their editors search this index, and fetch only the relevant content
- When ProductNation gets the data, they are also provided a relevance score for each data item (as judged by PromptCloud’s algorithm) so that they can optimize their analysis efforts and keep their results very relevant.
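The filtering and scoring steps above can be illustrated with a minimal Python sketch. It is not PromptCloud's algorithm, which the article does not describe: the scoring rule (the fraction of supplied keywords found in a page) and the names `relevance_score`, `filter_and_structure` and `to_csv` are assumptions made purely for illustration.

```python
import csv
import io


def relevance_score(text, keywords):
    """Hypothetical score: fraction of keywords found in the text (substring match)."""
    if not keywords:
        return 0.0
    lowered = text.lower()
    hits = sum(1 for kw in keywords if kw.lower() in lowered)
    return hits / len(keywords)


def filter_and_structure(pages, keywords, min_score=0.5):
    """Keep (url, text) pages that meet the threshold, most relevant first."""
    rows = []
    for url, text in pages:
        score = relevance_score(text, keywords)
        if score >= min_score:
            rows.append({"url": url, "score": round(score, 2), "text": text})
    rows.sort(key=lambda row: row["score"], reverse=True)
    return rows


def to_csv(rows):
    """Serialize the structured rows as CSV text, one of the formats mentioned above."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["url", "score", "text"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

A real relevance model would weigh far more signals than keyword presence; the point here is only the shape of the pipeline: crawl output in, scored and structured rows out.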
If the Internet were small, say 1000 sites, this would be a trivial problem to solve: just get all the data and be done with it. The scale of the Internet (and the rate at which data is growing) makes this a complex problem. It is a technology problem that requires solving 4 critical issues:
- Velocity: How quickly and how often can data be fetched?
- Structure: How can the data be structured meaningfully when data on the web is largely unstructured?
- Volume: How much data can be stored and processed efficiently?
- Relevancy: How relevant is the data to the keywords supplied, and to the overall intent of this data crawl?
PromptCloud is a technology company which aims to address all these issues and offer services to businesses who need to analyze web data at scale.
The PromptCloud Service
PromptCloud offers services built on top of their cloud-computing DaaS (Data as a Service) engine. They offer custom crawl services to their clients.
Their three primary offerings are:
- Site-specific crawl and extraction: Given a set of sites and fields to be extracted, their crawlers will fetch relevant data from the web, which then gets converted into structured data and delivered to the clients via API
- Low-latency Crawls: These are highly optimized crawls which can fetch data in intervals as low as 5-10 minutes
- Hosted Indexing: Structured data created from custom crawls is hosted and indexed and exposed to clients via query APIs.
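To make the Hosted Indexing idea concrete, here is a small Python sketch of an inverted index over structured records with an AND-style query. The class `HostedIndex` and its methods are hypothetical and say nothing about how PromptCloud's query APIs actually work; the sketch only shows why a client can search hosted data instead of downloading all of it.

```python
import re
from collections import defaultdict


def _tokens(value):
    """Lowercase alphanumeric tokens of any field value."""
    return re.findall(r"[a-z0-9]+", str(value).lower())


class HostedIndex:
    """Hypothetical in-memory inverted index over structured crawl records."""

    def __init__(self):
        self._postings = defaultdict(set)  # token -> ids of records containing it
        self._docs = {}                    # id -> original record

    def add(self, doc_id, record):
        """Store a record and index every token of every field."""
        self._docs[doc_id] = record
        for value in record.values():
            for token in _tokens(value):
                self._postings[token].add(doc_id)

    def query(self, terms):
        """Return records containing all query terms, ordered by id."""
        tokens = _tokens(terms)
        if not tokens:
            return []
        ids = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return [self._docs[i] for i in sorted(ids)]
```

With such an index, an editor's search for "solar review" returns only the matching records, so the full crawl output never has to leave the hosting side.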
They offer the following features as part of their services:
- Deep data crawls: all past data on the site
- Structured data feeds are available to the clients daily/weekly/n times a day
- Ability to supply only incremental data
- Crawling data from AJAX/non-AJAX based sites
- Indexing of data as per requirements
- Custom Analytics
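The "incremental data" feature in the list above can be pictured with a short Python sketch. It assumes each record has a stable key (here a hypothetical `url` field) and detects changes by hashing the record's content; how PromptCloud actually computes deltas is not stated in this article.

```python
import hashlib
import json


def fingerprint(record):
    """Stable content hash of a record (keys sorted so field order does not matter)."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def incremental(records, seen, key="url"):
    """Return only records that are new or changed since the previous crawl.

    `seen` maps a record key to the fingerprint delivered last time and is
    updated in place, so it can be reused across crawls.
    """
    delta = []
    for record in records:
        fp = fingerprint(record)
        if seen.get(record[key]) != fp:
            delta.append(record)
            seen[record[key]] = fp
    return delta
```

Calling `incremental` on two successive crawls with the same `seen` dictionary delivers everything the first time and only the changed records the second time, which keeps the feed (and the client's bill) proportional to what actually changed.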
Their technology stack uses a lot of open source solutions, from Linux, Hadoop and NoSQL to various cloud and cluster management tools. These are augmented with custom components they have written to solve their unique challenges and serve their customer needs better. They serve data to their clients via API, and the data can later be synced to the clients’ FTP, AWS S3, Google Drive or Dropbox accounts.
Offering web-scale crawling services is a hot space and there are many competitors with similar services. When looking at their differentiators, 3 things stand out:
- Vertical-Agnostic: Their offerings are based on URLs and the keywords they use to filter the results of their crawl, so they are independent of verticals and can cater to a large number of them. This also helps them turn around new features quickly, and those features then become available to all their clients.
- End-to-end Monitoring: Web sites regularly have dynamic content on their pages, and things can change pretty quickly. While most other providers offer a do-it-yourself solution (essentially making you solve this problem), PromptCloud monitors structure changes on the web and supports clients until data gets imported into their systems.
- Large-scale complex crawls: Managing large-scale crawls is one of PromptCloud’s USPs. AJAX elements on web sites make the pages unique and dynamic. PromptCloud’s platform can crawl pages that use AJAX and interactions very well.
Because PromptCloud is a technology-centric company, the CTO or Product guys on the client side are the decision-makers and buyers for their product. Adoption has been good so far, with clients in the US, UK, Canada, Western Europe, Singapore, Hong Kong, etc. Being a vertical-agnostic solution, they have clients across the globe and from all domains, be it e-commerce, travel, market research or classifieds. They are an early growth stage company and are growing at the rate of 4X in revenues each quarter, with a healthy pipeline of clients.
Since they offer custom services, their pricing varies a lot: it could be anywhere from $200 to $10K a month for a given customer. Pricing depends on what types of services are being consumed, as well as on crawl frequency, data volume, value-added services, etc. Users can control the price by setting limits on the data that they fetch in a month. They can also sample the data to get a sense of the pricing run rate before committing to the crawl.
Currently, most of their marketing and sales happen through referrals. As they go forward, brand-building is going to be a key marketing strategy, and they are investing in that right now.
They are looking to address a larger market and to expand their offerings across more and more geographies. Scale is the #1 imperative for them right now. The aim is to build a brand around their solution and increase the loyal customer base.
Future releases will focus on the following themes:
- Make data richer by applying AI and Machine Learning
- Offer standardized data sets in some verticals
Web crawling services are a hot space with many players. There is 80Legs (any guesses why they are called so?), which offers a programmable platform for custom data crawling, and there is Grepsr, which offers its services to individuals. There are a lot of them in between: Fetch, Mozenda, Spinn3r (blog, news and social media crawling), and of course an open source web crawler (Apache Nutch).
These products vary along 2 dimensions (and hence they should be visualized in a 2×2 box):
- Horizontal (Platform) or Vertical (Business Solutions)
- Level of programming required to achieve business value
The first dimension is obvious, so let’s talk about the second. The level of programming required to get value depends on the interface that these services expose and whom it appeals to the most. Most of the consumers of data are business people; however, most of these offerings are technical enough that business teams need to work through their technical teams to get value (one reason why PromptCloud sells to Product guys rather than business guys). It is hard (though possible) to have a platform offering and still provide an interface consumable by business teams, because business value is generated only when the platform’s output is processed using vertical business rules, which is hard to do without some amount of programming.
PromptCloud is a horizontal (platform) offering that requires a little programming to get it integrated with business flows of the client. For them, this positioning makes sense for 2 reasons:
- Revenue Spread: Horizontal increases addressable market because all verticals can be targeted. However, this also means that value provided per client is less and hence revenue per client is going to be less while number of clients might be large. At this stage of their company, this is a better revenue mix (since it exposes them to a large number of clients).
- Cost of Innovation: Vertical requires more business focus and hence innovations that are specific to a vertical may not be applicable to another vertical, while horizontal means every innovation benefits every customer. This makes innovating for every client a costly affair when focusing on a vertical.
However, it is important for them to make sure they are moving continuously along the spectrum of offering vertical solutions (without compromising on their innovation abilities) and offering business-consumable interfaces.
The Road Ahead
The road ahead for PromptCloud is tough but inspiring. They are in a space that will require many more services in future as data continues to proliferate, data-driven insights become the order of the day, and web data continues to become more unstructured. They have a good set of offerings and a good list of clients to work with. However, they do face some challenges:
- They need to gain more visibility in existing and newer geographies; building their brand is going to be key.
- They need to add more products to their bouquet of offerings
- To maintain their technology edge, they need to continue to build the team even through the shortage of trained professionals in this area.
They also need to figure out where they want to put themselves on the Horizontal-Vertical axis. We feel that they need to move towards offering vertical-focused solutions, in addition to maintaining a horizontal data platform. PromptCloud (and most of its competitors) offers a technology product to business teams to do their data analysis well (and hence business teams need to involve their technology teams to consume PromptCloud services). We feel that a way forward for PromptCloud will be to become a business product that business people can consume directly and come to build critical business on. Their platform approach (being vertical-agnostic) is a good foundation on which such a business product can be built.
They have the right trajectory of growth, and good momentum and team to continue to push and become a name to be reckoned with in this space.