313: Challenges of Offering an API

Arvid:

Welcome to The Bootstrapped Founder. Last week, I dove into my ideal customer and the exploration of finding them. Now that I've chosen that elusive ideal customer profile, that ICP, there are consequences. And I have found that the right people to build for are there now, but what do they actually need? I'll walk you through my product challenges today.

Arvid:

This episode is sponsored by Acquire.com, more on that later. Let's recap from last week. I decided to turn PodScan into the most comprehensive podcast data platform it can be. So far it's focused on transcription, but there's more to this and I'll get to that in today's episode. My ideal customer and the customer profile around them is anyone who wants to build a product or service or business on top of such a data platform in the podcasting space or utilizing podcasts for whatever they wanna do.

Arvid:

Might be marketing, might be placing people on podcasts, might be anything. As long as they need the data that is my customer profile. And that means that I'm selling something that is extremely easy to copy and to clone and to abuse. Right. That's what data is.

Arvid:

The thing that I want to give freely to my paying customers, transcripts, rankings, metadata, all other kinds of things related to podcasts is also the thing that I have to protect at all costs. That's the bizarre thing about businesses that are based on APIs. The easier it is to grab the data from the business or from the API, the more people wanna actually use it because it's simple. It's enjoyable. Yet, the easier it is to grab a lot of data, the more risky it gets for the actual business offering it because people can quickly abuse it.

Arvid:

There are a few problematic kinds of behaviors that software businesses have to contend with, if they are in the API space, in the data platform space. And they are exacerbated, particularly if you have all your value locked up in your API. And that is mostly scraping. It is the biggest threat to a business. Just someone grabbing the whole database in one go.

Arvid:

Every single podcast, every single transcript, all connections between them, or ratings, the whole thing. If somebody were to get this, that would be a problem. And the problem is beyond that, duplicating a valuable treasure trove of data. That's pretty much what the internet was built for. Right?

Arvid:

Every time you go to a website, a small copy of that website is made on our computers. And most of the time website owners actually want that. Right? That's how it works. That's how we see content.

Arvid:

And don't get me started on actually downloading files. Right? Like we have BitTorrent and we have, like, distributed file systems and all that kind of stuff. IPFS, where people very willingly host copies of things on their computer so that other people have access to them. But a fully fledged database that costs hundreds of hours of work and tens of thousands of dollars to create, or has at least until now.

Arvid:

Yeah. I don't want that to be copied, really not. So I need to prevent it. And from the start, I have been and I need to keep staying ahead of those who want to siphon this treasure trove of data into their own systems. And with that in mind, I think I need to think defensively in a couple of ways about my product.

Arvid:

First, I need to make it hard to iterate over my database entries. I need to make that very complicated. The example here is, I guess, if you're downloading record 4,287 of my database, you know that there's probably a 4,288 and a 4,289 and so on. Right? And that way, anybody who's intent on scraping the website could just build an automated system to grab every single record in a row, iterating over these numbers.

Arvid:

And that's why I created encoded IDs in my API, just like Stripe is doing. I think they both obfuscate the underlying ID to make it harder to iterate and make the record more recognizable. So that 4,287, which is just a seemingly random number, turns into something like pod_ and then some kind of string, 8865 whatever b. There's still randomness in there, but it starts with pod and it has a certain length. Right?

Arvid:

It's not just a number. And that looks more like a podcast because it has the word in it and less like a random number. And I think that is also more usable. And usable in a sense that it can be copied more easily, it can be shared, it's something more semantically holistic. And if somebody were to get their hands on a list of all of these IDs, well, obviously, they could still scrape them.

Arvid:

But all this really needs to do is to deter people from seeing an easy opportunity. And a lot of security in the API space is about hard security, but there is also that kind of stuff. Right? It's not security by obscurity because there is an encoding and a secret and a hashing process involved for this. But it also is just a signal that, okay, this is not gonna be one of those easy APIs to scrape.
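
The encoded IDs described above can be sketched roughly like this. The episode only says that an encoding, a secret, and a hashing process are involved, so everything concrete here is an assumption for illustration: the `pod_` prefix format, the 32-bit ID range, the secret value, and the use of a small Feistel network keyed with HMAC-SHA256 as the reversible scrambling step. It is not PodScan's actual scheme.

```python
import hashlib
import hmac

SECRET = b"change-me"   # hypothetical secret; keep the real one out of source control
PREFIX = "pod_"         # Stripe-style type prefix, as described in the episode
ROUNDS = 4              # Feistel rounds (assumption)
HALF_MASK = 0xFFFF      # IDs are treated as 32-bit values split into two 16-bit halves


def _round(value: int, round_no: int) -> int:
    """Keyed round function: 16 bits in, 16 bits out, derived from HMAC-SHA256."""
    message = bytes([round_no]) + value.to_bytes(2, "big")
    digest = hmac.new(SECRET, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "big")


def encode_id(record_id: int) -> str:
    """Turn a sequential database ID into a prefixed, non-sequential public ID."""
    if not 0 <= record_id < 1 << 32:
        raise ValueError("record_id must fit in 32 bits")
    left, right = record_id >> 16, record_id & HALF_MASK
    for round_no in range(ROUNDS):
        left, right = right, left ^ _round(right, round_no)
    return f"{PREFIX}{(left << 16 | right):08x}"


def decode_id(public_id: str) -> int:
    """Reverse encode_id; raises ValueError for malformed public IDs."""
    body = public_id[len(PREFIX):]
    if not public_id.startswith(PREFIX) or len(body) != 8:
        raise ValueError("malformed public ID")
    value = int(body, 16)
    left, right = value >> 16, value & HALF_MASK
    for round_no in reversed(range(ROUNDS)):
        left, right = right ^ _round(left, round_no), left
    return left << 16 | right
```

Because the mapping is a keyed permutation, every record still has exactly one public ID and the server can map it back, but seeing one public ID tells a scraper nothing obvious about the next one. As the episode says, this only deters casual iteration; a leaked list of IDs can still be scraped.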

Arvid:

And in many ways that can deter at least the less interested parties. You still have to fight the other ones, but we'll see. Let's keep thinking about what else I need to think about. Any API that I offer needs to be severely rate limited, particularly in my space. Because podcast information, historical data that goes back to, like, 2012 or whatever, when people really started podcasting.

Arvid:

Well, that doesn't change after the fact. Once it's scraped, it is true. Right? Once somebody's downloaded this, they have something valuable and it's not changing. It's not, like, altering its nature all the time.

Arvid:

That is something that they can then build something upon. And even with mild scraping, somebody could eventually explore the whole API within a few months, just using search or whatever, if they were to use it all the time, right? And that's where rate limits come in. Rate limiting really means that on my trial plan, for example, only 100 requests per day can be made to the API.

Arvid:

It's not a lot. For a scraper, that's actually used up within a couple of seconds and then it can't do anything anymore. But for somebody evaluating the API product, playing with it, checking it out, 100 manual requests, that's quite a lot. And it's more than enough to see if it works, how it works, what the data is, what the speed is, that kind of stuff. And paid plans on PodScan get very liberal, but still very sensible limits, like a couple thousand, maybe a couple tens of thousands for the bigger plans.

Arvid:

And if somebody needs more, they can buy an enterprise plan and get in touch. Like, I have the capacity to set these limits on a per-account level, right? That's kind of the idea of running this API. I can control how much people get to see, how often they get to use it and all that. And for anyone else, I think these limits are quite sufficient.
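
A minimal sketch of the per-plan daily limits with per-account overrides described above. Only the trial figure of 100 requests per day comes from the episode; the paid-plan numbers, the plan keys, and the `DailyRateLimiter` class are hypothetical, and a real service would keep the counters in a shared store rather than in process memory.

```python
from datetime import date
from typing import Dict, Optional, Tuple

PLAN_DAILY_LIMITS = {
    "trial": 100,         # stated in the episode
    "essentials": 2_000,  # hypothetical "couple thousand"
    "premium": 20_000,    # hypothetical "tens of thousands"
}


class DailyRateLimiter:
    """Fixed-window limiter: one counter per account per calendar day."""

    def __init__(self) -> None:
        # Per-account limits that replace the plan default, e.g. after a
        # customer asks for more headroom.
        self.overrides: Dict[str, int] = {}
        self._used: Dict[Tuple[str, date], int] = {}

    def limit_for(self, account_id: str, plan: str) -> int:
        return self.overrides.get(account_id, PLAN_DAILY_LIMITS[plan])

    def allow(self, account_id: str, plan: str, today: Optional[date] = None) -> bool:
        """Count one request and report whether it is within the daily limit."""
        day = today or date.today()
        key = (account_id, day)
        used = self._used.get(key, 0)
        if used >= self.limit_for(account_id, plan):
            return False
        self._used[key] = used + 1
        return True
```

A scraper burns through a trial allowance almost immediately and is then refused until the next day, while raising the limit for one trusted account is a single entry in `overrides`.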

Arvid:

And if they're not, they can just ask me and I can modify them as I learn more. Right? They can tell me, hey, I'm paying $50 a month, but I think I need more. Well, maybe I can adjust the limit. I'm very flexible.

Arvid:

This is my business. I could do whatever I want. So, you know, this is a part of the communication happening here. But I had to set limits initially to protect the API from being scraped, or from people being able to do this in the first place. And finally, I guess another choice, one that has to do with money in this case, is that there's no freemium.

Arvid:

There's no freemium plan. I cannot and I will not allow non-paying customers to access this data. Because if they cannot afford the $19 a month plan, the Essentials plan that still has significant access to the API, they can't have it. They just won't. People go to great lengths automating account creation and data extraction in freemium products.

Arvid:

There are a lot of examples of this with free trials to begin with, but with freemium in particular: the plan has certain limits set, and then people create like 20 accounts in one go. Like, their name plus 1 and plus 2 and up to plus 20 and then some, just to be able to abuse the system. And I don't want that. That's not gonna happen here. PodScan is pay to play.
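
The "name plus 1, plus 2" pattern above is plus-addressing, where many addresses deliver to one mailbox. PodScan avoids the problem by not having a free plan at all, so the following is only a sketch of how a product that does offer one might spot such signups. The function names are hypothetical, and lowercasing the whole address is an assumption that holds for most mail providers but is not guaranteed by the email standards.

```python
from typing import Dict, Iterable, List


def canonical_email(address: str) -> str:
    """Collapse plus-addressed variants of one mailbox into a single key."""
    local, sep, domain = address.strip().lower().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    base = local.partition("+")[0]  # drop everything from the first "+"
    if not base:
        raise ValueError(f"empty mailbox name: {address!r}")
    return f"{base}@{domain}"


def find_duplicate_signups(addresses: Iterable[str]) -> Dict[str, List[str]]:
    """Group signups by canonical mailbox and keep only mailboxes used more than once."""
    groups: Dict[str, List[str]] = {}
    for address in addresses:
        groups.setdefault(canonical_email(address), []).append(address)
    return {key: found for key, found in groups.items() if len(found) > 1}
```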

Arvid:

Like, for real, it's a business. Come on. Like, if you want this kind of data to build something with, you can shell out $20 for the most basic version. And if you can't, well, maybe you shouldn't build a business. And I do all of this mostly because the easiest part of PodScan, if you were a copycat trying to clone it, is the interface and, you know, the kind of web-facing stuff.

Arvid:

The complicated and expensive stuff is all in the back end, in the database. That's what I have to protect because that's what people are after and that is hard to build. And product limitations, like what I've been talking about, aren't the only barriers that I can throw into people's path here. Of course, as a German, I like rules and regulations, and I drafted the terms and conditions for the API even before I had built it. Maybe not the best idea for an indie hacker, but it just comes very naturally to me.

Arvid:

So it happened. I had it in place before I even activated the first user on the API. The first sentence of these terms should make it absolutely clear what's okay and what is not. That sentence goes like, you cannot use the PodScan API to create an application or service that competes directly with PodScan's core products. That's the sentence.

Arvid:

Right? You cannot build a PodScan clone, clonescan.com, from the PodScan API. Because if you do, I will turn off the API and I might take other steps because that is a breach of these agreements. And I added a few sentences in there about storing the data, also not allowed, if it's not meant for immediately serving your customers. If that is what it's for, that's fine.

Arvid:

But like storing it and using it for your own purposes, no. That is a limitation that every API user agrees with upon connecting to the PodScan APIs. That's also part of this. Right? Like, the moment you start using it, you agree to these terms.

Arvid:

It's very clearly outlined and there's an intentional act to connect to an API. So I assume consent. And, of course, all of this is flexible. Right? Like, for sure.

Arvid:

But the point is people cannot use it to build a clone. That is legally not okay. Right? And they cannot use my data without rerequesting it from the servers after a certain while, which also helps with, like, stale data floating around, and it kind of protects the integrity of the data that I present. And when you limit access like this, you also limit opportunity.

Arvid:

I'm quite aware of this. And that's the kind of hard balance to strike here. I want my users to feel like they can build anything they want on top of these APIs, but I also very much still wanna be in control of the data that powers these other products. It's funny because I've been looking into my competitors, like the Podchasers and the Listen Notes, the data platforms for podcasts. And they have very, very similar terms and rules.

Arvid:

And I've talked to several of those founders and they are super highly protective of that data because they know how valuable it is and they know how easy it is for people to copy large amounts of data and then do something with it and sell it for cheap. So they are usually charging a lot of money even for the most basic access. And they are highly, highly restrictive in terms of how much data you can get at any given point and any kind of amount of data that you can grab. It's really, really noticeable, like just how protective people are of podcast data. I find it very interesting because it's so funny to think that podcasting itself is a highly open ecosystem, but the aggregation of this, the aggregation of not just podcast information itself, but metadata, like transcripts, what I'm doing, or like audience information, what James Potter is doing with Rephonic, that kind of stuff.

Arvid:

Like there is so much additional data being aggregated and it's so expensive to do it that people protect that data very, very strongly. And I got a message earlier this week on my help desk chat, the widget that I have, the little bubble, from a founder who wondered just how much they could cache the data. Not even storage, just caching what they receive from my API. Is a few seconds fine? Can it go, like, into their own cache to be sent out in an email later?

Arvid:

And it got quite specific. And it just reminded me how much running a software business really is about just-in-time decision making. Both on their end, because they needed to figure out, is this a tool that we can use for our purposes, but also on my end, because I needed to kind of think about, well, how much do I allow people to do here. Right? Is it fine to cache it for an hour?

Arvid:

Is 24 hours okay? Can they write it into, like, a Redis, or can they write it into, like, an SQL database somewhere, but then delete it? It was an interesting consideration, like how much am I willing to bend my own rules to facilitate things for people that I have kind of validated and can trust. That is something, I guess, that I will run into even more often, this particular challenge, the more I go into deals with bigger and bigger businesses.
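
From the API consumer's side, the caching question above comes down to a time-to-live: keep a response for a bounded period, then request it again. A minimal sketch, assuming an in-memory store; the `TTLCache` class and the one-hour figure in the usage note are illustrative only, since the episode does not settle on an allowed duration.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Keep fetched values for at most ttl_seconds, then fetch them again."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        """Return the cached value if it is still fresh, otherwise call fetch()."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]
        value = fetch()  # stale or missing: go back to the API
        self._entries[key] = (now, value)
        return value
```

With `TTLCache(ttl_seconds=3600)`, a value fetched for an outgoing email can be reused for up to an hour, and anything older is requested from the API again, which matches the terms' requirement to re-request data after a while.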

Arvid:

But right now, it's already happening with these founder kind of people that I talk to, who are on my level, who build small businesses to try things. I found that a very interesting observation. I need to remind myself that all of this is just stuff we make up. And that these rules are there to be flexible and sometimes broken for an opportunity that comes our way. But they're also there to protect us from other things that other people see as opportunity.

Arvid:

Right? That's kind of the theme of this whole thought for me. I guess, how can I build a defensive business that can also be offensive? That can just go for an opportunity without risking too much. And that's kinda where I'm at.

Arvid:

And the more data I ingest with PodScan, like the more data is gonna be pulled into the system and transcribed and analyzed, the more critical these choices and partnership agreements will become. Right? Right now, my users have personal access to me. They can write to me. They can DM me.

Arvid:

And I often have a personal history with them from prior Twitter conversations or email exchanges. But someday, these will be bigger and bigger businesses trying to get their hands on as much as they can, because there is no personal connection. There is no mutual trust. So that kind of brings me to another conundrum that I have here. And there are some kinds of data that I collect from a wide variety of sources that I might not want to share on the API at all.

Arvid:

I have them, but I don't wanna give them out. I'm thinking about this and let me explore my thoughts with you here. Audience size, I mentioned this earlier, is one of the best kept secrets in the podcasting world. Right? It's really hard data to come by.

Arvid:

No hosting provider, no podcast player or the creator of such software gives away even a glimpse at the actual numbers behind the podcasts that they work with or on. The only people who know how many listeners they have are the owners of the podcast themselves. Sometimes they don't even know because their podcast is run by a network and the network knows, but they don't. They only get, like, rough numbers. And even those they don't share. It's a really, really tough thing to come by.

Arvid:

In such a situation, what would one do? Well, I guess guesstimates are the thing I can present. There are a lot of tools out there that check the Apple Podcasts charts and look for review counts and then check the size of the social media profiles that are mentioned in the podcast descriptions, and then compile them into some kind of score. Podchaser has a score. Listen Notes has a score.
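
To make "compile them into some kind of score" concrete, here is a rough sketch of one way such public signals could be folded into a single number from 0 to 10. The weights, caps, chart size, and the `reach_score` function are all invented for illustration; this is not how Podchaser, Listen Notes, or PodScan compute their scores.

```python
import math
from typing import Optional

# Hypothetical weights and caps; a real score would be tuned against known audience sizes.
WEIGHTS = {"reviews": 0.5, "followers": 0.3, "chart": 0.2}
REVIEW_CAP = 10_000
FOLLOWER_CAP = 1_000_000


def _log_scaled(value: int, cap: int) -> float:
    """Map 0..cap onto 0..1 on a log scale so huge shows do not dwarf everyone else."""
    clamped = max(0, min(value, cap))
    return math.log1p(clamped) / math.log1p(cap)


def reach_score(review_count: int, social_followers: int,
                chart_rank: Optional[int] = None, chart_size: int = 200) -> float:
    """Combine review count, social following, and chart position into a 0-10 score."""
    if chart_rank is None:
        chart = 0.0  # not on the chart at all
    else:
        chart = max(0.0, min(1.0, (chart_size - chart_rank + 1) / chart_size))
    raw = (WEIGHTS["reviews"] * _log_scaled(review_count, REVIEW_CAP)
           + WEIGHTS["followers"] * _log_scaled(social_followers, FOLLOWER_CAP)
           + WEIGHTS["chart"] * chart)
    return round(10 * raw, 1)
```

The output is deliberately coarse: a consumer sees one rounded number, and the underlying review counts and follower numbers never leave the platform.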

Arvid:

And I think I'm working on something similar as well, because I'm effectively building the same kind of system in the background. But I could share all these metrics that kind of go into the score in my API. Right? I could share very specifically how many reviews this podcast has in the United States. Or at least it had over the last couple of weeks.

Arvid:

Because I have a full history of review counts on Apple. That's what I've been building. So why not add it to the API? I'm thinking about this and I'm struggling with this quite a bit. I want my users to be able to get as much as they can from the platform.

Arvid:

Obviously, this data can be useful to someone. I don't know what it's gonna be. Like maybe somebody wants to build a review system for themselves and they, you know, I don't know what the data can do. I just know that it's there and people could use it. But I also wanna keep some secret sauce to myself, because if I create a score from that data, well, the score is valuable all in itself, and it doesn't need the data.

Arvid:

And I've been looking at how other platforms solve this. And most of them just really don't. They don't give you the data. If anything at all, they share this rough score, a simple ranking like 4 out of 10, like, you know, a star rating or something or top 10%. That's all you get.

Arvid:

And even that tends to be only available in the more expensive tiers. Like you have to pay for it. And only then even do you get these weird little ranked scores. And honestly, it feels odd to limit data like this, but I think that's what I will do with PodScan as well. It's just, like, realistically, audience information is probably the most expensive non AI work that PodScan could do.

Arvid:

Right? AI work is like GPU, like transcription and inference and that stuff. That is expensive because that runs on hardware that is just really, really rare right now. It's hard to access and people charge a premium for it. But even computationally on a CPU, or the many, the 48 CPUs that my measly PHP server has, there's a lot of computation.

Arvid:

Because when you scan for audience information, scan for social media profiles and that stuff, it involves constantly scanning the web, parsing websites. Occasionally, I need proxies to reliably get results and all of that has a cost. For this reason, I think I'll make anything indicating reach and audience and listener data a Premium-and-higher plan feature for PodScan. Right now PodScan Essentials is the starting plan for $20. Premium is 50 and higher is like 100 plus.

Arvid:

Right? Like, my enterprise plan is at 500. Hey, if somebody wants it for a little bit cheaper and has a good reason and they reach out to me, they're probably gonna get it. I'm at this point in my business.

Arvid:

But, you know, that is kinda where it is. 20, 50, 500. So at 50 plus, this data will exist on the API. And the API will not return these fields for Essentials customers and only return example data or rounded numbers for trial accounts. I do not want abuse of this because it is so relevant.

Arvid:

The data is so hard that I need to protect it. Hard to get and hard in itself, very true. So, I'll have to figure out how I can actually communicate this. Like, for people who are on those lower plans, that there is more there, but they can't have it unless they actually pay for it. I probably have to put it in the documentation and maybe inside the product as well.
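
The plan gating described above (full audience data from Premium upward, nothing for Essentials, rounded numbers for trials) could look something like this in a response serializer. The field names, plan keys, and one-significant-digit rounding are assumptions for illustration, not PodScan's actual API.

```python
from typing import Any, Dict

PLAN_RANK = {"trial": 0, "essentials": 1, "premium": 2, "enterprise": 3}
AUDIENCE_FIELDS = {"audience_size", "review_count"}  # hypothetical field names


def _rounded(value: int) -> int:
    """Keep only the leading digit, e.g. 12345 -> 10000, so trials see the scale only."""
    if value <= 0:
        return 0
    magnitude = 10 ** (len(str(value)) - 1)
    return value // magnitude * magnitude


def serialize_podcast(record: Dict[str, Any], plan: str) -> Dict[str, Any]:
    """Build the API response for one podcast, filtered by the caller's plan."""
    if plan not in PLAN_RANK:
        raise ValueError(f"unknown plan: {plan!r}")
    response: Dict[str, Any] = {}
    for key, value in record.items():
        if key not in AUDIENCE_FIELDS:
            response[key] = value              # ordinary fields go to everyone
        elif plan == "trial":
            response[key] = _rounded(value)    # trials get rounded numbers
        elif PLAN_RANK[plan] >= PLAN_RANK["premium"]:
            response[key] = value              # Premium and higher get real values
        # Essentials: audience fields are left out entirely
    return response
```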

Arvid:

But I think that's the way forward for this. It costs me to create, so it should also cost people to consume. And, of course, I'll have to make sure that all these limitations and protections are also present in the user-facing website. Like, obviously, scraping happens right at that level, not always on the API, but, like, right on the website. And I can already feel that my eagerness to present all kinds of interesting data there might lead to a kind of data extraction that is not easily fought with rate limits and IP blocks on the API side.

Arvid:

I really need to make sure that my website is protected too. It is, like, there are rate limits in there as well, but I need to make sure that, you know, it's actually working and people can't circumvent it easily. No doubt, I'll run into other API and data related issues in the future here as well. And you might even think of one right now that I missed, that is very clear to you and that I have not thought about at all. So in that case, please just reach out to me.

Arvid:

Tell me. This is an important topic, and I would really, really like input here. Feel free to send me a Twitter DM or email me at arvid@podscan.fm. Like, I am very responsive either way. Twitter DM maybe not as much because Twitter's interface is horrible, but, you know, send me an email. And I really appreciate all the wonderful feedback that I've been getting over the last couple of weeks as I've shared the PodScan journey in public.

Arvid:

It's been a lot of fun. Got a lot of really cool shout-outs on Twitter for it and in the community. People have been responding way more to my emails and my newsletters than before. So I mean, obviously building in public is interesting. I follow several podcasts and people who are building their own businesses in public religiously. On my dog walks I listen to that kind of stuff.

Arvid:

So I get it and I quite enjoy it and I will never stop. So, you know, just enjoy it, and send me what you think. That's really what this is. I've been having so many cool conversations with my users and my customers. Both of them highlighting things that they need, highlighting things that they are interested in, how they approach their work, what they want, what they would like to see, what they use somewhere else.

Arvid:

It's been really, really enlightening. Any feedback is welcome. And maybe this also allows you, as you are building your own thing, to allow for more feedback from others. Right? I know we have our ideas.

Arvid:

We have our plans and we have our goals. And we have this conception as solopreneurs and indie founders that we are the only arbiters of choice. We make the decisions. So feedback is often something we struggle with because it can throw something in the path that we are not willing to deal with, but you can and should get some feedback. And if you wanna try what it feels like from the other side, send me a message.

Arvid:

Tell me what you think. I really appreciate it. And that's it for today. I wanna briefly thank my sponsor Acquire.com, whom I intend to use to eventually sell PodScan for many, many millions of dollars, but we'll see. Because that is not always the outcome.

Arvid:

Right? In some situations, you might have a really cool software business. You have customers. You have thousands of customers. You have good MRR.

Arvid:

You're living the dream, but, you know, you're stuck in some kind of equilibrium where you can't really go anywhere. It's working, but it's not growing. Right? It's working well, but it's just not going anywhere. Or maybe it's even just going slowly, slowly down, and you don't wanna lose the value of the thing that you have created.

Arvid:

And I know the situation is different for every single founder out there because we've all been building very different businesses. I always say that entrepreneurship is building something that nobody else has ever built before in this exact way. So obviously the outcome is always gonna be slightly different and unique between all of us. But if we are in a situation like this, where we hit a skill ceiling, or a time ceiling, or just an attention ceiling, whatever it might be, the outcome is often the same. Like we lose interest, the business suffers, and in the end it becomes less and less valuable, maybe at worst completely worthless.

Arvid:

And that doesn't need to happen. You create value, you should be able to convert it into cash for real. Right? You should be able to sell it to somebody who does not have the ceiling, or has a different ceiling that can be useful for your business right now. So in that case, think about selling your business.

Arvid:

You don't have to do it today or this week, but you should always consider that this is an option for you to take your business in the future. You can go to try.acquire.com/arvid to just see for yourself if this is something for you. Because the people at Acquire have been helping hundreds, at this point, I guess, thousands of customers, people, founders like us, sell their business for a solid price that other people are willing to pay, and that will change your life.

Arvid:

So they know how to make a business more sellable and more presentable. That is something you should look into from the beginning anyway. So yeah. Go to acquire.com. Check it out.

Arvid:

Free to list. And I think this might be the right option for you at some point. So why not check it out today? Thank you so much for listening to The Bootstrapped Founder today. You can find me on Twitter at @arvidkahl.

Arvid:

Send me a message there if you're interested. You'll find my books, Zero to Sold and The Embedded Entrepreneur, and my Twitter course, Find Your Following, there too. All this is still very valid and useful. I occasionally look into my own books to just figure out where my next steps should be.

Arvid:

And if I'm not forgetting anything, it's quite useful, really good for my journey and good for your journey as well. If you wanna support me and this show, please subscribe to my YouTube channel, get the podcast in your podcast player of choice and leave a rating and a review by going to ratethispodcast.com/founder. I mentioned James Potter earlier, who built Rephonic, like a podcast audience size kind of API platform. He also built ratethispodcast.com. So I've been kind of suggesting his product to any listener of the show for years at this point and never knew.

Arvid:

We've been chatting, right? Like all the people building businesses in the podcasting space are kind of connected with each other. So that was really fun to have a chat and find out that we've been, you know, Internet nerd friends for a while and just didn't know. Well, any rating, any review will really make a massive difference if you show up on those rating platforms, because then the podcast will show up in more people's feeds. They will learn more about my journey, the products, and they will also just be more exposed to the knowledge that I'm trying to share.

Arvid:

And I think that is really important to me and I appreciate if you help me with this. Thank you so much for listening. Have a wonderful day and bye bye.

Creators and Guests

Arvid Kahl
Host
Empowering founders with kindness. Building in Public. Sold my SaaS FeedbackPanda for life-changing $ in 2019, now sharing my journey & what I learned.