I ran a Discord bot on Replit for months. It worked fine until it didn't. Cold starts, random disconnects, the bot going offline at 3am when a customer needed it. Replit was pivoting away from hosting and it showed. The writing was on the wall. So I moved the whole thing to Railway, and it turned out to be one of the best infrastructure decisions I've made this year.
This is the full story of migrating a production Discord bot (with a SQLite database, Stripe integration, and live users) from Replit to Railway. What broke, what I learned, and why the cost difference alone would have justified the switch.
Why Replit Was Falling Apart
Replit started as a great platform for prototyping. Spin up a project, get a URL, done. But running a production bot on it? That's a different story entirely.
The biggest problem was cold starts. Replit spins down your process when nobody is hitting it. For a web server that gets periodic traffic, that's annoying but survivable. For a Discord bot that needs to be online 24/7 to catch slash commands, ticket creations, and payment webhooks? It's a dealbreaker. The bot would go to sleep, someone would run a command, and they'd get nothing. No response, no error, just silence. By the time the process woke up, the 3-second window Discord gives a bot to acknowledge an interaction had long since closed.
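Discord expects the initial response to an interaction within 3 seconds, and an interaction's snowflake ID encodes when it was created, so you can see exactly how much of that window is left. A minimal sketch in plain TypeScript (no discord.js), assuming the standard snowflake layout; `interactionAgeMs` and `canStillAcknowledge` are illustrative helpers of my own, not library functions:

```typescript
// Discord snowflakes store their creation time in the upper 42 bits (above bit 22),
// as milliseconds since the Discord epoch (2015-01-01T00:00:00Z).
const DISCORD_EPOCH_MS = BigInt(1420070400000);

// Discord expects the initial response to an interaction within 3 seconds.
const INITIAL_RESPONSE_WINDOW_MS = 3_000;

/** Milliseconds elapsed since the interaction with this ID was created. */
function interactionAgeMs(interactionId: string, nowMs: number = Date.now()): number {
  const createdMs = Number((BigInt(interactionId) >> BigInt(22)) + DISCORD_EPOCH_MS);
  return nowMs - createdMs;
}

/** True while there is still time to send (or defer) the initial response. */
function canStillAcknowledge(interactionId: string, nowMs: number = Date.now()): boolean {
  return interactionAgeMs(interactionId, nowMs) < INITIAL_RESPONSE_WINDOW_MS;
}
```

In a live handler the practical rule is to call `deferReply()` first whenever work might take longer than that. But no code helps a process that isn't running, which is why cold starts were fatal here.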
I tried the usual workarounds. Pinging the Replit URL every 5 minutes with an external cron. Using UptimeRobot as a keep-alive. These hacks worked sometimes, but they were band-aids on a bullet wound. The process still crashed randomly, usually during peak hours when the 2K community was most active.
Then Replit announced they were deprioritizing always-on hosting. They were moving toward their AI coding features, which is a fine business decision for them. But it meant the hosting side was getting less attention, fewer fixes, and the reliability was only going to get worse. Time to leave.
Choosing Railway
I looked at a few options: Fly.io, Render, a raw VPS on DigitalOcean, and Railway. Each had tradeoffs.
Fly.io is powerful, but the DX felt heavy for what I needed. I just wanted to deploy a Node.js bot, not manage a global edge network. Render was solid, but its free tier has the same cold start problem. A raw VPS would work, but I didn't want to manage systemd services and nginx configs for a Discord bot.
Railway hit the sweet spot. Push to deploy from GitHub. Persistent processes that don't spin down. Volume mounts for SQLite persistence. Environment variables through the dashboard. Logs in real time. And the pricing is usage-based, so I'm only paying for what the bot actually consumes.
The Dockerfile
Replit doesn't use Docker. It has its own Nix-based environment. So the first step was writing a Dockerfile that replicated the bot's runtime environment.
The bot is Node.js with TypeScript, compiled to JavaScript before running. The Dockerfile is straightforward: start from a Node 20 alpine image, copy package files, install dependencies, copy source, build, and run.
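```dockerfile
FROM node:20-alpine
WORKDIR /app
# better-sqlite3 is a native module; alpine needs build tools to compile it
RUN apk add --no-cache python3 make g++
COPY package*.json ./
# Install devDependencies too, since the TypeScript build needs them
RUN npm ci --production=false
COPY . .
RUN npm run build
# Exec form: node is PID 1 and receives SIGTERM itself; the app must handle it
CMD ["node", "dist/index.js"]
```
One gotcha: the bot uses better-sqlite3, which is a native module. On Replit this just worked because their environment had the right build tools. On alpine, I needed to add build dependencies. A quick apk add --no-cache python3 make g++ before the npm install solved that.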
I also added a .dockerignore to keep the image small, excluding node_modules, .git, any local SQLite files, and environment files. The production database would live on a Railway volume, not baked into the image.
Volume Mounts for SQLite
This was the part I was most worried about. The bot uses SQLite for ticket data, vouch records, user preferences, and transaction history. On Replit, the database file just lived in the project directory. It persisted across restarts because Replit kept the filesystem alive (most of the time).
Railway works differently. Every deploy creates a fresh container. Without a volume mount, the database would be wiped on every deploy. That's obviously unacceptable for a production bot with real order data.
Railway's volume mounts solved this cleanly. I created a volume in the Railway dashboard, mounted it at /data inside the container, and updated the bot's database path from ./data/bot.db to /data/bot.db. The volume persists across deploys, so the database survives even when the container is rebuilt.
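A startup check along these lines makes a missing volume fail with a clear message. This is a sketch, assuming the path comes from the DATABASE_PATH variable mentioned later in this post; `resolveDatabasePath` is a hypothetical helper name:

```typescript
import { existsSync } from "node:fs";
import { dirname, resolve } from "node:path";

/**
 * Resolves the SQLite file location from DATABASE_PATH (e.g. /data/bot.db on
 * Railway) and checks that its directory exists before anything opens it.
 */
function resolveDatabasePath(env: Record<string, string | undefined> = process.env): string {
  const raw = env.DATABASE_PATH?.trim();
  if (!raw) {
    throw new Error("DATABASE_PATH is not set; refusing to guess where the database lives");
  }
  const absolute = resolve(raw);
  const dir = dirname(absolute);
  // Without the volume, /data won't exist. Failing here beats creating a fresh,
  // empty database on the container's throwaway filesystem.
  if (!existsSync(dir)) {
    throw new Error(`Database directory ${dir} does not exist. Is the volume mounted?`);
  }
  return absolute;
}
```

The returned path is what you'd hand to better-sqlite3 (`new Database(path)`). Throwing is deliberate: the common pattern of creating the directory on the fly would, without the volume, leave the bot running happily against an empty database.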
I tested this by deploying, creating some test data, deploying again, and verifying the data was still there. It worked perfectly. The volume mount is the single most important part of this migration for anyone running SQLite on Railway.
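The deploy-twice test can also live on as a boot-time log line. A sketch, assuming the volume is mounted at /data; the marker file name and `checkVolumePersistence` are hypothetical:

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

interface PersistenceCheck {
  firstBoot: boolean; // true if the marker had to be created on this boot
  createdAt: string;  // ISO timestamp written when the marker was first created
}

/**
 * Writes a marker file to the volume on first boot and reads it back on later
 * boots. If every deploy reports firstBoot, the data isn't persisting.
 */
function checkVolumePersistence(
  volumeDir: string,
  markerName = ".volume-marker", // hypothetical file name
  now: Date = new Date(),
): PersistenceCheck {
  const markerPath = join(volumeDir, markerName);
  if (existsSync(markerPath)) {
    return { firstBoot: false, createdAt: readFileSync(markerPath, "utf8").trim() };
  }
  const createdAt = now.toISOString();
  writeFileSync(markerPath, `${createdAt}\n`);
  return { firstBoot: true, createdAt };
}
```

Call `checkVolumePersistence("/data")` on startup and log the result. After a redeploy, `firstBoot` should be `false` and `createdAt` should predate the deploy.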
DNS Migration via Cloudflare
The bot itself doesn't need a public URL for most operations. Discord's gateway connection is outbound. But the Stripe webhook endpoint and the bot's health check API both need a stable URL.
On Replit, the webhook URL was something like https://mybot.username.repl.co/webhooks/stripe. I needed to migrate this to a proper domain. The bot serves the 2K Service Plug marketplace at serviceplug.net, which was already on Cloudflare.
The migration was simple. I pointed a subdomain (api.serviceplug.net) to Railway's provided domain using a CNAME record in Cloudflare. Then I updated the Stripe webhook endpoint in the Stripe dashboard to point to the new URL. Cloudflare handles SSL termination, and Railway handles the routing.
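Moving the endpoint is also a good moment to confirm signature verification still works at the new URL, because each webhook endpoint in Stripe has its own signing secret and the check needs the raw, unparsed request body. The official `stripe` package handles this with `stripe.webhooks.constructEvent`. The sketch below shows roughly what that does, assuming Stripe's documented `t=...,v1=...` header scheme (HMAC-SHA256 over `timestamp.body`); `verifyStripeSignature` is my own illustrative name:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

/**
 * Checks a Stripe-Signature header ("t=<unix seconds>,v1=<hex HMAC>,...")
 * against the raw request body. Illustration only: in production, use the
 * official SDK's stripe.webhooks.constructEvent instead.
 */
function verifyStripeSignature(
  rawBody: string,
  header: string,
  secret: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
  toleranceSeconds = 300, // the SDK's default tolerance is 5 minutes
): boolean {
  let timestamp: string | undefined;
  const signatures: string[] = [];
  for (const part of header.split(",")) {
    const [key, value] = part.split("=", 2);
    if (key === "t") timestamp = value;
    else if (key === "v1" && value) signatures.push(value);
  }
  const ts = Number(timestamp);
  if (timestamp === undefined || !Number.isFinite(ts) || signatures.length === 0) return false;
  // Reject old (or far-future) timestamps to limit replayed requests.
  if (Math.abs(nowSeconds - ts) > toleranceSeconds) return false;

  // Stripe signs "<timestamp>.<raw body>" with the endpoint's signing secret.
  const expected = createHmac("sha256", secret).update(`${timestamp}.${rawBody}`, "utf8").digest();
  return signatures.some((sig) => {
    const candidate = Buffer.from(sig, "hex");
    // timingSafeEqual throws on length mismatch, so compare lengths first.
    return candidate.length === expected.length && timingSafeEqual(candidate, expected);
  });
}
```

In production I'd stick with the SDK. The point of the sketch is that a body re-serialized by JSON middleware will fail this comparison even when everything else is configured correctly.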
One thing I learned: update Stripe's webhook URL BEFORE taking down the old endpoint. There's a window where webhook events can be lost if the old URL goes down before the new one is receiving traffic. Stripe retries failed webhooks, but I didn't want to rely on that for payment events.
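Retries are also a reason to make payment handling idempotent: Stripe notes that an endpoint can occasionally receive the same event more than once, and a retried delivery carries the same event ID. A minimal sketch of guarding on that ID; `createIdempotentHandler` is hypothetical, and the in-memory Set stands in for what should be a SQLite table so the guard survives redeploys:

```typescript
// Only the fields this sketch relies on from a Stripe event object.
interface StripeEventLike {
  id: string;   // "evt_..."; identical across retried deliveries of one event
  type: string; // e.g. "checkout.session.completed"
}

/** Wraps an event handler so each event ID is processed at most once. */
function createIdempotentHandler(handle: (event: StripeEventLike) => void) {
  const processed = new Set<string>(); // stand-in for a persistent table
  return (event: StripeEventLike): boolean => {
    if (processed.has(event.id)) return false; // duplicate: acknowledge, skip the work
    handle(event);
    processed.add(event.id); // mark as done only after the handler succeeded
    return true;
  };
}
```

Return a 2xx for duplicates too, otherwise Stripe keeps retrying them. With async handlers you'd also want to record the ID in the same transaction as the work it guards.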
The Token Conflict Problem
This one bit me and it'll bite you too if you're not careful.
During the migration, I had both the Replit instance and the Railway instance running simultaneously. I wanted to test Railway while keeping Replit as a fallback. Makes sense in theory. In practice, it caused chaos.
Discord only allows one active gateway connection per bot token. When two processes try to connect with the same token, Discord doesn't gracefully hand off. It starts a fight. One process connects, the other gets disconnected. The disconnected one reconnects, kicking out the first. This creates a rapid reconnection loop where both instances are constantly fighting for the gateway, and neither one reliably receives events.
The symptoms were wild. Commands would work on one attempt and fail on the next. Some messages would be processed twice. The bot would appear online, then offline, then online again in rapid succession. Users thought it was having a seizure.
The fix is simple but absolute: never run two instances with the same bot token. When you're ready to migrate, you shut down the old instance first, then start the new one. There's no gradual rollover. It's a hard cutover.
I picked a low-traffic window (early morning EST, when most 2K players are asleep), stopped the Replit instance, waited 30 seconds for Discord to fully disconnect the old session, then started the Railway instance. Total downtime was under a minute.
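A clean shutdown on the old side helps with this kind of cutover. When node runs as PID 1 in a container (as it does with the exec-form `CMD ["node", "dist/index.js"]`), SIGTERM is ignored unless the process registers a handler, so the platform waits out its grace period and then kills the bot without closing the gateway connection. A sketch of a handler that runs a cleanup callback once (for discord.js, `client.destroy()`) and exits, with a timeout so a hung cleanup can't stall shutdown; `createShutdownHandler` is a hypothetical name:

```typescript
type Cleanup = () => void | Promise<void>;
type Exit = (code: number) => void;

/**
 * Returns a signal handler that runs `cleanup` at most once, then exits.
 * If cleanup takes longer than `timeoutMs`, exit(1) is called anyway.
 */
function createShutdownHandler(
  cleanup: Cleanup,
  exit: Exit = (code) => process.exit(code),
  timeoutMs = 10_000,
): (signal: string) => Promise<boolean> {
  let started = false;
  return async (signal: string): Promise<boolean> => {
    if (started) return false; // a second signal while shutting down is ignored
    started = true;
    console.log(`Received ${signal}, shutting down`);
    const timer = setTimeout(() => exit(1), timeoutMs);
    try {
      await cleanup();
      exit(0);
    } catch (err) {
      console.error("Cleanup failed during shutdown:", err);
      exit(1);
    } finally {
      clearTimeout(timer);
    }
    return true;
  };
}

// Wiring, assuming a discord.js `client`:
// const shutdown = createShutdownHandler(() => client.destroy());
// process.on("SIGTERM", (sig) => void shutdown(sig));
// process.on("SIGINT", (sig) => void shutdown(sig));
```

Even with a clean shutdown, the rule above still stands: the old instance stops before the new one starts.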
Environment Variables
The bot had about 15 environment variables. Discord token, Stripe keys, database path, webhook secrets, feature flags, admin user IDs. On Replit, these lived in the Secrets tab. On Railway, they go in the Variables section of the service settings.
I copied them over manually. Yes, manually. There's no automated migration tool, and honestly I wanted to review each one anyway. Some variables were Replit-specific (like the Replit URL for keep-alive pings) and didn't need to come over. Others needed their values updated (like DATABASE_PATH changing from ./data/bot.db to /data/bot.db).
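Because the copy was manual, a fail-fast check at startup is cheap insurance: a forgotten variable shows up in the first Railway deploy log instead of at the first payment. A sketch; the variable names other than DATABASE_PATH are hypothetical placeholders for the bot's real ones:

```typescript
// Hypothetical names except DATABASE_PATH; swap in the bot's real variables.
const REQUIRED_VARS = [
  "DISCORD_TOKEN",
  "STRIPE_SECRET_KEY",
  "STRIPE_WEBHOOK_SECRET",
  "DATABASE_PATH",
] as const;

type RequiredVar = (typeof REQUIRED_VARS)[number];
type Config = Record<RequiredVar, string>;

/** Reads every required variable, reporting all missing ones in a single error. */
function loadConfig(env: Record<string, string | undefined> = process.env): Config {
  const missing = REQUIRED_VARS.filter((name) => !env[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const config = {} as Config;
  for (const name of REQUIRED_VARS) {
    config[name] = (env[name] as string).trim();
  }
  return config;
}
```

Listing every missing name at once beats fixing one, redeploying, and discovering the next. Replit-only variables like the keep-alive URL simply don't appear in the list.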
Pro tip: export your Replit secrets before shutting down the instance. I've heard of people deleting their Replit project and then realizing they forgot to copy a secret key. That's a bad day.
The Real Cost Comparison
Let's talk money, because this is where Railway really wins.
Replit's always-on hosting was $25/month for the Hacker plan. And even at that price, the bot still had cold start issues unless I kept the process alive with external pings. So the $25/month was really $25/month plus the cognitive overhead of monitoring a flaky system.
Railway charges based on actual resource usage. My bot uses about 256MB of RAM and minimal CPU (it's mostly idle, waiting for Discord events). That works out to roughly $5-7/month. Some months it's been as low as $4. The volume mount adds a small amount for storage, but we're talking cents.
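Usage-based pricing boils down to usage multiplied by per-unit rates. A sketch of that arithmetic; the rates are parameters on purpose, since Railway's actual prices aren't reproduced here, and the numbers in any test of this are made up:

```typescript
/** Average resource usage over a month. */
interface Usage {
  ramGb: number;    // e.g. 0.25 for roughly 256MB
  vcpu: number;     // average vCPUs actually used; a mostly idle bot uses a small fraction
  volumeGb: number; // size of the SQLite volume
}

/** Per-month unit prices. Placeholders: check the provider's current pricing page. */
interface Rates {
  perGbRam: number;
  perVcpu: number;
  perGbVolume: number;
}

function estimateMonthlyCost(usage: Usage, rates: Rates): number {
  return usage.ramGb * rates.perGbRam + usage.vcpu * rates.perVcpu + usage.volumeGb * rates.perGbVolume;
}

/** Yearly difference between two monthly bills. */
function annualSavings(oldMonthly: number, newMonthly: number): number {
  return (oldMonthly - newMonthly) * 12;
}
```

With this post's own numbers, `annualSavings(25, 7)` is 216 and `annualSavings(25, 5)` is 240, so roughly $216 to $240 a year.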
So I went from $25/month with reliability problems to $5-7/month with zero downtime issues. The bot has been on Railway for weeks now with 100% uptime. No cold starts, no random disconnects, no 3am panic.
Deployment Workflow
On Replit, deployment was just "save the file." The process would restart and pick up changes. Fast, but dangerous. No review step, no build validation, no rollback.
On Railway, I connected the GitHub repo. Every push to the main branch triggers a build and deploy. Railway builds the Docker image, runs the health check, and if everything passes, swaps the new container in. If the build fails, the old container keeps running. If the health check fails, Railway rolls back automatically.
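The health check only needs an HTTP endpoint that returns 200 when the service is healthy. For a Discord bot, "healthy" should mean the gateway is connected, not just that the process is alive. A sketch using only `node:http`; `healthCheck`, `startHealthServer`, and the `/health` path are my own choices, and you'd point Railway's health check path setting at whatever path you pick:

```typescript
import { createServer } from "node:http";

interface HealthResult {
  status: number;
  body: { ok: boolean; gateway: "ready" | "not_ready"; uptimeSeconds: number };
}

/** Pure decision logic, kept separate so it can be tested without a socket. */
function healthCheck(gatewayReady: boolean, uptimeSeconds: number): HealthResult {
  return {
    // 503 until the Discord gateway is connected, so a bot that can't log in fails the check
    status: gatewayReady ? 200 : 503,
    body: {
      ok: gatewayReady,
      gateway: gatewayReady ? "ready" : "not_ready",
      uptimeSeconds: Math.floor(uptimeSeconds),
    },
  };
}

/** Serves GET /health; anything else gets a 404. */
function startHealthServer(port: number, isGatewayReady: () => boolean) {
  const server = createServer((req, res) => {
    if (req.method === "GET" && req.url === "/health") {
      const { status, body } = healthCheck(isGatewayReady(), process.uptime());
      res.writeHead(status, { "Content-Type": "application/json" });
      res.end(JSON.stringify(body));
      return;
    }
    res.writeHead(404);
    res.end();
  });
  server.listen(port);
  return server;
}
```

Wire it with something like `startHealthServer(Number(process.env.PORT) || 3000, () => client.isReady())`, assuming a discord.js `client`. Returning 503 until the bot has logged in means a deploy with a bad token should fail its health check rather than replace a working deployment.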
This is a massive upgrade. I can push code with confidence knowing that a bad deploy won't take down the bot. And if I need to roll back, Railway keeps previous deployments available to redeploy with one click.
What I'd Do Differently
If I were doing this migration again, here's what I'd change:
- Write the Dockerfile first. I spent time getting the bot running on Railway's auto-detected Node.js environment before realizing a Dockerfile gave me more control. Start with Docker from day one.
- Test the volume mount with real data. I tested with dummy data initially. It would have been smarter to copy the production SQLite file to the volume and verify everything looked right before cutting over.
- Set up monitoring before the migration. I added health check endpoints and uptime monitoring after the migration was done. Having them ready beforehand would have given me more confidence during the cutover.
- Document every environment variable. I had a rough list in my head. A proper inventory with descriptions and which ones were Replit-specific would have saved time.
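For the "test with real data" point, comparing checksums is a quick way to prove the copy on the volume matches the original. Stop the bot before copying so nothing writes mid-copy (closing the last connection also normally checkpoints a WAL-mode database back into the main file). A minimal sketch; `sha256`, `sha256File`, and `verifyCopy` are hypothetical helpers:

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

/** Hex-encoded SHA-256 of a buffer or string. */
function sha256(data: Buffer | string): string {
  return createHash("sha256").update(data).digest("hex");
}

/** Hex-encoded SHA-256 of a file's contents (reads the whole file into memory). */
function sha256File(path: string): string {
  return sha256(readFileSync(path));
}

/** True if the copy is byte-for-byte identical to the source database file. */
function verifyCopy(sourcePath: string, copyPath: string): boolean {
  return sha256File(sourcePath) === sha256File(copyPath);
}
```

Once the checksums match, running `PRAGMA integrity_check` on the copy (with better-sqlite3, `db.pragma("integrity_check")`) is a cheap extra sanity check. Reading the whole file into memory is fine for a bot-sized database; stream it for anything large.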
The Bottom Line
Replit is a great tool for prototyping and learning. I don't regret using it to build the initial bot. But for production workloads that need to be online 24/7, it's not the right platform anymore, especially with their strategic shift toward AI coding tools.
Railway gave me cheaper hosting, better reliability, proper deployment workflows, and persistent storage. The migration took about half a day including testing and DNS propagation. If you're running a bot on Replit and dealing with the same headaches, make the switch. It's worth it.
The bot powers the 2K Service Plug marketplace, which handles real money transactions through Discord. If you need a production Discord bot built and deployed properly, get a free estimate.