S3 and conversion server

DannyA
@dannya
10 years ago
584 posts
Can you provide any update on the expected release of the conversion server discussed here:
https://www.jamroom.net/the-jamroom-network/forum/my_posts/14788/conversion-server-details

Also, you mentioned you had an S3 module partially developed for images that I might be able to use for audio in this thread:
https://www.jamroom.net/the-jamroom-network/forum/my_posts/14728/feature-updates-s3-ha

Can you share that?
updated by @dannya: 02/09/15 08:19:17PM
brian
@brian
10 years ago
10,148 posts
DannyA:
Can you provide any update on the expected release of the conversion server discussed here:
https://www.jamroom.net/the-jamroom-network/forum/my_posts/14788/conversion-server-details

Also, you mentioned you had an S3 module partially developed for images that I might be able to use for audio in this thread:
https://www.jamroom.net/the-jamroom-network/forum/my_posts/14728/feature-updates-s3-ha

Can you share that?

I'm not in the office this week so really cannot get it into a state where it can be shared. I'm also hesitant to "release" something that I can't support, but I'll see what I can do when I get back.

I have been working on a set of "Cloud" modules for Jamroom that I think you'll find useful, but they may not work exactly with how you want to set things up. I think ultimately we are going to promote using your own cluster filesystem over S3 - i.e.

http://www.xtreemfs.org/index.php

As that just has a lot of advantages (speed, lack of vendor lock-in, privacy) from a file system perspective over S3 (which is really an object store, not a file system). If you must use S3, you might consider something like:

https://code.google.com/p/s3fuse/

as that will allow your front ends to use a common bucket for file storage.

The file system is by far the hardest part of distributing anything, and finding a way to abstract that away from the developer so it doesn't matter is even harder - I have every other part of the system working, so this is really the last piece. We have a Jamroom Cloud profile that is private at this time, but I'll open it up to beta as soon as I can.

Hope this helps!


--
Brian Johnson
Founder and Lead Developer - Jamroom
https://www.jamroom.net
DannyA
@dannya
10 years ago
584 posts
I'm more confused than ever.
#1 - "I'm not in the office this week so really cannot get it into a state where it can be shared. I'm also hesitant to 'release' something that I can't support, but I'll see what I can do when I get back."
I assume you are referring to the conversion server.

#2 - My immediate concerns about using S3 are unrelated to HA, scalability, or file system selection.

I am already using EC2 for the core application, and my data is being stored in EBS. This forces me to pay for more expensive EBS and larger instances than I need. It also does not make sense to use another vendor for storage, as it would cost more; I would have to pay extra for transfer, and there would probably be a performance hit compared to staying entirely in the Amazon cloud.

I guess maybe I don't understand where s3fuse fits in rather than using the S3 APIs directly.
brian
@brian
10 years ago
10,148 posts
DannyA:
I assume you are referring to the conversion server.

I mean the unfinished work I started on an S3 module. The conversion server is already running; it does not, however, store anything to S3.

Quote:
#2 - My immediate concerns about using S3 are unrelated to HA, scalability, or file system selection.

I am already using EC2 for the core application, and my data is being stored in EBS. This forces me to pay for more expensive EBS and larger instances than I need. It also does not make sense to use another vendor for storage, as it would cost more; I would have to pay extra for transfer, and there would probably be a performance hit compared to staying entirely in the Amazon cloud.

I guess maybe I don't understand where s3fuse fits in rather than using the S3 APIs directly.

FUSE allows you to "mount" S3 to your EC2 instances as if it were another hard drive - i.e. if you have 10 EC2 instances, each one could mount a "media" partition that is really a "media" S3 bucket. This allows PHP to read and write as if it were a regular file system.

The problem with the S3 API is just that - it's an API and not a file system. It works very differently than a regular file system and has many different shortcomings and caveats that you have to think about when using it. I don't want a module developer to have to worry if their module is on a Jamroom site running on S3 or a local file system - it should work the same. Unfortunately S3 and file systems do not work the same, so that's very hard to do.
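
To make that concrete, here's a rough sketch of the difference from PHP's point of view (the mount point, bucket, and file names are made up, and the API calls assume the AWS SDK for PHP is installed via Composer):

<?php
require 'vendor/autoload.php'; // assumes the AWS SDK for PHP

use Aws\S3\S3Client;

// With a FUSE mount, the S3 bucket looks like any other directory,
// so existing filesystem code works unchanged:
$audio = file_get_contents('/tmp/song.mp3');
file_put_contents('/mnt/media/profile_1/song.mp3', $audio);

// Against the S3 API directly, every operation is an HTTP request
// on a whole object instead of a file handle:
$s3 = S3Client::factory(array('region' => 'us-east-1'));
$s3->putObject(array(
    'Bucket'     => 'media',               // hypothetical bucket name
    'Key'        => 'profile_1/song.mp3',
    'SourceFile' => '/tmp/song.mp3',
));
// There is no fseek() or append here - a partial update means
// downloading the object, changing it, and uploading it all again.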


--
Brian Johnson
Founder and Lead Developer - Jamroom
https://www.jamroom.net

updated by @brian: 12/02/14 10:12:25AM
brian
@brian
10 years ago
10,148 posts
Just to add a follow-up (we may have already talked about this in another post): have you considered saving yourself a ton of money and moving off of AWS?


--
Brian Johnson
Founder and Lead Developer - Jamroom
https://www.jamroom.net
DannyA
@dannya
10 years ago
584 posts
Open to suggestions, but I haven't found anything with the same breadth of services and scale. Storage, CDN, transcoding, RDS - all on one service. Compute might be expensive, but storage and delivery are very competitive.

If you release the conversion server, it might be cheaper to do the transcoding somewhere else. And maybe with your file system implementation I could find cheaper storage. But add delivery and CDN and things get more expensive; not to mention the performance impact of making requests to multiple service providers.

Like I said, I'm open to suggestions, but the numbers need to make sense.
updated by @dannya: 12/02/14 10:31:38AM
brian
@brian
10 years ago
10,148 posts
I think a big part comes down to what type of traffic you anticipate. I know it's fun to think you'll have 10 million visitors from day one, but even the "hottest" SV startups rarely get that type of growth in their first year.

A realistic scenario (based on my experience with several SV startups and 11+ years of Jamroom) is that you'll get a few dozen to a few hundred visitors per day for the first few months to a year. Get some exposure on something like Hacker News, Reddit, etc. and you'll have a "spike" to maybe 5,000 for a day (it tails off really fast).

So if I was starting a site, based on what I know of your project, here is what type of system I would setup for launch (all on Linode):

- (1) 2 Gig VPS app server (Ubuntu 14.04, Apache 2.4, PHP 5.5) running Jamroom + Cloud modules
- (1) 2 Gig MySQL Server - runs MySQL only (no Jamroom)
- (1) 2 Gig Conversion Server - runs Jamroom with Cloud modules + Cloud skin

Enable backups on the MySQL server - the other ones can be rebuilt from an image if needed. All connected via private networking.

Your scaling plan:

- Upgrade your app and conversion servers to accommodate more traffic (i.e. 4Gig, 8Gig, etc.).

- Keep an eye on MySQL load on your MySQL server and upgrade to larger VPS as needed.

- Add 1 or more MySQL cache and Session servers (part of the cloud modules) to offload cache and sessions from your main MySQL instance (which is a big hitter as your system gets busier)

- Add a Log Server for centralized logging (part of the cloud modules).

I know this plan does not scale the app server horizontally (which can be done, but then we need to use XtreemFS - doable, but I'm trying to keep it simple).

You can get started for $60 per month (three 2 Gig Linodes at $20 each) and easily handle tens of thousands of visitors a day. You'd have 9 terabytes of pooled transfer included before you paid anything extra for bandwidth.

It's also a MUCH simpler system to administer and manage than AWS, and can be launched right now to prove your MVP.

So anyways, that's what I would do if I were doing it ;)

Hope this helps!

p.s. when we take the Jamroom Cloud profile public we'll have more detailed info on recommended cluster setups, including scaling with XtreemFS.


--
Brian Johnson
Founder and Lead Developer - Jamroom
https://www.jamroom.net

updated by @brian: 12/02/14 11:07:06AM
DannyA
@dannya
10 years ago
584 posts
A couple of things:
- A 2GB reserved instance on AWS is only about $13.00/month.
- Even if I deployed 1 app server and 1 RDS server (which is what I currently do), I'd still be paying less per month. This is enough for an MVP.
- Granted, a VPS does give you a bunch of free bandwidth initially, but that is just bait to lure you in. After that the pricing does not scale, and you will quickly be eaten up by bandwidth and storage costs.
- Beyond the 9 TB of transfer, you are stuck at $100/TB, while AWS can go as low as $10/TB once you are at massive scale - making the VPS 10x more expensive at scale. Bandwidth is by far the biggest cost and has the biggest impact on margin.
- Storage is even worse. The VPS gives you 40 GB to start, but from there they charge a whopping $0.40/GB. AWS starts at $0.03 and goes down - that's 13x more expensive. As soon as you hit those 40 GB, you're screwed.


I understand where you are coming from with a simple entry level hosting provider. It keeps things simple and works well for many of your customers.

However, I am building FOR scale. I am not trying to deploy an app but to build a PLATFORM; I am providing a service, not a store. My customers are high-volume users. Once the APIs are deployed, I need to be able to scale. Migrating to a different provider when you NEED to scale is tremendously difficult and expensive. I would rather pay a little bit more now than have to deal with that later.

That being said, I DO agree with your scaling plan. However, as you so clearly laid out, it is dependent on your cloud modules and, eventually, the FS. I am ready to deploy separate servers for conversion, logging, and storage. S3 makes more sense for me because I'm on AWS, but I understand why you would want to allow your users other options.

Storage and conversion are my top priorities, because those 2 things force me to scale up my application server more quickly. If I can separate those 2 things, a small app server can handle many more users. With S3 I would never have to scale the server for storage. And with the conversion server, if I can set it up to autoscale and spin up additional servers in a cluster as demand dictates, I will only pay for what I use.
updated by @dannya: 12/02/14 02:14:52PM
DannyA
@dannya
10 years ago
584 posts
Also, as far as storage goes, it is far easier to support the APIs of multiple storage providers. Even if you just support AWS and Google storage, they can be used by anybody on any server at any host.

I've worked for 4 of the top ten media platforms and CDNs. Right now the competition between AWS, Google, Rackspace, and Akamai is making it very hard for anyone else to compete on hosting, especially storage and bandwidth. And the compute competition between Google and Amazon is heating up. Nobody else can reach their economies of scale.
brian
@brian
10 years ago
10,148 posts
That's cool - if you're going with AWS, that's no big deal. I don't share your perspective that VPS providers are more expensive, though - you get what you pay for.

The $20 a month 2-CPU Linode will run circles around the 2 Gig EC2 instance (which has a single virtual CPU equivalent to a circa-2007 1.7 GHz AMD Athlon). I've personally built and scaled 2 large systems on AWS and won't do it again - you end up having to run 50 front ends on their small/medium instance sizes when I could run the same on 5 Linodes. And don't get me started on the inconsistencies of EBS (it can't compare with Linode's SSDs).

I'm also not a fan of their reliability (look at their recent half-day CloudFront outage), but I know some people feel reliability is a strength of AWS, so that's just my opinion.

If you're doing big bulk transfers however, the bandwidth is definitely better.

Regardless, I know many of our customers are going to choose AWS so we need to make sure Jamroom works awesome on AWS - we just may not be able to support everything AWS offers as a module in JR.

When we take the wraps off the cloud setup you can check it out and let us know what you think - that would be great.

Thanks!


--
Brian Johnson
Founder and Lead Developer - Jamroom
https://www.jamroom.net
brian
@brian
10 years ago
10,148 posts
Just to add as well (I hadn't looked at it in a long time): EC2 bandwidth pricing versus Linode. You have to move over 10TB per month before EC2's per-GB pricing drops below Linode's $0.10 per GB - i.e. 20TB per month on AWS is going to run $2,100 (10TB @ $0.12 plus 10TB @ $0.09) versus $1,100 on Linode (11TB of overage @ $0.10 after the 9TB included):

AWS:

1-10 TB @ $0.12 / GB
11-20 TB @ $0.09 / GB

Linode:

Overage beyond the included pool @ $0.10 / GB

So it looks like bandwidth is even cheaper on Linode at that scale (unless I'm overlooking something).

Not a big deal, but more of an FYI, since I had assumed AWS was cheaper in that regard as well.

Thanks!


--
Brian Johnson
Founder and Lead Developer - Jamroom
https://www.jamroom.net

updated by @brian: 12/02/14 04:06:11PM
brian
@brian
10 years ago
10,148 posts
Sorry to keep adding more, but Linode is even cheaper - since all bandwidth is "pooled" (shared among all instances), you can get another 2TB of transfer for $10 by simply adding a 1 Gig Linode, which works out to $0.005 / GB.


--
Brian Johnson
Founder and Lead Developer - Jamroom
https://www.jamroom.net
DannyA
@dannya
10 years ago
584 posts
I agree, the .small instance size on AWS is probably not a good apples-to-apples comparison. I was just trying to find something to compare based on your 2GB base requirement. However, like I said, in the long term I think my storage and bandwidth costs have a much greater impact than compute. Under real loads, I think you can probably optimize things quite a bit. Amazon offers many more configurations as well.

I'll take a closer look at Linode. But even if I do switch over, I'd still definitely use a different provider for storage and bandwidth at scale - most likely S3 and CloudFront. A VPS would be way too expensive.

I'd be curious about the specs for configuring the conversion server. I'm sure it can be optimized for conversion (more cores/memory). I'd love to start getting it set up.

We are also currently making some modifications to our media modules that use the conversion tools, so I want to make sure my changes will work with the conversion server. That's why I was asking for it again. Hopefully we can see a beta next week.
brian
@brian
10 years ago
10,148 posts
DannyA:
I'd be curious about the specs for configuring the conversion server. I'm sure it can be optimized for conversion (more cores/memory). I'd love to start getting it set up.

Yes - you can change the number of conversion workers on a conversion server, and you can also run as many different conversion servers as you want. More info on setting it up will be online once we get it to beta.

Thanks!


--
Brian Johnson
Founder and Lead Developer - Jamroom
https://www.jamroom.net
DannyA
@dannya
10 years ago
584 posts
I was actually referring to optimizing the server specs and the OS.
updated by @dannya: 12/03/14 02:35:17PM
brian
@brian
10 years ago
10,148 posts
DannyA:
I was actually referring to optimizing the server specs and the OS.

Don't worry about the OS - just use the same image you are using for your other servers - there's not going to be any substantial difference.

The big point is CPUs - you want as many CPUs as you can get. Each one can really only handle 1 conversion at a time (you can run more, but they will slow down such that it's no faster than running 1 per core).

Having fast disk (i.e. SSD) is also really helpful, since data is being read and written very quickly - and if you're on a multi-core system (say 4-8 CPUs), then your disk needs to be able to keep up with 8-12 simultaneous converters reading and writing.

So get LOTS of CPUs and LOTS of fast disk if you want to be able to handle a lot at once.

Hope this helps!

p.s. check out mnx.io:

http://mnx.io/pricing/

I know they are VPS but you can get an 8 CPU / 8 GIG RAM / 20 GIG SSD for like $47 a month.



--
Brian Johnson
Founder and Lead Developer - Jamroom
https://www.jamroom.net
DannyA
@dannya
10 years ago
584 posts
Wow, MNX is pretty cheap. This is the beauty of using a real SOA - it really shouldn't matter who I use to host my conversion servers (although there is data transfer overhead and bandwidth cost involved in using different providers).

I just want to be able to fire up an instance, provide credentials, and have the server listen for jobs (not just ffmpeg, but SoX, ImageMagick, BPM detection tools, key detection tools, or any other media processing that is CPU intensive).

I think there is also something to be said for auto-scaling. Being able to monitor the capacity of the existing transcoder cluster and spin up additional servers on demand can really help if you have very spiky encoding demands. No sense leaving servers running unnecessarily.

Is it safe to say that one worker queue job is allocated per core? Does that apply to any queue task?
brian
@brian
10 years ago
10,148 posts
DannyA:
I think there is also something to be said for auto-scaling. Being able to monitor the capacity of the existing transcoder cluster and spin up additional servers on demand can really help if you have very spiky encoding demands. No sense leaving servers running unnecessarily.

Because Jamroom does everything via queues, this is really not as critical as it may seem - all that happens is your queue latency gets a bit higher when you're in a "spike". Since it's not customer facing (i.e. your users aren't waiting on it to get their page rendered), I wouldn't worry about it.

Quote:
Is it safe to say that one worker queue job is allocated per core? Does that apply to any queue task?

That's controlled by you. What we've done for our transcoding service is set it up so if we have 8 cores (for example), we have 7 video workers and 4 audio workers. The audio workers are much "lighter" on the CPU and can squeeze into that last core and the "free" time on the video workers' CPUs.
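
As a rough sketch of that sizing logic (illustrative only - the actual worker counts are just settings on each queue, not computed by code like this):

<?php
// Illustrative sizing for an 8-core box - in practice the worker
// counts are configured per queue in the module settings.
$cores         = 8;
$video_workers = $cores - 1;              // heavy: one video converter per core
$audio_workers = (int) round($cores / 2); // light: audio can share CPU time
echo "{$video_workers} video / {$audio_workers} audio workers\n"; // 7 video / 4 audio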

Hope this helps!


--
Brian Johnson
Founder and Lead Developer - Jamroom
https://www.jamroom.net
DannyA
@dannya
10 years ago
584 posts
Any update on the beta for conversion server and related services modules?
brian
@brian
10 years ago
10,148 posts
DannyA:
Any update on the beta for conversion server and related services modules?

They will be in beta in the early new year - we have some back-end marketplace updates that have to roll out before they do, but it is coming in the next couple of weeks.

Hope this helps!


--
Brian Johnson
Founder and Lead Developer - Jamroom
https://www.jamroom.net
DannyA
@dannya
9 years ago
584 posts
Hey Brian,

Just wanted to follow up on an earlier part of this thread regarding storage. I know you're still working on the cloud documentation, but I didn't see anything in there related to storage.

I'd like to be able to use S3 at this point. I was thinking to just create a module that writes files from the /data directory to an S3 bucket. My questions are below.

I know you have plans to give Jamroom users more options for storage via XtreemFS. I've also been reading up on both s3fuse and s3fs. Note that s3fuse is considered alpha software, neither has had any updates in a while, and both have a couple of bugs. And apparently you still have the same limitations as if you used the API directly. I'm not sure either is a good idea to use in production.

1. Can you recommend an easy way to hook the reads/writes of the /data folder into the S3 API?

2. Will using the S3 API cause problems for me in scaling other parts of the cloud software?

3. If s3fuse is easier/better/faster, can you make suggestions on how to best integrate it into JR so as to save my data there? I'm not clear on where the right hooks are, whether via the API or FUSE.

In either scenario, we need to consider that we don't want to write everything to S3 - only the /data folder. Files that receive incremental changes are not suited for S3, because you have to read and write the whole file. This is the case even with a FUSE mount, since it is just a front end to S3.
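
For example, with the AWS SDK for PHP, even a small "append" to a file stored in S3 turns into a full round trip (the bucket and key names here are made up):

<?php
require 'vendor/autoload.php'; // assumes the AWS SDK for PHP

use Aws\S3\S3Client;

$s3 = S3Client::factory(array('region' => 'us-east-1'));

// "Appending" to an S3 object means pulling down the whole object...
$object   = $s3->getObject(array('Bucket' => 'media', 'Key' => 'counts/plays.log'));
$contents = (string) $object['Body'];

// ...modifying it locally...
$contents .= "song_42 played\n";

// ...and pushing the whole thing back up again.
$s3->putObject(array('Bucket' => 'media', 'Key' => 'counts/plays.log', 'Body' => $contents));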


I'm getting ready to go into public beta and would like to settle on storage now so I don't have to move TBs of data later while in production. I think the S3 API might be a bit more work to implement, but it might not be too bad if it's just for reading/writing media files.

Any advice is appreciated.
updated by @dannya: 01/06/15 11:50:32PM
brian
@brian
9 years ago
10,148 posts
Yeah, I definitely would not recommend s3fuse any longer - I think it might be the "simplest" in that it integrates right at the file system, but XtreemFS is a much more complete (and maintained) solution.

Creating an S3 "layer" for Jamroom's media system is a pretty complicated task - much more than it may seem on the surface. The reason is that in Jamroom most of the media handling is done via queue entries as "chunks" - i.e. some bit of work is done on the file, then it is put back, then something else might pick up the file and work on it, then put it back, etc. This all works great and isolates the processes really well, but is a problem when the file is not local.

The way Jamroom's media API layer works is not event/listener based like you've used in other modules - it uses Jamroom's plugin functions to do the work. This means you make a specific file system (say S3) the "active" media file system (there is no control for this at this time, since right now Jamroom only supports the local file system), and then any media function call - say jrCore_get_media_directory() - will in turn call the active plugin function - i.e. _jrCore_local_media_get_directory(), or jrS3_s3_media_get_directory(), etc.
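
In rough sketch form, the dispatch looks something like this (heavily simplified - the S3 side, the lookup function, and the exact naming conventions are hypothetical, since only the local plugin exists today):

<?php
// Heavily simplified sketch of the plugin dispatch described above.
function jrCore_get_media_directory($profile_id)
{
    // whichever file system plugin is "active" receives the call
    // (jrCore_get_active_media_plugin() is a hypothetical lookup)
    $plugin = jrCore_get_active_media_plugin(); // e.g. "_jrCore_local" or "jrS3_s3"
    $func   = "{$plugin}_media_get_directory";
    return $func($profile_id);
}

function _jrCore_local_media_get_directory($profile_id)
{
    // local plugin: a real directory on disk (path illustrative)
    return "/var/www/jamroom/data/media/{$profile_id}";
}

function jrS3_s3_media_get_directory($profile_id)
{
    // hypothetical S3 plugin: an object key prefix rather than a path
    return "profile_{$profile_id}/";
}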

I do plan on making this work - eventually, as I see cloud storage as an important part of the Jamroom Cloud. There's just not been much demand yet for S3 support, so it's not been a high priority.


--
Brian Johnson
Founder and Lead Developer - Jamroom
https://www.jamroom.net
DannyA
@dannya
9 years ago
584 posts
This is a huge problem for me that I've been asking about for a long time. Having to use EBS (or storage at a regular hosting provider) basically triples my storage costs and breaks my business model. As I mentioned before, as a cloud application, storage and bandwidth are my biggest costs.

I don't know how many other customers you have for your cloud services, so I don't know how much demand you are expecting. But I have a feeling I'm one of your first, and I have been asking about this stuff for a long time. Almost all major media cloud applications and media platforms rely on one of the top 3 cloud storage platforms. Local storage does not scale economically for cloud applications.

After over a year in development, I feel like I'm pretty screwed right now. I don't think I can launch on EBS and migrate thousands of files and hundreds of TBs of content once I'm in production.

I am happy to have my developers do the work, but I need to provide some guidance. From what it sounds like, we have to go in and find every instance where media is read/written and use the API to reference an S3 object instead. For just the media files, this CAN'T be that extensive:
- Ingest process
- Conversion/info extraction
- Player requests
- Download requests
- Image uploads and requests

I feel this could be done in a couple of weeks if we knew what to look for.
brian
@brian
9 years ago
10,148 posts
To be honest with you, I'm not sure how Jamroom is going to perform with 100's of terabytes of data - that's many, many, many times more than 99.9% of Jamroom users are ever going to store. The reason I say that is I want you to be aware that we do NOT develop or test Jamroom to run at Twitter-like scale, as that is way beyond the needs of our customers. There's nothing in Jamroom to prevent it from working that large, but I need you to know that our time and energy is not going to be invested in optimizations for that scale (where you really are going to need a lot of custom solutions and a dedicated DevOps staff).

We definitely do plan on having cloud storage support, but it's going to take more time before it is ready mainly because:

- it's not our highest priority - I personally like working on it and try to make time to do it, but we have a lot more work that impacts a much larger amount of customers that will be prioritized first.

- I don't want to develop a layer that is JUST for S3. As soon as we do we're going to have someone asking for CloudFiles support, or GAE, or Heroku, or whatever is popular a year or two from now. So it has to go in as a full "replacement" for the file system, which is more extensive than just dealing with what you're looking for.

So for now I would recommend having your developers become familiar with the functions in jrCore/lib/media.php - that's where the bulk of the media functions are and would need to be updated to work directly with S3.

You will also want to check out jrAudio/include.php for functions related to audio handling.

Hope this helps!


--
Brian Johnson
Founder and Lead Developer - Jamroom
https://www.jamroom.net
DannyA
@dannya
9 years ago
584 posts
I realize that. Many platforms start off as a monolithic application; the idea is to then break out the modules that need to scale into microservices with their own APIs and let them scale independently. The conversion server is a good example of that (in principle - it's yet to be seen if it scales the way I want). SoundCloud did a great engineering post on their migration: https://developers.soundcloud.com/blog/building-products-at-soundcloud-part-1-dealing-with-the-monolith

This is also why I'm interested in the Proxima stuff. I've worked for some companies with massively scaled apps, and we always built the API FIRST and built the application on top of that. For everything I wanted to do, I thought I would be able to move much faster with JR. The framework for the quotas, profiles, and user management was exactly what I was looking for. Delivery can be scaled pretty easily with caching and CDNs, but media processing and storage cannot. That's why I've been pushing hard for those to be released. The only other thing is storage. I understand your need to disintermediate the service provider, but that's obviously much more difficult than what I am trying to do.

We've already rewritten a number of modules or created our own to suit our needs: all the commerce functionality, players, sharing, extensive changes to jrAudio, etc.

I have no doubt the developers will be able to make the change to S3. I'm a little concerned about compatibility with future releases and modules, because it's functionality that almost everything else relies on. If the core media storage functions are just in those 2 files, it might not be too bad though. Otherwise, I may need to just start a new branch and rely only on our own development going forward.
updated by @dannya: 01/08/15 02:19:54PM
brian
@brian
9 years ago
10,148 posts
90% of what you're going to need is in the jrCore/lib/media.php file, so that _should_ make it easier. And there is already an override mechanism in there to bring in your own functions - i.e. check out all the _jrCore_local_* functions, which are the local filesystem functions - you'll want to model your work on those.
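
For example, an S3-backed counterpart to one of those functions might look roughly like this - the function name, signature, and bucket here are all assumptions to model from, not the actual core API:

<?php
require 'vendor/autoload.php'; // assumes the AWS SDK for PHP

use Aws\S3\S3Client;

// Hypothetical S3 twin of a _jrCore_local_* media function - model the
// real names and signatures on what you find in jrCore/lib/media.php.
function _jrS3_media_file_save($profile_id, $file_name, $local_tmp_file)
{
    $s3 = S3Client::factory(array('region' => 'us-east-1'));
    $s3->putObject(array(
        'Bucket'     => 'my-jamroom-media',                   // hypothetical bucket
        'Key'        => "profile_{$profile_id}/{$file_name}",
        'SourceFile' => $local_tmp_file,
    ));
    return true;
}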

I haven't had much time this past week to work on the Cloud modules - the new site rollout and other backlogged stuff has kept me busy, but I'll see if I can get at least a framework in place for S3 and get that online as a beta.

Hope this helps!


--
Brian Johnson
Founder and Lead Developer - Jamroom
https://www.jamroom.net
DannyA
@dannya
9 years ago
584 posts
Yes, that helps a bit. At least the developers have a place to start. I will point them to this thread if there are additional questions. If there is any way to do this as a module that can be used by others, I'd be happy to post the resulting code.

I appreciate you trying to get a framework in place sooner, but unless you can have something next week, I can't wait on this any longer. I have a feeling you already have a full plate with the pending Cloud beta and documentation, the Proxima generic API module and documentation, the new Jamroom site, and supporting the current release.

If anything, I may ask if you can take a look at the final code to make sure we haven't done anything that might cause problems for future releases or other modules.

Thanks again. I'll keep you posted.
updated by @dannya: 01/08/15 05:29:14PM
brian
@brian
9 years ago
10,148 posts
Sure thing.. thanks!


--
Brian Johnson
Founder and Lead Developer - Jamroom
https://www.jamroom.net
