[00:15] *** robink has joined #archiveteam-bs
[00:16] *** BlueMaxim has joined #archiveteam-bs
[00:47] *** primus104 has quit IRC (Leaving.)
[00:54] *** zenguy_pc has quit IRC (Read error: Operation timed out)
[00:58] *** zenguy_pc has joined #archiveteam-bs
[01:11] *** zenguy_pc has quit IRC (Read error: Connection reset by peer)
[01:13] *** zenguy_pc has joined #archiveteam-bs
[01:14] *** dxrt- is now known as dxrt
[03:18] *** Yiffiel_d has joined #archiveteam-bs
[03:18] *** vitzli has joined #archiveteam-bs
[03:21] Hello all. I was here a couple months ago about a comic site that was going down. Just wanted to pop in with a quick q about whether anyone knows good online webhosts that ask few questions and require fewer answers.
[03:21] octabooru, an mlp archive&streaming site, got shut down without warning by Leaseweb, and they won't say why.
[03:21] figure people around here would know about that sort of thing since they deal with sites being shuttered, by owner or by host, with some amount of frequency.
[03:21] just figured I'd copypasta
[03:22] (primarily, no questions due to that whole "We dindu nuffin!" thing. They really don't have a clue why they were shut down)
[03:22] *** Sum has joined #archiveteam-bs
[03:36] fyi database weighs 7gb so far and (was) growing, so it's gotta be one of those unlimited space dealies
[04:09] i woooondeerrrrr why they got shut down
[04:12] is there an audience for a paid community site?
[04:12] Something Awful does it, but can't think of too many others
[04:13] that way you know the site will have enough to continue, rather than holding your breath
[04:20] *** aaaaaaaaa has quit IRC (Quit: Leaving)
[04:39] Yiffiel_d: Honestly, if you don't want to answer any questions or you're trying to be anonymous you're going to end up with sketchy hosting. Also, anyone who promises you unlimited anything is lying. Either there are limits in the fine print, or they will cut you off at some point anyway.
[04:39] archive.org can be used for free unlimited file hosting but you'd still need something for the database stuff
[04:40] If you're confident in your technical skills, I'
[04:41] *I'd look into getting a VPS to host it on. A decent provider shouldn't care what you are doing unless they get DMCA or spam complaints.
[05:12] *** Fusl has quit IRC (Read error: Operation timed out)
[05:19] ah. Anyway, nah not aware of DMCA/spam. After the last weekly stream finished, the whole site had vanished.
[05:20] also, could probably do the paid community site when it's finished, but the code is still a work in progress.
[05:20] it has a lot of unique frills that aren't quite ready, yet
[05:22] *** Fusl has joined #archiveteam-bs
[05:32] *** Fusl has quit IRC (Ping timeout: 186 seconds)
[05:37] TIL FSF staff have a pretty great afterparty game
[05:41] *** tsp_ has joined #archiveteam-bs
[06:17] *** Fusl has joined #archiveteam-bs
[06:54] *** PurpleSym has joined #archiveteam-bs
[07:15] *** primus104 has joined #archiveteam-bs
[08:11] *** primus104 has quit IRC (Leaving.)
[08:30] *** vitzli has quit IRC (Quit: Leaving)
[08:31] *** zerkalo has quit IRC (Read error: Operation timed out)
[08:31] *** chazchaz has quit IRC (Read error: Operation timed out)
[08:31] *** zenguy_pc has quit IRC (Read error: Operation timed out)
[08:32] *** PurpleSym has quit IRC (Read error: Operation timed out)
[08:32] *** dcmorton has quit IRC (Read error: Operation timed out)
[08:32] *** arkiver has quit IRC (Read error: Operation timed out)
[08:32] *** PurpleSym has joined #archiveteam-bs
[08:32] *** swebb has quit IRC (Read error: Operation timed out)
[08:33] *** ohhdemgir has quit IRC (Read error: Operation timed out)
[08:33] *** zenguy_pc has joined #archiveteam-bs
[08:34] *** arkiver has joined #archiveteam-bs
[08:34] *** joepie91 has quit IRC (Read error: Operation timed out)
[08:34] *** dxrt has quit IRC (Ping timeout: 369 seconds)
[08:34] *** Laverne has quit IRC (Ping timeout: 369 seconds)
[08:35] *** ohhdemgir has joined #archiveteam-bs
[08:36] *** joepie91 has joined #archiveteam-bs
[08:37] *** zerkalo has joined #archiveteam-bs
[08:38] *** dxrt has joined #archiveteam-bs
[08:38] *** dcmorton has joined #archiveteam-bs
[08:41] *** zerkalo has quit IRC (Ping timeout: 255 seconds)
[08:42] *** zerkalo has joined #archiveteam-bs
[08:52] *** Sum has quit IRC (Read error: Connection reset by peer)
[08:59] *** chazchaz has joined #archiveteam-bs
[08:59] *** Laverne has joined #archiveteam-bs
[09:00] *** swebb has joined #archiveteam-bs
[09:15] *** Fusl has quit IRC (Remote host closed the connection)
[09:29] *** Laverne has quit IRC (Ping timeout: 369 seconds)
[09:32] *** swebb has quit IRC (Excess Flood)
[09:33] *** chazchaz has quit IRC (Ping timeout: 369 seconds)
[09:34] You were there?
[09:37] *** chazchaz has joined #archiveteam-bs
[09:43] *** SimpBrain has quit IRC (Quit: Leaving)
[09:44] Apparently Yahoo Groups bans single IPv6 addresses instead of subnets. Interesting.
[09:45] *** chazchaz has quit IRC (Ping timeout: 369 seconds)
[09:45] *** Fusl has joined #archiveteam-bs
[09:48] *** Laverne has joined #archiveteam-bs
[09:48] *** swebb has joined #archiveteam-bs
[09:52] *** chazchaz has joined #archiveteam-bs
[10:39] *** primus104 has joined #archiveteam-bs
[10:40] *** schbirid has joined #archiveteam-bs
[10:59] *** jspiros has quit IRC (Ping timeout: 186 seconds)
[10:59] *** jspiros has joined #archiveteam-bs
[12:06] *** zhongfu has quit IRC (Ping timeout: 268 seconds)
[12:07] *** zhongfu has joined #archiveteam-bs
[12:27] *** primus104 has quit IRC (Leaving.)
[12:38] *** BlueMaxim has quit IRC (Quit: Leaving)
[14:43] *** Melissa_ has joined #archiveteam-bs
[15:19] *** Melissa_ has quit IRC (Quit: Melissa_)
[15:21] Here?
[15:21] yes
[15:21] so in More Options
[15:21] key = x-archive-size-hint
[15:21] value = number of bytes
[15:21] and then I think you're ready to go
[15:22] However, I never used the web-uploader with x-archive-size-hint, so I'm not sure if it'll work, but we'll see I guess
[15:22] Okay thanks. So do I just upload and then tell you the link so it can be added to the collection, or do I have to do anything specific before uploading? I wouldn't like to upload it and then discover I did something wrong and have to start again.
[15:23] I think you can tell us here
[15:23] I have not used the site before so it is all new to me
[15:23] Okay, thanks
[15:23] I'm not sure though if 500GB will upload ok through the webuploader
[15:23] Please let me know if it works and remember to set the x-archive-size-hint
[15:24] Okay. It might be easier to split it up then. Is a collection allowed hundreds of smaller files? Sorry for all the dumb questions.
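For the x-archive-size-hint value mentioned above, a minimal sketch of how the number of bytes could be worked out. It assumes the hint should be the combined size of the files going into one upload, which the log does not spell out, and the helper name `total_size_bytes` is made up for illustration.

```python
import os


def total_size_bytes(paths):
    """Return the combined size, in bytes, of the given files.

    Assumption: this total is what you would type in as the
    x-archive-size-hint value for a single upload.
    """
    return sum(os.path.getsize(path) for path in paths)
```

Calling it with the list of files you are about to upload gives a plain integer that can be pasted into the value field.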
[15:28] lytv: please do keep in mind that only WARC-formatted crawls can be added to the Wayback Machine
[15:29] you can of course upload crawls in other formats, but they won't make it into the Wayback
[15:29] (unless it's ARC, but I doubt anybody uses that still)
[15:29] Okay. Are non-warc crawls useful at all or should I not bother? I don't want to waste archive.org HDD space for useless material.
[15:30] lytv: also, you'll generally want to try not to put more than ~300 files into a single item (and even that is stretching it)
[15:30] lytv: definitely upload them :)
[15:30] just won't make it into the wayback, is all
[15:30] Okay, thanks for all your help
[15:30] also, I've uploaded some big files through the webuploader, but I have a very fast connection, that is very stable
[15:30] so YMMV
[15:30] My upload is 300 KB/s
[15:30] :(
[15:31] lytv: can you describe the format of the data a bit more?
[15:31] like, is it one big file?
[15:31] arkiver: I believe the web uploader automatically sends along the size hint, but I'm not 100% sure about that
[15:32] Crawls of various websites done with wget, then compressed with 7z. Maybe ~1,000 websites in total and most of the crawls are re-crawls of the same websites because they changed a lot and deleted threads all the time. The files are not currently in one big file, they are separate for each crawl
[15:32] lytv: so, 1000 separate 7z files then, basically
[15:32] Yes
[15:32] lytv: average size of each?
[15:32] or does it vary very strongly?
[15:33] Some are just a few megabytes, others are a few gigabytes
[15:33] right :P
[15:33] lytv: what kind of metadata do you have about each?
[15:34] None unfortunately. Just the time of when they were done. The html files are all timestamped as well
[15:35] I started in 2008 and I didn't know what I was doing really. wget only recently got warc support
[15:35] I couldn't figure out how to use hendrix on windows
[15:36] lytv: alright. 
are there any scripting languages you're familiar with?
[15:37] Only windows batch files, I don't know if that is a scripting language. I know a tiny bit of php but only enough for making basic web pages
[15:37] lytv: as long as you can generate a .csv file from it :P
[15:38] lytv: basically, if you can generate a .csv file, you can just use ias3upload
[15:38] which will read out a CSV file of metadata, item names, filenames, etc
[15:38] and do an upload for each
[15:38] I _think_ that also works on Windows
[15:38] it may require some hackery though
[15:39] Ah I see. Thanks for that information. I'll give the web uploader a go first and use that as a fallback if it doesn't work.
[15:39] Anyway, I appreciate all the help. Thanks a lot
[15:39] lytv: alright - either way, try to create one item per file
[15:39] per 7z that is
[15:39] lytv: and make sure to set the crawl date as the item date
[15:39] then you should be good
[15:40] Thanks
[15:59] *** toad1 has joined #archiveteam-bs
[16:44] *** primus104 has joined #archiveteam-bs
[16:50] *** zerkalo has quit IRC (leaving)
[17:04] *** aaaaaaaaa has joined #archiveteam-bs
[17:31] *** SimpBrain has joined #archiveteam-bs
[17:41] *** jspiros has quit IRC (Ping timeout: 186 seconds)
[17:43] *** jspiros has joined #archiveteam-bs
[17:46] *** arkiver5 has joined #archiveteam-bs
[17:59] *** primus104 has quit IRC (Leaving.)
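The CSV advice from the 15:37 to 15:39 exchange (one item per 7z, crawl date as the item date) could be sketched roughly as below. This is only a sketch under stated assumptions: the column names `item`, `file` and `date` are guesses at what ias3upload reads and should be checked against its own documentation, the `crawl-` prefix is a made-up placeholder, and the file modification time is used as a stand-in for the crawl date.

```python
import csv
import os
import re
from datetime import datetime, timezone


def item_name(filename, prefix="crawl-"):
    """Build an item name from a file name (the prefix is a hypothetical placeholder)."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    # Replace anything outside a conservative character set with a dash.
    safe = re.sub(r"[^A-Za-z0-9._-]+", "-", stem).strip("-")
    return prefix + safe


def build_rows(directory):
    """Return one metadata row per .7z file found directly in the directory."""
    rows = []
    for name in sorted(os.listdir(directory)):
        if not name.lower().endswith(".7z"):
            continue
        path = os.path.join(directory, name)
        # Assumption: the file's modification time is close enough to the crawl date.
        mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
        rows.append({
            "item": item_name(name),
            "file": name,
            "date": mtime.strftime("%Y-%m-%d"),
        })
    return rows


def write_csv(rows, out_path):
    """Write the rows to a CSV file with a header line."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        # Assumed column names; adjust to whatever the upload tool expects.
        writer = csv.DictWriter(handle, fieldnames=["item", "file", "date"])
        writer.writeheader()
        writer.writerows(rows)
```

Running `write_csv(build_rows("path/to/crawls"), "metadata.csv")` would produce one row per archive, so each 7z ends up as its own item.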
[18:04] *** jspiros has quit IRC (Ping timeout: 186 seconds)
[18:05] *** arkiver5 has quit IRC (Ping timeout: 252 seconds)
[18:20] ersi: yes, I was there
[18:20] (still here)
[18:43] *** primus104 has joined #archiveteam-bs
[18:54] *** SimpBrain has quit IRC (Read error: Operation timed out)
[19:08] cool
[19:26] *** aaaaaaaa_ has joined #archiveteam-bs
[19:26] *** aaaaaaaaa has quit IRC (Read error: Connection reset by peer)
[19:35] *** aaaaaaaa_ is now known as aaaaaaaaa
[19:54] *** achip has quit IRC (Read error: Operation timed out)
[19:54] *** Gfy has quit IRC (Read error: Operation timed out)
[19:54] *** kvieta has quit IRC (Read error: Operation timed out)
[19:55] *** sep332 has quit IRC (Write error: Broken pipe)
[19:55] *** phuzion has quit IRC (Read error: Operation timed out)
[19:55] *** yakfish has quit IRC (Read error: Operation timed out)
[19:55] *** garyrh has quit IRC (Read error: Operation timed out)
[19:55] *** aaaaaaaa_ has joined #archiveteam-bs
[19:56] *** botpie91 has quit IRC (Read error: Operation timed out)
[19:57] *** dxrt has quit IRC (Read error: Operation timed out)
[19:58] *** dxrt has joined #archiveteam-bs
[19:58] *** beardicus has quit IRC (Read error: Operation timed out)
[20:00] *** toad1 has quit IRC (Read error: Operation timed out)
[20:01] *** aaaaaaaaa has quit IRC (Read error: Operation timed out)
[20:01] *** aaaaaaaa_ is now known as aaaaaaaaa
[20:07] *** garyrh has joined #archiveteam-bs
[20:13] *** phuzion has joined #archiveteam-bs
[20:13] *** sep332 has joined #archiveteam-bs
[20:13] *** kvieta has joined #archiveteam-bs
[20:13] *** Gfy has joined #archiveteam-bs
[20:14] *** yakfish has joined #archiveteam-bs
[20:14] *** achip has joined #archiveteam-bs
[20:15] *** botpie91 has joined #archiveteam-bs
[20:18] *** toad1 has joined #archiveteam-bs
[20:21] *** tsp_ has quit IRC (Read error: Operation timed out)
[20:21] *** tsp_ has joined #archiveteam-bs
[20:23] *** Baljem_ has joined #archiveteam-bs
[20:23] *** ppsym has 
joined #archiveteam-bs
[20:25] *** zenguy_pc has quit IRC (hub.efnet.us irc.Prison.NET)
[20:25] *** PurpleSym has quit IRC (hub.efnet.us irc.Prison.NET)
[20:25] *** nico_32 has quit IRC (hub.efnet.us irc.Prison.NET)
[20:25] *** Baljem has quit IRC (hub.efnet.us irc.Prison.NET)
[20:25] *** boozehoun has joined #archiveteam-bs
[20:28] *** nico_32_ has joined #archiveteam-bs
[20:46] *** beardicus has joined #archiveteam-bs
[20:48] *** ppsym has quit IRC (Remote host closed the connection)
[21:20] *** schbirid has quit IRC (Remote host closed the connection)
[22:59] *** BlueMaxim has joined #archiveteam-bs
[23:46] *** dashcloud has quit IRC (Ping timeout: 483 seconds)
[23:54] *** dashcloud has joined #archiveteam-bs