#archiveteam-bs 2015-10-04,Sun

Time Nickname Message
00:15 🔗 robink has joined #archiveteam-bs
00:16 🔗 BlueMaxim has joined #archiveteam-bs
00:47 🔗 primus104 has quit IRC (Leaving.)
00:54 🔗 zenguy_pc has quit IRC (Read error: Operation timed out)
00:58 🔗 zenguy_pc has joined #archiveteam-bs
01:11 🔗 zenguy_pc has quit IRC (Read error: Connection reset by peer)
01:13 🔗 zenguy_pc has joined #archiveteam-bs
01:14 🔗 dxrt- is now known as dxrt
03:18 🔗 Yiffiel_d has joined #archiveteam-bs
03:18 🔗 vitzli has joined #archiveteam-bs
03:21 🔗 Yiffiel_d Hello all. I was here a couple months ago about a comic site that was going down. Just wanted to pop in with a quick q about if anyone knows good online webhosts that ask few questions and require fewer answers.
03:21 🔗 Yiffiel_d octabooru, an mlp archive&streaming site, got shut down without warning by Leaseweb, and they won't say why.
03:21 🔗 Yiffiel_d figure people around here would know about that sort of thing since they deal with sites being shuttered, by owner or by host, with some amount of frequency.
03:21 🔗 Yiffiel_d just figured I'd copypasta
03:22 🔗 Yiffiel_d (primarily, no questions due to that whole "We dindu nuffin!" thing. They really don't have a clue why they were shut down)
03:22 🔗 Sum has joined #archiveteam-bs
03:36 🔗 Yiffiel_d fyi database weighs 7gb so far and (was) growing, so it's gotta be one of those unlimited space dealies
04:09 🔗 Rotab i woooondeerrrrr why they got shut down
04:12 🔗 Sum is there an audience for a paid community site?
04:12 🔗 Sum Something Awful does it, but can't think of too many others
04:13 🔗 Sum that way you know the site will have enough to continue, rather than holding your breath
04:20 🔗 aaaaaaaaa has quit IRC (Quit: Leaving)
04:39 🔗 chazchaz Yiffiel_d: Honestly, if you don't want to answer any questions or you're trying to be anonymous you're going to end up with sketchy hosting. Also, anyone who promises you unlimited anything is lying. Either there are limits in the fine print, or they will cut you off at some point anyway.
04:39 🔗 DFJustin archive.org can be used for free unlimited file hosting but you'd still need something for the database stuff
04:40 🔗 chazchaz If you're confident in your technical skills, I'
04:41 🔗 chazchaz *I'd look into getting a VPS to host it on. A decent provider shouldn't care what you are doing unless they get DMCA or spam complaints.
05:12 🔗 Fusl has quit IRC (Read error: Operation timed out)
05:19 🔗 Yiffiel_d ah. Anyway, nah not aware of DMCA/spam. After the last weekly stream finished, the whole site had vanished.
05:20 🔗 Yiffiel_d also, could probably do the paid community site when it's finished, but the code is still a work in progress.
05:20 🔗 Yiffiel_d it has a lot of unique frills that aren't quite ready, yet
05:22 🔗 Fusl has joined #archiveteam-bs
05:32 🔗 Fusl has quit IRC (Ping timeout: 186 seconds)
05:37 🔗 yipdw TIL FSF staff have a pretty great afterparty game
05:41 🔗 tsp_ has joined #archiveteam-bs
06:17 🔗 Fusl has joined #archiveteam-bs
06:54 🔗 PurpleSym has joined #archiveteam-bs
07:15 🔗 primus104 has joined #archiveteam-bs
08:11 🔗 primus104 has quit IRC (Leaving.)
08:30 🔗 vitzli has quit IRC (Quit: Leaving)
08:31 🔗 zerkalo has quit IRC (Read error: Operation timed out)
08:31 🔗 chazchaz has quit IRC (Read error: Operation timed out)
08:31 🔗 zenguy_pc has quit IRC (Read error: Operation timed out)
08:32 🔗 PurpleSym has quit IRC (Read error: Operation timed out)
08:32 🔗 dcmorton has quit IRC (Read error: Operation timed out)
08:32 🔗 arkiver has quit IRC (Read error: Operation timed out)
08:32 🔗 PurpleSym has joined #archiveteam-bs
08:32 🔗 swebb has quit IRC (Read error: Operation timed out)
08:33 🔗 ohhdemgir has quit IRC (Read error: Operation timed out)
08:33 🔗 zenguy_pc has joined #archiveteam-bs
08:34 🔗 arkiver has joined #archiveteam-bs
08:34 🔗 joepie91 has quit IRC (Read error: Operation timed out)
08:34 🔗 dxrt has quit IRC (Ping timeout: 369 seconds)
08:34 🔗 Laverne has quit IRC (Ping timeout: 369 seconds)
08:35 🔗 ohhdemgir has joined #archiveteam-bs
08:36 🔗 joepie91 has joined #archiveteam-bs
08:37 🔗 zerkalo has joined #archiveteam-bs
08:38 🔗 dxrt has joined #archiveteam-bs
08:38 🔗 dcmorton has joined #archiveteam-bs
08:41 🔗 zerkalo has quit IRC (Ping timeout: 255 seconds)
08:42 🔗 zerkalo has joined #archiveteam-bs
08:52 🔗 Sum has quit IRC (Read error: Connection reset by peer)
08:59 🔗 chazchaz has joined #archiveteam-bs
08:59 🔗 Laverne has joined #archiveteam-bs
09:00 🔗 swebb has joined #archiveteam-bs
09:15 🔗 Fusl has quit IRC (Remote host closed the connection)
09:29 🔗 Laverne has quit IRC (Ping timeout: 369 seconds)
09:32 🔗 swebb has quit IRC (Excess Flood)
09:33 🔗 chazchaz has quit IRC (Ping timeout: 369 seconds)
09:34 🔗 ersi You were there?
09:37 🔗 chazchaz has joined #archiveteam-bs
09:43 🔗 SimpBrain has quit IRC (Quit: Leaving)
09:44 🔗 PurpleSym Apparently Yahoo Groups bans single IPv6 addresses instead of subnets. Interesting.
09:45 🔗 chazchaz has quit IRC (Ping timeout: 369 seconds)
09:45 🔗 Fusl has joined #archiveteam-bs
09:48 🔗 Laverne has joined #archiveteam-bs
09:48 🔗 swebb has joined #archiveteam-bs
09:52 🔗 chazchaz has joined #archiveteam-bs
10:39 🔗 primus104 has joined #archiveteam-bs
10:40 🔗 schbirid has joined #archiveteam-bs
10:59 🔗 jspiros has quit IRC (Ping timeout: 186 seconds)
10:59 🔗 jspiros has joined #archiveteam-bs
12:06 🔗 zhongfu has quit IRC (Ping timeout: 268 seconds)
12:07 🔗 zhongfu has joined #archiveteam-bs
12:27 🔗 primus104 has quit IRC (Leaving.)
12:38 🔗 BlueMaxim has quit IRC (Quit: Leaving)
14:43 🔗 Melissa_ has joined #archiveteam-bs
15:19 🔗 Melissa_ has quit IRC (Quit: Melissa_)
15:21 🔗 lytv Here?
15:21 🔗 arkiver yes
15:21 🔗 arkiver so in More Options
15:21 🔗 arkiver key = x-archive-size-hint
15:21 🔗 arkiver value = number of bytes
15:21 🔗 arkiver and then I think you're ready to go
15:22 🔗 arkiver However, I've never used the web-uploader with x-archive-size-hint, so I'm not sure if it'll work, but we'll see I guess
15:22 🔗 lytv Okay thanks. So do I just upload and then tell you the link so it can be added to the collection, or do I have to do anything specific before uploading? I wouldn't like to upload it and then discover I did something wrong and have to start again.
15:23 🔗 arkiver I think you can tell us here
15:23 🔗 lytv I have not used the site before so it is all new to me
15:23 🔗 lytv Okay, thanks
15:23 🔗 arkiver I'm not sure though if 500GB will upload ok through the webuploader
15:23 🔗 arkiver Please let me know if it works and remember to set the x-archive-size-hint
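For uploads that bypass the web uploader, the same key/value pair is sent as an HTTP header on the request to archive.org's S3-like API. The following is a minimal sketch of building those headers, assuming the `LOW accesskey:secret` authorization scheme and the `x-archive-auto-make-bucket` and `x-archive-meta-*` headers of that API. The function name and the example values are hypothetical, and nothing is sent over the network.

```python
def build_upload_headers(access_key, secret_key, total_bytes, metadata=None):
    """Build headers for a PUT to archive.org's S3-like upload endpoint.

    total_bytes is the expected size of the whole item in bytes; it is
    passed along as the x-archive-size-hint value discussed above.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    headers = {
        "authorization": f"LOW {access_key}:{secret_key}",
        "x-archive-auto-make-bucket": "1",
        # The size hint is a plain decimal byte count.
        "x-archive-size-hint": str(total_bytes),
    }
    # Item metadata travels as x-archive-meta-<name> headers.
    for name, value in (metadata or {}).items():
        headers[f"x-archive-meta-{name}"] = str(value)
    return headers


# Hypothetical credentials and a 500 GB (decimal) size hint.
example = build_upload_headers("ACCESS", "SECRET", 500 * 10**9,
                               {"date": "2008-05-01"})
```

The resulting dictionary could be handed to any HTTP client that performs the actual PUT.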
15:24 🔗 lytv Okay. It might be easier to split it up, then. Is a collection allowed hundreds of smaller files? Sorry for all the dumb questions.
15:28 🔗 joepie91 lytv: please do keep in mind that only WARC-formatted crawls can be added to the Wayback Machine
15:29 🔗 joepie91 you can of course upload crawls in other formats, but they won't make it into the Wayback
15:29 🔗 joepie91 (unless it's ARC, but I doubt anybody uses that still)
15:29 🔗 lytv Okay. Are non-warc crawls useful at all or should I not bother? I don't want to waste archive.org HDD space for useless material.
15:30 🔗 joepie91 lytv: also, you'll generally want to try not to put more than ~300 files into a single item (and even that is stretching it)
15:30 🔗 joepie91 lytv: definitely upload them :)
15:30 🔗 joepie91 just won't make it into the wayback, is all
15:30 🔗 lytv Okay, thanks for all your help
15:30 🔗 joepie91 also, I've uploaded some big files through the webuploader, but I have a very fast connection, that is very stable
15:30 🔗 joepie91 so YMMV
15:30 🔗 lytv My upload is 300 KB/s
15:30 🔗 lytv :(
15:31 🔗 joepie91 lytv: can you describe the format of the data a bit more?
15:31 🔗 joepie91 like, is it one big file?
15:31 🔗 joepie91 arkiver: I believe the web uploader automatically sends along the size hint, but I'm not 100% sure about that
15:32 🔗 lytv Crawls of various websites done with wget, then compressed with 7z. Maybe ~1,000 websites in total, and most of the crawls are re-crawls of the same websites because they changed a lot and deleted threads all the time. The files are not currently in one big file; they are separate for each crawl
15:32 🔗 joepie91 lytv: so, 1000 separate 7z files then, basically
15:32 🔗 lytv Yes
15:32 🔗 joepie91 lytv: average size of each?
15:32 🔗 joepie91 or does it vary very strongly?
15:33 🔗 lytv Some are just a few megabytes, others are a few gigabytes
15:33 🔗 joepie91 right :P
15:33 🔗 joepie91 lytv: what kind of metadata do you have about each?
15:34 🔗 lytv None, unfortunately. Just the time of when they were done. The html files are all timestamped as well
15:35 🔗 lytv I started in 2008 and I didn't know what I was doing really. wget only recently got warc support
15:35 🔗 lytv I couldn't figure out how to use Heritrix on windows
15:36 🔗 joepie91 lytv: alright. are there any scripting languages you're familiar with?
15:37 🔗 lytv Only windows batch files, I don't know if that is a scripting language. I know a tiny bit of php but only enough for making basic web pages
15:37 🔗 joepie91 lytv: as long as you can generate a .csv file from it :P
15:38 🔗 joepie91 lytv: basically, if you can generate a .csv file, you can just use ias3upload
15:38 🔗 joepie91 which will read out a CSV file of metadata, item names, filenames, etc
15:38 🔗 joepie91 and do an upload for each
15:38 🔗 joepie91 I _think_ that also works on Windows
15:38 🔗 joepie91 it may require some hackery though
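Generating the CSV that joepie91 describes needs nothing beyond a standard CSV writer. Below is a minimal sketch in Python; the column names (`item`, `file`, `date`, `title`) are assumptions made for illustration and should be checked against the ias3upload documentation before use.

```python
import csv
import io


def write_manifest(rows, out, columns=("item", "file", "date", "title")):
    """Write one CSV line per upload, with a header row first.

    rows is an iterable of dicts; keys missing from a row are written
    as empty fields. The column names are illustrative only.
    """
    writer = csv.DictWriter(out, fieldnames=list(columns), lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({col: row.get(col, "") for col in columns})


# Hypothetical single-row manifest written to an in-memory buffer.
buffer = io.StringIO()
write_manifest(
    [{"item": "example-crawl-20080501", "file": "example.7z", "date": "2008-05-01"}],
    buffer,
)
manifest_text = buffer.getvalue()
```

Writing to a real file works the same way: open it with `newline=""` and pass the file object as `out`.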
15:39 🔗 lytv Ah I see. Thanks for that information. I'll give the web uploader a go first and use that as a fallback if it doesn't work.
15:39 🔗 lytv Anyway, I appreciate all the help. Thanks a lot
15:39 🔗 joepie91 lytv: alright - either way, try to create one item per file
15:39 🔗 joepie91 per 7z that is
15:39 🔗 joepie91 lytv: and make sure to set the crawl date as the item date
15:39 🔗 joepie91 then you should be good
15:40 🔗 lytv Thanks
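The "one item per 7z, crawl date as item date" advice can be sketched as follows, assuming each archive's modification time reflects when the crawl was done (lytv mentions having only timestamps to go on). The `wget-crawl` prefix, the identifier layout and the function names are hypothetical choices; the date is appended to the identifier so that re-crawls of the same site do not collide.

```python
import re
from datetime import datetime, timezone
from pathlib import Path


def item_record(filename, mtime_seconds, prefix="wget-crawl"):
    """Map one 7z file to one item: identifier, file name, crawl date."""
    stem = Path(filename).stem
    # Keep only characters that are safe in an identifier.
    safe = re.sub(r"[^A-Za-z0-9._-]+", "-", stem).strip("-")
    crawled = datetime.fromtimestamp(mtime_seconds, tz=timezone.utc)
    return {
        "item": f"{prefix}-{safe}-{crawled:%Y%m%d}",
        "file": Path(filename).name,
        "date": crawled.strftime("%Y-%m-%d"),
    }


def records_for_directory(directory):
    """Build one record per .7z file found directly in directory."""
    return [
        item_record(path, path.stat().st_mtime)
        for path in sorted(Path(directory).glob("*.7z"))
    ]


# Hypothetical file name with a timestamp of 2008-05-01 12:00 UTC.
record = item_record("some forum (recrawl).7z", 1209643200)
```

If the file timestamps were lost when the archives were copied, the date would have to come from somewhere else, such as the timestamps lytv says are inside the HTML files.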
15:59 🔗 toad1 has joined #archiveteam-bs
16:44 🔗 primus104 has joined #archiveteam-bs
16:50 🔗 zerkalo has quit IRC (leaving)
17:04 🔗 aaaaaaaaa has joined #archiveteam-bs
17:31 🔗 SimpBrain has joined #archiveteam-bs
17:41 🔗 jspiros has quit IRC (Ping timeout: 186 seconds)
17:43 🔗 jspiros has joined #archiveteam-bs
17:46 🔗 arkiver5 has joined #archiveteam-bs
17:59 🔗 primus104 has quit IRC (Leaving.)
18:04 🔗 jspiros has quit IRC (Ping timeout: 186 seconds)
18:05 🔗 arkiver5 has quit IRC (Ping timeout: 252 seconds)
18:20 🔗 yipdw ersi: yes, I was there
18:20 🔗 yipdw (still here)
18:43 🔗 primus104 has joined #archiveteam-bs
18:54 🔗 SimpBrain has quit IRC (Read error: Operation timed out)
19:08 🔗 ersi cool
19:26 🔗 aaaaaaaa_ has joined #archiveteam-bs
19:26 🔗 aaaaaaaaa has quit IRC (Read error: Connection reset by peer)
19:35 🔗 aaaaaaaa_ is now known as aaaaaaaaa
19:54 🔗 achip has quit IRC (Read error: Operation timed out)
19:54 🔗 Gfy has quit IRC (Read error: Operation timed out)
19:54 🔗 kvieta has quit IRC (Read error: Operation timed out)
19:55 🔗 sep332 has quit IRC (Write error: Broken pipe)
19:55 🔗 phuzion has quit IRC (Read error: Operation timed out)
19:55 🔗 yakfish has quit IRC (Read error: Operation timed out)
19:55 🔗 garyrh has quit IRC (Read error: Operation timed out)
19:55 🔗 aaaaaaaa_ has joined #archiveteam-bs
19:56 🔗 botpie91 has quit IRC (Read error: Operation timed out)
19:57 🔗 dxrt has quit IRC (Read error: Operation timed out)
19:58 🔗 dxrt has joined #archiveteam-bs
19:58 🔗 beardicus has quit IRC (Read error: Operation timed out)
20:00 🔗 toad1 has quit IRC (Read error: Operation timed out)
20:01 🔗 aaaaaaaaa has quit IRC (Read error: Operation timed out)
20:01 🔗 aaaaaaaa_ is now known as aaaaaaaaa
20:07 🔗 garyrh has joined #archiveteam-bs
20:13 🔗 phuzion has joined #archiveteam-bs
20:13 🔗 sep332 has joined #archiveteam-bs
20:13 🔗 kvieta has joined #archiveteam-bs
20:13 🔗 Gfy has joined #archiveteam-bs
20:14 🔗 yakfish has joined #archiveteam-bs
20:14 🔗 achip has joined #archiveteam-bs
20:15 🔗 botpie91 has joined #archiveteam-bs
20:18 🔗 toad1 has joined #archiveteam-bs
20:21 🔗 tsp_ has quit IRC (Read error: Operation timed out)
20:21 🔗 tsp_ has joined #archiveteam-bs
20:23 🔗 Baljem_ has joined #archiveteam-bs
20:23 🔗 ppsym has joined #archiveteam-bs
20:25 🔗 zenguy_pc has quit IRC (hub.efnet.us irc.Prison.NET)
20:25 🔗 PurpleSym has quit IRC (hub.efnet.us irc.Prison.NET)
20:25 🔗 nico_32 has quit IRC (hub.efnet.us irc.Prison.NET)
20:25 🔗 Baljem has quit IRC (hub.efnet.us irc.Prison.NET)
20:25 🔗 boozehoun has joined #archiveteam-bs
20:28 🔗 nico_32_ has joined #archiveteam-bs
20:46 🔗 beardicus has joined #archiveteam-bs
20:48 🔗 ppsym has quit IRC (Remote host closed the connection)
21:20 🔗 schbirid has quit IRC (Remote host closed the connection)
22:59 🔗 BlueMaxim has joined #archiveteam-bs
23:46 🔗 dashcloud has quit IRC (Ping timeout: 483 seconds)
23:54 🔗 dashcloud has joined #archiveteam-bs