[00:18] *** Start is now known as StartAway
[00:23] *** Froggypwn has quit IRC (Quit: ~ Trillian Astra - www.trillian.im ~)
[00:24] *** dashcloud has quit IRC (Read error: Operation timed out)
[00:25] *** cf_ has joined #archiveteam
[00:28] *** dashcloud has joined #archiveteam
[00:29] *** cf has quit IRC (Read error: Operation timed out)
[00:29] *** cf_ is now known as cf
[00:31] *** mistym has quit IRC (Read error: Operation timed out)
[00:38] *** cf_ has joined #archiveteam
[00:45] *** cf has quit IRC (Read error: Operation timed out)
[00:45] *** cf_ is now known as cf
[01:01] *** Aranje has quit IRC (Read error: Connection reset by peer)
[01:02] *** Aranje has joined #archiveteam
[01:02] *** Aranje has quit IRC (Read error: Connection reset by peer)
[01:03] *** Aranje has joined #archiveteam
[01:44] *** Ymgve has quit IRC ()
[02:01] *** Sanqui has quit IRC (Read error: Operation timed out)
[02:03] *** Sanqui has joined #archiveteam
[02:03] *** primus104 has quit IRC (Leaving.)
[02:06] *** Sanqui has quit IRC (Read error: Operation timed out)
[02:07] *** Sanqui has joined #archiveteam
[02:16] *** dashcloud has quit IRC (Ping timeout: 272 seconds)
[02:21] *** dashcloud has joined #archiveteam
[02:38] *** xtr-201 has quit IRC (Ping timeout: 852 seconds)
[02:39] *** StartAway is now known as Start
[03:04] *** xtr-201 has joined #archiveteam
[03:06] *** Silent700 has joined #archiveteam
[03:07] Hello - question re: archive.org's uploader...
[03:07] If I upload a .zip of .tif files, will they be processed into a PDF or just the online-viewable format?
[03:07] Or nothing at all?
[03:11] it will turn it into every other format you normally see
[03:12] so .zip is OK?
[03:12] yes
[03:12] a zip of TIFF files is quite good
[03:12] I found an old forum post that said it would not burst the zip, but I figured that was outdated
[03:12] that is outdated.
[03:12] then that is what you will get
[03:12] * xmc points to /topic :)
[03:12] IA will get it
[03:12] then that is what /they/ will get
[03:13] and you, by way of them :)
[03:13] fair enough
[03:42] does the uploader support multiple files at once _and_ create an entry for each one? Or must they all be contained under one entry?
[03:42] for example, issues of a magazine
[03:47] each issue should go in a different item
[03:47] And the uploader tool will do that, or the uploader (me) must do that?
[03:50] *** BlueMaxim has quit IRC (Quit: Leaving)
[04:07] *** Start has quit IRC (Read error: Operation timed out)
[04:22] *** Start has joined #archiveteam
[04:37] *** dx has quit IRC (Remote host closed the connection)
[04:37] *** dx has joined #archiveteam
[04:47] *** aaaaaaaaa has quit IRC (Leaving)
[05:37] *** Silent700 has left
[05:42] http://wayback.archive.org/web/20140331060749/http://ex.fm/api
[05:43] ex.fm's api still works with archive.ex.fm
[05:45] of course it could also be scraped through http://archive.ex.fm/siteindex.xml.gz
[05:47] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[05:48] *** dashcloud has joined #archiveteam
[05:54] Is there a way to tell wget to delete the non-warc'd parts of a download and just leave the generated warc file?
[05:55] mutoso: wget-warc will write non-warc content no matter what, but you can limit it with --output-document and --truncate-output
[05:56] if this is unacceptable, consider wpull, which implements most of the same wget options
[05:59] *** mistym has joined #archiveteam
[06:07] i've made more detailed notes on sites shutting down: http://paste.archivingyoursh.it/jotejecagi.vhdl
[06:08] *** dashcloud has quit IRC (Read error: Operation timed out)
[06:13] *** dashcloud has joined #archiveteam
[06:24] good news on the KBS News Today archiving
[06:24] i found the podcast paths:
[06:24] http://newsdown.kbs.gscdn.com/news_today/2013/01/04/10.mp4
[06:24] working one: http://newsdown.kbs.gscdn.com/news_today/2013/08/01/10.mp4
[06:25] the august 2013 videos are not even on youtube
[06:37] Start: I didn't know about ziplist... :/
[06:37] So the deadline is december 10th. I'll make sure we have it running today or tomorrow
[06:37] thanks
[06:38] i wasn't aware of it until today either
[06:38] thankfully it seems pretty straightforward to grab
[06:39] SketchCow: how are you on space on FOS? I'll start Viddy this afternoon with FOS, but will try to have someone else volunteer to take the files
[06:40] But first everything is going into FOS
[07:04] For which... viddy or whatever?
[07:05] FOS is overburdened, and will be for at least a week.
[07:05] I mean, that doesn't seem to be stopping anyone, but there we are.
[07:05] viddy
[07:06] I mean, nothing's going to change soon - FOS is under ridiculous strain.
[07:06] Right now FOS has 2.3tb free but that seems to change in no time. Like I said, someone pumped 1tb of Archivebot joy into it in 72 hours.
[07:07] We have currently only 15 days left for viddy, so we really have to start. Since FOS is overburdened, I'll try to have another upload target other than FOS as soon as possible, but till that time I want to go with FOS.
[07:07] If you do viddy, don't do Halo.
[07:07] That's all.
[07:08] ask Kenshin if you can reuse part of tank
[07:08] [archiveteam@tank ~]$ df -h /home/archiveteam/
[07:08] Filesystem Size Used Avail Capacity Mounted on
[07:08] zfs/archiveteam 20T 842G 19T 4% /home/archiveteam
[07:08] I will say, though, that that machine gets pretty busy during twitpic
[07:08] also part of life is learning to deal with losing a few times
[07:09] Well, Halo is also the lowest of low priority high-resource projects, frankly.
[07:09] Having a millions-of-games sample of Halo games, which we now already have, is quite good.
[07:09] Having a comprehensive collection of every game played on Halo 2-4 is less good
[07:09] or important.
[07:11] arkiver: regarding ziplist, the highest valid recipe ID i could find is http://www.ziplist.com/recipes/3320393
[07:19] i found about 1 hour of video that you hate to archive: http://video-hot.nowcom.gscdn.com/mvod/20140424/321/85542321_1.mp4
[07:20] their folders are public so i was able to fine it
[07:20] *find it
[07:21] i sent one of these gscdn.com urls to archive bot cause it has 2008 starcraft tournament videos
[07:21] from South Korea
[07:23] SketchCow: I bet if you made a graph of archivebot uploads over time it would be *very* up-and-down
[07:25] SketchCow: i found a web stream of WCS Season 1 GSL: http://ongameimg.gscdn.com/web/WCS%20Season1%20GSL/05-14/05.14%20WCS%20FULL%20Version.mp4
[07:25] over 3 hours long
[07:27] i may have to look at ongameimg.gscdn.com more
[07:27] SketchCow: pretty much a roller coaster
[07:27] that would be interesting data.
[07:27] looks like some folders are open but folders going to that folder are not
[07:27] SketchCow: I'll do that
[07:28] yipdw: I'll do that also
[07:29] *** Start is now known as StartAway
[07:38] *** primus104 has joined #archiveteam
[07:50] *** StartAway is now known as Start
[07:59] *** Start is now known as StartAway
[08:00] It finished. Right now, I have 2.6tb of Halo backed up
[08:00] Buffered. Waiting to be uploaded.
[08:07] *** mistym has quit IRC (Remote host closed the connection)
[08:47] arkiver: getting another 8TB box ready for you
[10:15] Silent is gone, but I think a good guide is https://en.wikisource.org/wiki/Help:DjVu_files#The_Internet_Archive (yes, I'm biased)
[10:20] *** BlueMaxim has joined #archiveteam
[10:20] *** schbirid has joined #archiveteam
[10:26] *** signius_ has quit IRC (Read error: Operation timed out)
[10:39] *** signius_ has joined #archiveteam
[10:43] *** BlueMaxim has quit IRC (Quit: Leaving)
[11:10] *** APerti has quit IRC (Ping timeout: 378 seconds)
[12:29] *** LordNigh2 has joined #archiveteam
[12:31] *** Lord_Nigh has quit IRC (Ping timeout: 272 seconds)
[12:31] *** LordNigh2 is now known as Lord_Nigh
[12:35] *** slipstrea is now known as raylee
[12:56] *** cf has quit IRC (Quit: cf)
[12:57] *** Froggypwn has joined #archiveteam
[13:01] *** Ymgve has joined #archiveteam
[13:42] *** cf has joined #archiveteam
[13:46] *** xk_id has joined #archiveteam
[14:03] Sorry I'm pretty new to all of this - what does FOS actually stand for?
[14:04] will__: fortress of solitude, it's just a name
[14:05] Ah right thanks Nemo_bis
[14:16] "fields of saves"
[14:16] as in build it, and they will come
[14:16] :DDDDD
[14:16] He built it, we came
[14:16] we saved
[14:16] we broke things...
[14:17] *** StartAway is now known as Start
[14:37] *** REiN^ has joined #archiveteam
[14:41] *** Start has quit IRC (Remote host closed the connection)
[14:43] *** cf has quit IRC (Quit: cf)
[15:09] *** BiggieJon has joined #archiveteam
[15:19] *** aaaaaaaaa has joined #archiveteam
[15:40] *** mistym has joined #archiveteam
[15:40] *** cf has joined #archiveteam
[15:43] Start: how do you get that each item of ziplist is ~9 MB?
http://paste.archivingyoursh.it/jotejecagi.vhdl
[15:43] *** mistym has quit IRC (Remote host closed the connection)
[15:57] *** dashcloud has quit IRC (Read error: Operation timed out)
[16:01] *** mistym has joined #archiveteam
[16:04] *** dashcloud has joined #archiveteam
[16:32] *** Froggypwn has quit IRC (Quit: ~ Trillian Astra - www.trillian.im ~)
[16:36] *** thechip has joined #archiveteam
[16:38] *** cf has quit IRC (Quit: cf)
[16:54] *** joe_ has joined #archiveteam
[16:55] hey, not sure where to direct this request, but would it be possible to have https://archive.org point to https://web.archive.org in the search box (rather than https:// pointing to the unencrypted http://web.archive.org)?
[17:01] direct it to info@archive.org
[17:02] *** okeuday has quit IRC (Read error: Operation timed out)
[17:03] don't use email :(
[17:08] *** aaaaaaaaa has quit IRC (Read error: Operation timed out)
[17:15] *** mistym has quit IRC (Remote host closed the connection)
[17:16] *** aaaaaaaaa has joined #archiveteam
[17:19] *** okeuday has joined #archiveteam
[17:19] *** Start has joined #archiveteam
[17:25] Oh yeah, I was wondering about that. Sort of bothered me that the login page is apparently unencrypted.
[17:29] First you'd probably need a valid certificate though. archiveteam.org's is self signed. And it's mismatched. And it's expired.
[17:29] Maybe I should go ahead and send that email.
[17:29] <3
[17:29] thank you, then I'll be on my way
[17:29] *** joe_ has left
[17:31] *** primus104 has quit IRC (Leaving.)
[17:43] *** SmileyG has joined #archiveteam
[17:44] E-mail sent.
[17:46] *** Smiley has quit IRC (Read error: Operation timed out)
[17:46] *** ivan` has quit IRC (Ping timeout: 248 seconds)
[17:48] *** ruukasu has quit IRC (Quit: WeeChat 1.0.1)
[17:48] *** phuzion has quit IRC (Read error: Connection reset by peer)
[17:48] *** Start has quit IRC (Read error: Connection reset by peer)
[17:48] *** Sanqui has quit IRC (Ping timeout: 248 seconds)
[17:48] *** Start has joined #archiveteam
[17:48] *** Sanqui has joined #archiveteam
[17:48] *** phuzion has joined #archiveteam
[17:50] *** ivan` has joined #archiveteam
[17:50] *** Start has quit IRC (Read error: Connection reset by peer)
[17:51] *** todrobbin has joined #archiveteam
[17:58] *** Start has joined #archiveteam
[17:58] i've created a wiki page for ziplist
[18:02] Did someone say wiki
[18:02] *** ruukasu has joined #archiveteam
[18:07] *** APerti has joined #archiveteam
[18:23] *** kyan_ has joined #archiveteam
[18:31] *** ruukasu has quit IRC (Quit: WeeChat 1.0.1)
[18:32] *** ruukasu has joined #archiveteam
[18:35] godane: There's no extant Internet Archive ERIC Archive, right?
[18:36] from what i know
[18:36] OK.
[18:37] http://www.calpro-online.org/ERIC/index.asp
[18:38] *** rejon has quit IRC (Ping timeout: 480 seconds)
[18:41] New archive made, I'm going to pump the 55,539 items into it.
[18:42] ok
[18:47] *** Start has quit IRC (Ping timeout: 252 seconds)
[18:52] *** kyan_ is now known as kyan
[18:53] so now i think i can get 2009 videos of kbs news today
[18:53] what the hell
[18:54] i just add 700k/10.mp4 to the end of the date path
[18:54] its my best guess if all the old asf files are now mp4
[19:00] *** cf has joined #archiveteam
[19:14] Ha ha, they're swapping over the ERICs, but it's not enjoying the process.
[19:18] *** cf has quit IRC (Quit: cf)
[19:19] I'll hit the rest of your stuff when it's done with this, godane.
[19:20] ok
[19:21] *** ruukasu has quit IRC (Quit: WeeChat 1.0.1)
[19:21] *** ruukasu has joined #archiveteam
[19:22] SketchCow: looks like i will be able to get a good chunk of KBS News Today
[19:22] i have just started downloading the 20080729 episode of it
[19:23] How did the metadata for the arcade go?
[19:25] Which what.
[19:25] Like, we still need a lot.
[19:25] Some arcade games are becoming unplayable, but not surprisingly the more obscure ones that REALLY need metadata are still without good descriptions.
[19:28] *** primus104 has joined #archiveteam
[19:57] *** xmc has quit IRC (Quit: Lost terminal)
[19:58] *** chronomex has joined #archiveteam
[20:04] *** cf has joined #archiveteam
[20:09] I'm now aggressively moving all my "project" items to one drive and all the "incoming buffer" to another, although I doubt this will change much.
[20:13] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[20:14] *** dashcloud has joined #archiveteam
[20:15] *** mutoso has quit IRC (Read error: Operation timed out)
[20:28] for a second then i was worried you meant OneDrive
[20:33] *** Start has joined #archiveteam
[20:37] *** mistym has joined #archiveteam
[20:50] SketchCow: good luck with getting all those items up into IA.
[20:51] For now I'll keep Halo paused and Viddy is going to midas
[20:51] So FOS can totally focus on getting everything up and be ready for next projects
[20:55] ------------------------------------------
[20:55] Viddy has started
[20:55] 15 days left to download all the videos
[20:55] Join our newest project in: #viddiot
[20:55] ------------------------------------------
[21:00] *** nblr_ is now known as nblr
[21:00] *** chronomex is now known as xmc
[21:01] *** cf has quit IRC (cf)
[21:04] *** mistym has quit IRC (Remote host closed the connection)
[21:07] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[21:07] arkiver: should i add ziplist to the upcoming projects list?
[21:09] *** dashcloud has joined #archiveteam
[21:10] *** human39 has joined #archiveteam
[21:12] Start: yeah, sure
[21:12] ok
[21:13] Start: I'll start working on it now, will keep you informed
[21:14] alright
[21:14] Maybe it time to create a channel for ziplist
[21:14] it is*
[21:14] #zipyourlips
[21:15] sounds good. you decide on the channel name ;)
[21:23] *** Start has quit IRC (Quit: Disconnected.)
[21:25] *** mistym has joined #archiveteam
[21:33] *** cf has joined #archiveteam
[21:35] *** todrobbin has quit IRC (Quit: todrobbin)
[22:02] *** cf has quit IRC (Quit: cf)
[22:13] *** schbirid has quit IRC (Leaving)
[23:17] *** Start has joined #archiveteam