[01:11] *** GLaDOS has quit IRC (Quit: Oh crap, I died.)
[01:11] *** GLaDOS has joined #archiveteam
[01:20] *** davidar has quit IRC (Quit: Connection closed for inactivity)
[01:29] *** xXx_ndidd has joined #archiveteam
[01:38] *** Stiletto has quit IRC (Read error: Operation timed out)
[01:40] *** ndiddy has quit IRC (Read error: Operation timed out)
[01:53] *** Stiletto has joined #archiveteam
[01:58] *** davidar has joined #archiveteam
[02:00] *** Yoshimura has quit IRC (http://www.kiwiirc.com/ - A hand crafted IRC client)
[02:00] *** Yoshimura has joined #archiveteam
[03:00] *** JesseW has joined #archiveteam
[03:11] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[03:29] *** SadDM has joined #archiveteam
[03:29] *** swebb sets mode: +o SadDM
[03:30] *** matthusb- has joined #archiveteam
[03:30] *** yakfish has joined #archiveteam
[03:32] *** jspiros has joined #archiveteam
[03:40] *** JesseW has joined #archiveteam
[03:49] *** bwn has quit IRC (Quit: Leaving)
[03:50] *** bwn has joined #archiveteam
[04:28] *** scyther has joined #archiveteam
[04:59] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[05:04] have a question about archiving all the VOD videos on a particular twitch channel
[05:05] a popular streamer named dalsarius82 died suddenly and unexpectedly (pulmonary embolism?) tuesday morning and i'm wondering how all his videos can be archived before they expire due to the twitch 30 day thing
[05:06] *** Sk1d has joined #archiveteam
[05:09] *** Honno has joined #archiveteam
[05:10] *** ariscop has quit IRC (Read error: Operation timed out)
[05:22] *** scyther has quit IRC (Quit: Leaving)
[05:25] Lord_Nigh: maybe point youtube-dl and see if that gets anything?
[05:25] point youtube-dl at his channel*
[06:03] *** ariscop has joined #archiveteam
[06:15] *** MMovie2 has joined #archiveteam
[06:17] *** MMovie has quit IRC (Read error: Operation timed out)
[06:25] *** WinterFox has joined #archiveteam
[06:39] *** kisspunch has quit IRC (ZNC - http://znc.in)
[06:43] *** kisspunch has joined #archiveteam
[06:48] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[06:53] *** Medowar has joined #archiveteam
[06:54] *** JesseW has joined #archiveteam
[07:00] *** metalcamp has joined #archiveteam
[07:11] *** schbirid has joined #archiveteam
[07:30] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[07:31] *** mek_ has joined #archiveteam
[07:31] *** vitzli has joined #archiveteam
[07:42] does youtube-dl handle twitch? the documentation was ambiguous?
[07:45] yepyep, iirc you should just need to feed it the channel url / url to the list of vods, https://github.com/rg3/youtube-dl/blob/master/docs/supportedsites.md
[07:55] *** casdr has left
[08:07] *** RedType has quit IRC (Read error: Operation timed out)
[08:23] *** atomotic has joined #archiveteam
[08:29] *** vitzli has quit IRC (Quit: Leaving)
[08:34] *** hook54321 has joined #archiveteam
[08:37] *** mek_ has quit IRC (Read error: Operation timed out)
[08:38] *** Stiletto has quit IRC (Read error: Operation timed out)
[08:57] *** RedType has joined #archiveteam
[09:19] *** Stiletto has joined #archiveteam
[09:28] *** bwn has quit IRC (Read error: Operation timed out)
[09:51] *** bwn has joined #archiveteam
[09:53] *** Stiletto has quit IRC (Read error: Operation timed out)
[10:18] *** VADemon has joined #archiveteam
[10:20] *** arkiver2 has joined #archiveteam
[10:34] *** arkiver2 has quit IRC (Ping timeout: 244 seconds)
[11:15] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[11:27] *** quibbit has joined #archiveteam
[11:27] moddb?
[11:28] *** quibbit has quit IRC (Client Quit)
[11:28] http://www.moddb.com/groups/moddb/news/gamefront-is-closing-lets-save-the-mods#readarticle
[11:30] We might want to get GameFront going and done
[11:48] *** atomotic has joined #archiveteam
[11:50] We already have most of it
[11:50] I'll make sure we also get the latest files
[11:58] *** AndroUser has joined #archiveteam
[11:58] *** AndroUser has quit IRC (Client Quit)
[11:58] *** AndroUser has joined #archiveteam
[12:00] *** AndroUser has left
[12:01] *** arkiver2 has joined #archiveteam
[12:01] *** swebb sets mode: +o arkiver2
[12:27] *** VADemon has quit IRC (Quit: left4dead)
[12:42] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[12:44] *** z00nx has quit IRC (Ping timeout: 244 seconds)
[12:55] *** Stiletto has joined #archiveteam
[13:01] *** arkiver2 has quit IRC (Quit: AndroIRC - Android IRC Client ( http://www.androirc.com ))
[13:02] *** arkiver2 has joined #archiveteam
[13:02] *** swebb sets mode: +o arkiver2
[13:03] *** scyther has joined #archiveteam
[13:06] *** z00nx has joined #archiveteam
[13:09] out of curiosity, has anyone archived http://videolectures.net/ ?
[13:10] *** BlueMaxim has quit IRC (Quit: Leaving)
[13:13] hrm... "Video downloads (where available) are limited to several lecture downloads per day. We were forced to introduce download limitations due to several web-bot experiences in which automated downloaders tried to transfer terabytes of data and consequently over-saturated our servers and internet connections, thus hindering our quality of service to other users." http://videolectures.net/faq/
[13:25] *** vitzli has joined #archiveteam
[13:42] *** WinterFox has quit IRC (Remote host closed the connection)
[14:09] *** arkiver3 has joined #archiveteam
[14:09] *** swebb sets mode: +o arkiver3
[14:09] *** dashcloud has quit IRC (Ping timeout: 250 seconds)
[14:10] *** dashcloud has joined #archiveteam
[14:13] *** arkiver2 has quit IRC (Ping timeout: 244 seconds)
[14:30] *** MMovie has joined #archiveteam
[14:37] *** MMovie2 has quit IRC (Ping timeout: 633 seconds)
[15:09] *** atomotic has joined #archiveteam
[15:10] *** arkiver3 has quit IRC (Ping timeout: 244 seconds)
[15:15] *** arkiver3 has joined #archiveteam
[15:15] *** swebb sets mode: +o arkiver3
[15:15] *** arkiver3 has quit IRC (Client Quit)
[15:18] *** Froggypwn has joined #archiveteam
[15:31] *** atrocity has quit IRC (Read error: Connection reset by peer)
[15:37] *** Froggypwn has quit IRC (Ping timeout: 961 seconds)
[15:37] *** JesseW has joined #archiveteam
[15:42] *** Froggypwn has joined #archiveteam
[16:06] *** bwn_ has joined #archiveteam
[16:11] *** bwn__ has joined #archiveteam
[16:15] *** mek_ has joined #archiveteam
[16:16] 8.6T GameTrailers packed and uploaded to IA. Now I need someone to verify that that looks good before I purge the uploaded folder.
[16:16] ^-- arkiver
[16:16] yeah
[16:17] SketchCow: how do you normally verify an upload?
[16:19] do hashes on both sides?
[16:19] *** bwn has quit IRC (Read error: Operation timed out)
[16:20] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[16:22] I'm not sure if SketchCow hashes everything or checks using some other way
[16:23] *** bwn_ has quit IRC (Read error: Operation timed out)
[16:24] MrRadar: try to pressure the moddb guy into letting us archive their files please :)
[16:24] So ask him if we can archive ModDB?
[16:24] yes
[16:25] Have they blocked us before?
[16:25] no idea but raw access > all the haxxoring
[16:25] Yeah
[16:27] *** aashsdhdf has joined #archiveteam
[16:27] *** VADemon has joined #archiveteam
[16:30] Should I just ask them to get in touch with us or should I propose something more concrete?
[16:31] I'd say archive using warrior project
[16:31] but they're stable now, so I don't think it's needed yet
[16:32] Yeah, in the comment thread he specifically mentioned that ModDB is run as a private indepdent site to avoid having any pressure to turn signficant profits or die
[16:32] *independent
[16:38] *** Medowar has quit IRC (Quit: Connection closed for inactivity)
[16:39] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[16:41] *** khaoohs has joined #archiveteam
[16:42] "could we rsync your files?"
[16:43] do you have metadata of those files?
[16:43] what filenames do they have?
[16:43] Grabbing them using a warrior project will also get us all metadata
[16:43] Yeah, ModDB is much more than just a file hosting site
[16:43] They also host blogs, photos, videos, etc
[16:43] From the mode developers
[16:43] *mod
[16:44] If we archived them we'd definitely want to grab all that stuff too
[16:48] Does anything need to happen to save Gamefront? http://www.archiveteam.org/index.php?title=GameFront
[16:48] yeah but it might be easy for them to give out static files
[16:48] aashsdhdf: We've been working on it in the background for a while for months. We've archived something around 28TB of data from their site.
[16:50] Awesome. Sometimes projects like this have a divide and conquer strategy where people just need to download a client or start wget-ing stuff
[16:50] Yeah, we have a VM (the ArchiveTeam Warrior) people can run and it just archives stuff
[16:51] (You can also run the projects yourself if you have a Linux system)
[16:55] *** aashsdhdf has quit IRC (Leaving)
[17:03] *** mek_ has quit IRC (Ping timeout: 244 seconds)
[17:06] *** espes__ has quit IRC (Read error: Operation timed out)
[17:08] *** sivoais has quit IRC (Ping timeout: 244 seconds)
[17:09] *** DarkMorph has joined #archiveteam
[17:12] *** scyther has quit IRC (Quit: Leaving)
[17:15] *** espes__ has joined #archiveteam
[17:19] *** sivoais has joined #archiveteam
[17:23] *** DarkMorph has quit IRC (Sayonara!)
[18:25] *** pfallenop has joined #archiveteam
[18:33] *** Stiletto has quit IRC (Read error: Operation timed out)
[18:41] *** kris33 has joined #archiveteam
[18:42] *** bzc6p has joined #archiveteam
[18:42] *** swebb sets mode: +o bzc6p
[18:42] *** bzc6p sets mode: +oooo achip Atluxity chfoo chfoo-
[18:42] *** bzc6p sets mode: +oooo closure dashcloud Fletcher Fletcher_
[18:42] *** bzc6p sets mode: +oooo GLaDOS godane HCross HCross2
[18:42] *** bzc6p sets mode: +oooo ivan` joepie91 JW_work Kazzy
[18:43] *** bzc6p sets mode: +oooo Kaz midas PurpleSym Sanqui
[18:43] *** bzc6p sets mode: +oooo SimpBrain schbirid Smiley Start
[18:43] *** bzc6p sets mode: +ooo VADemon wp494 yipdw_
[18:43] *** bzc6p sets mode: -o bzc6p
[18:43] *** bzc6p has left
[19:01] *** kris33 has quit IRC (Textual IRC Client: www.textualapp.com)
[19:15] *** SilSte has joined #archiveteam
[19:17] *** PMT has joined #archiveteam
[19:18] *** VADemon has quit IRC (Ping timeout: 260 seconds)
[19:20] Hi all, I'm trying to run the Warrior appliance on VBox 5.0.14, and it mostly works, but if I try to select the GameFront project, it just says "beginning work on a project" and nothing ever shows up under Current Project. The last console output line is DEBUG Load pipeline /data/data/gamefront-aa..., no errors are printed. I tried nuking the data/data/projects/gamefront-... dir and/or the ~warrior/projects/gamefront dir (while the daemon said it was idle) and letting it recreate them, and nothing changed. I can select other projects just fine and have them run without error (e.g. the Yuku project), just not the GameFront project.
[19:21] (Sorry, it was VBox 5.0.16, not that I expect that to be apropos here; I just forgot I'd updated it.)
[19:23] PMT: I think there may just not be any more work to do on that right at the moment. We've got most of it already, and the person in charge of adding new tasks may not have added any of the last bits yet.
[19:27] WFM, I just wanted to make sure it wasn't broken. :)
[19:27] thanks for checking!
[19:39] *** philpem has joined #archiveteam
[19:40] *** arkiver2 has joined #archiveteam
[19:40] *** swebb sets mode: +o arkiver2
[19:57] *** kline has joined #archiveteam
[20:01] *** bwn__ has quit IRC (Read error: Operation timed out)
[20:03] what relationship does Archive Team have with archive.org , and what actually happens to archived data (where is it stored, how can it be retrieved by people?) I get that you're not archive.org, but http://archiveteam.org/index.php?title=Dev/Infrastructure makes it look like you have them as a host for (some?) data
[20:05] we upload a ton of shit to archive.org but anyone can upload a ton of shit to archive.org
[20:05] Neat. I saw the Warrior VM mentioned and I was hoping to get one (or more :)) running at my university (we're looking to spin up a student managed server rack), so I'm just trying to see how it all works.
[20:05] The ArchiveTeam is indepdenent of the IA though our founder and de-facto leader SketchCow (a.k.a. Jason Scott) is an employee of the IA
[20:06] it's a conspiracy, really
[20:06] While the IA does general web crawling we do focussed grabs of content that's in immediate danger of disappearing
[20:06] yeah, I saw that link, I occasionally bump into textfiles.com when he does somethingthat gets posted onto HN
[20:07] Since he works there he uses his privlidges to incorporate our data into the IA's WayBack Machine which makes it about a million times more accessible than if we were just posting the raw .warc files
[20:08] So it's basically a win-win (the IA gets more content and we have an outlet to store out work)
[20:08] ace
[20:08] (most) of our data.
[20:08] True that
[20:08] so, for stuff that isn't uploaded to IA, does it just end up on volunteered storage? And how do normal people get access to that?
[20:08] convoluted means
[20:09] I know torrents have been done but in general yeah
[20:09] basically all of it is uploaded at least to IA
[20:10] some of it is also on torrents or other servers
[20:10] *** mismatch has quit IRC (Ping timeout: 370 seconds)
[20:10] ok, that sounds good. Hopefully when we have infrastructure I'll be able to spin something up.
[20:11] thanks for explaining what I'm sure gets explained often :)
[20:12] *** vitzli has quit IRC (Quit: Leaving)
[20:12] kline: there are a number of other options to help; you could run a IPFS node, or contribute to the IA.BAK project (an effort to use git-annex to backup archive.org)
[20:12] or run a #archivebot pipeline
[20:13] ia.bak is already on the "charitable assistance" list, but storage is probably going to be a problem for us (we use only donated hardware, HDD, etc) than bandwidth
[20:13] nods
[20:15] likewise, the only "issue" I can see with running Warrior is that we still need to negotiate a block of IPs, right now the proposal from IS is to get a single IP and nat like hell, and if that's the case I don't particularly want to risk the rack getting banned for crawls
[20:15] an archivebot pipeline would be great
[20:16] you are hopefully already thinking of acting as a bittorrent seed for various linux distros, I hope
[20:16] it's all up for negotiation with information services. They already blackholed one of the trial servers we had for running a student BNC because they thought it was a botnet cnc server
[20:17] heh
[20:17] also make sure they don't have a content filtering proxy for porn or whatever
[20:17] they do, but we'll probably be excluded from it
[20:18] but yes, at the start, our infra resources are going to be way higher than our demand, so for at least the beginning the intention is to make the most of it for charitable stuff (ia.bak, IRC server hosting, maybe colo'd backups/failure for other similar projects) and scale it back when times are tight
[20:19] *** mismatch has joined #archiveteam
[20:22] *** bwn__ has joined #archiveteam
[20:24] tah dah
[20:36] yes, yes?
[20:42] 8we have Gamefront by jow?
[20:52] We got most of the files. The remaining files will also be done
[20:53] And the forums need to be saved
[20:54] I haven't had a good look at the forums yet, but I think we can do that in the project too
[20:56] *** vegbrasil has quit IRC (*)
[20:58] Archivebot has also been scraping the forums sine January
[21:02] *** bwn_ has joined #archiveteam
[21:03] I would prioritize.
[21:08] *** kline has quit IRC (Quit: http://chat.efnet.org )
[21:09] *** vegbrasil has joined #archiveteam
[21:15] *** bwn__ has quit IRC (Read error: Operation timed out)
[21:21] MrRadar: yeah. hopefully that pipeline doesn't die..
[21:22] SketchCow: we don't have to prioritize, we can get everything
[21:23] We still have 15 days left
[21:28] arkiver: the Yuku grab is hitting a lot of infinite redirects (I think) such as http://cms-pixel.crowdreport.com/urlqueue/?pubID=1003&URL=http://eqguide.yuku.com/topic/780/http//[...]http//www.bck.org&divID=wrapper
[21:29] is it a list of 20x URLs which the exact same URLs?
[21:30] I don't see a list of URLs, I see like /topic/780/http//http//http//[some other URL]
[21:30] can you post a log for me?
[21:30] yeah I'll try
[21:32] http://kitsune.fastquake.com/files/archiveteam/yuku_10threads_eqguide_78-wget.log
[21:34] 8.6T GameTrailers packed and uploaded to IA. Now I need someone to verify that that looks good before I purge the uploaded folder.
[21:34] ^-- SketchCow
[21:34] No one was sure what the procedure for that was when I asked earlier.
[21:35] see if the upload redrows.
[21:35] ah, I'll check
[21:35] check the history.
[21:36] if it all goes through, good.
[21:36] nor redrows here
[21:36] no*
[21:36] arkiver: How can I check that?
[21:36] It was uploaded using my keys, and I don't see any redrows here
[21:36] ok.
[21:36] Ah, right, so maybe I can't check then. :)
[21:37] if something blows, it won't be the structure.
[21:38] zino, here we go: https://archive.org/catalog.php?all=1&search_submitter=Arkiver@hotmail.com
[21:39] Log in it says...
[21:39] * zino makes an account
[21:39] log link above, btw arkiver
[21:39] Frogging: yes, I saw
[21:39] strange error
[21:39] oki :)
[21:39] will be fixed anyway
[21:51] I have 16G worth of .rsync-tmp dirs left in the gametralers incoming dir. I assume that is typical and can just be deleted?
[21:54] arkiver: Any preference for what to upload next? fotolog (8.1T) or ftpgrab (8.7T).
[21:55] ftpgrab
[21:56] OK.
[21:58] *** metalcamp has quit IRC (Ping timeout: 250 seconds)
[22:00] *** arkiver2 has quit IRC (Ping timeout: 244 seconds)
[22:11] *** mek_ has joined #archiveteam
[22:16] arkiver: Sent you some config changes to confirm before I start this.
[22:16] will check that
[22:16] Frogging: will fix the problem tomorrow, off to bed now
[22:16] all right
[22:17] good night :)
[22:17] you too!
[22:17] arkiver: Feel free to do my stuff tomorrow too. No rush. Have a good night. :)
[22:17] zino: will do that then, you too!
[22:31] *** Honno has quit IRC (Read error: Operation timed out)
[22:34] *** RedType has quit IRC (Read error: Operation timed out)
[22:35] *** nickname_ has joined #archiveteam
[22:38] I used wget to download everything on wshu.org (on 2016-03-31) and output it to a *.warc.gz and *.cdx, now what?
[22:40] https://archive.org/upload/
[22:42] okay
[22:44] *** VADemon has joined #archiveteam
[22:45] *** WinterFox has joined #archiveteam
[22:47] *** schbirid has quit IRC (Quit: Leaving)
[23:13] *** dashcloud has quit IRC (Read error: Operation timed out)
[23:18] *** nickname_ has quit IRC (Read error: Operation timed out)
[23:21] *** dashcloud has joined #archiveteam
[23:34] *** nickname_ has joined #archiveteam
[23:44] *** RedType has joined #archiveteam