[00:22] *** creature_ has joined #archiveteam
[00:22] *** creature has quit IRC (Read error: Operation timed out)
[00:31] *** russkick has joined #archiveteam
[00:32] *** maxogden has joined #archiveteam
[00:34] hey all, im starting to work through https://docs.google.com/spreadsheets/d/12-__RqTqQxuxHNOln3H5ciVztsDMJcZ2SVs1BrfqYCc/edit#gid=0 and try to produce manifests of files in the linked datasets, eventual goal being to download them but wanted to start by producing metadata
[00:35] maxogden: Welcome. We've got a dedicated channel for archiving federal data at #cheetoflee. Join us there
[01:15] *** brayden_ has joined #archiveteam
[01:15] *** swebb sets mode: +o brayden_
[01:20] *** brayden has quit IRC (Ping timeout: 633 seconds)
[01:35] *** PepsiMax has quit IRC (Remote host closed the connection)
[01:36] *** coretx has quit IRC (Remote host closed the connection)
[01:37] *** Somebody has joined #archiveteam
[01:39] *** coretx has joined #archiveteam
[01:42] *** K4k has quit IRC (Quit: WeeChat 1.6)
[01:42] *** K4k has joined #archiveteam
[01:44] *** K4k_ has joined #archiveteam
[01:44] *** K4k_ has quit IRC (Client Quit)
[01:50] *** DiscantX has joined #archiveteam
[01:59] *** pizzaiolo has quit IRC (Remote host closed the connection)
[02:04] *** K4k has quit IRC (Quit: WeeChat 1.6)
[02:08] *** i336 has quit IRC (Remote host closed the connection)
[02:13] *** DiscantX has quit IRC (Read error: Operation timed out)
[02:14] *** DFJustin has quit IRC (Remote host closed the connection)
[02:15] *** Start has quit IRC (Remote host closed the connection)
[02:16] *** K4k has joined #archiveteam
[02:18] *** K4k has quit IRC (Client Quit)
[02:19] *** Start has joined #archiveteam
[02:19] *** DFJustin has joined #archiveteam
[02:19] *** Start has quit IRC (Client Quit)
[02:20] *** DFJustin has quit IRC (Remote host closed the connection)
[02:22] *** K4k has joined #archiveteam
[02:25] *** DFJustin has joined #archiveteam
[02:32] *** Start has joined #archiveteam
[02:59] *** ravetcofx has quit IRC (Read error: Operation timed out)
[03:09] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[03:15] *** ravetcofx has joined #archiveteam
[03:16] *** Sk1d has joined #archiveteam
[03:26] *** vOYtEC has quit IRC (Ping timeout: 245 seconds)
[03:49] *** nwf has quit IRC (Read error: Operation timed out)
[03:55] *** russkick has quit IRC ()
[04:16] *** lukeman has quit IRC (Ping timeout: 260 seconds)
[04:16] *** lukeman_ has joined #archiveteam
[04:40] *** ravetcofx has quit IRC (Read error: Operation timed out)
[04:49] *** ravetcofx has joined #archiveteam
[04:56] *** Verifirs has joined #archiveteam
[04:56] A couple concerns and questions
[04:57] What are your thoughts on mirroring libgen?
[04:57] How large is libgen, at least in estimation?
[05:01] Verifirs: unofficial rumor is that archive.org already has (but is not distributing) a copy of libgen.
[05:02] ArchiveTeam, as an entity, doesn't generally host stuff itself (although various people associated with it may, although not generally stuff with such legal attention on it).
[05:02] (and I don't speak for ArchiveTeam, of course)
[05:03] I'm guessing it's not that large
[05:03] Maybe 25 terabytes
[05:05] IDK.
[05:06] scihub is 160tb currently
[05:07] Hmm
[05:07] https://sites.google.com/site/themetalibrary/library-genesis <--- that source says 100 tb
[05:16] *** Frogging has quit IRC (El Psy Kongroo!)
[05:24] *** Frogging has joined #archiveteam
[05:32] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[05:38] *** Sk1d has joined #archiveteam
[05:47] This is quite a discussion.
[05:49] *** WinterFox has joined #archiveteam
[05:50] someone should grab this folder: ftp://nlmpubs.nlm.nih.gov/nlmdata/pir/
[05:50] i think i can grab everything else
[05:50] my problem with that folder is it's over 9gb
[05:53] anyways if someone else wants to do ftp://nlmpubs.nlm.nih.gov go for it
[05:58] *** maelstrom has quit IRC (Quit: Leaving)
[06:06] *** ScariLD has quit IRC (Ping timeout: 268 seconds)
[06:11] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[06:26] *** Verifirs has quit IRC (Ping timeout: 268 seconds)
[06:35] godane: it's just 2 files, right? dc.zip and dr.zip
[06:35] I'm getting dc.zip now
[06:35] I'll upload them both to FOS
[06:38] I think this is what they are: https://pir.nlm.nih.gov/pilot/instructions.html
[06:40] thanks
[06:40] happy to do it
[06:42] *** i336__ has quit IRC (Read error: Operation timed out)
[06:42] interesting -- it looks like we made a copy of this ftp server back in 2014: https://ia802506.us.archive.org/8/items/nlmpubs.nlm.nih.gov
[06:43] that's good
[06:48] it's going very very slowly :-(
[06:50] *** ravetcofx has quit IRC (Read error: Operation timed out)
[06:58] i put [A-Z]* into http://oceandata.sci.gsfc.nasa.gov/search/file_search.cgi and it 500'd after 5 minutes :(
[06:59] now it thinks im running a query already and wont let me do another
[07:02] web applications are hard
[07:06] *** ravetcofx has joined #archiveteam
[07:08] now on https://ofmpub.epa.gov/storpubl/dw_pages.querycriteria, "Number of Results Returned: 219,578,132. The number of Results that match your search criteria has exceeded the allowable Result Report size limit of 13,000,000." :D
[07:15] *** brayden_ is now known as brayden
[07:21] *** dashcloud has quit IRC (Ping timeout: 250 seconds)
[07:22] *** dashcloud has joined #archiveteam
[07:24] *** BlueMaxim has joined #archiveteam
[07:45] *** Honno has joined #archiveteam
[08:20] *** ndiddy has quit IRC (Read error: Connection reset by peer)
[09:05] *** kristian_ has joined #archiveteam
[09:21] *** DiscantX has joined #archiveteam
[09:34] *** Smiley has joined #archiveteam
[09:35] *** RichardG_ has joined #archiveteam
[09:39] *** kanzure_ has joined #archiveteam
[09:39] *** Igloo_ has joined #archiveteam
[09:39] *** K4k has quit IRC (hub.efnet.us irc.Prison.NET)
[09:39] *** RichardG has quit IRC (hub.efnet.us irc.Prison.NET)
[09:39] *** Yoshimura has quit IRC (hub.efnet.us irc.Prison.NET)
[09:39] *** godane has quit IRC (hub.efnet.us irc.Prison.NET)
[09:39] *** SmileyG has quit IRC (hub.efnet.us irc.Prison.NET)
[09:39] *** achip has quit IRC (hub.efnet.us irc.Prison.NET)
[09:39] *** superkuh has quit IRC (hub.efnet.us irc.Prison.NET)
[09:39] *** Igloo has quit IRC (hub.efnet.us irc.Prison.NET)
[09:39] *** kanzure has quit IRC (hub.efnet.us irc.Prison.NET)
[09:39] *** tpw_rules has quit IRC (hub.efnet.us irc.Prison.NET)
[09:39] *** aschmitz has quit IRC (hub.efnet.us irc.Prison.NET)
[09:39] *** patrickod has quit IRC (hub.efnet.us irc.Prison.NET)
[09:41] *** patricko- has joined #archiveteam
[09:51] *** K4k_ has joined #archiveteam
[09:52] *** Honno has quit IRC (Read error: Operation timed out)
[09:56] *** godane has joined #archiveteam
[10:50] *** DiscantX has quit IRC (Ping timeout: 492 seconds)
[11:14] *** superkuh has joined #archiveteam
[11:15] *** Igloo_ is now known as Igloo
[11:39] *** WinterFox has quit IRC (Remote host closed the connection)
[12:08] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[12:47] *** Honno has joined #archiveteam
[13:58] *** RichardG_ is now known as RichardG
[14:25] *** wp494 has quit IRC (Ping timeout: 506 seconds)
[14:27] *** wp494 has joined #archiveteam
[14:45] *** i336_ has joined #archiveteam
[15:22] *** REiN^ has quit IRC (Max SendQ exceeded)
[15:22] *** REiN^ has joined #archiveteam
[15:31] *** Start has quit IRC (Quit: Disconnected.)
[15:32] *** REiN^ has quit IRC (Max SendQ exceeded)
[15:32] *** REiN^ has joined #archiveteam
[15:32] *** i336_ has quit IRC (Read error: Operation timed out)
[16:18] *** atomotic has joined #archiveteam
[16:27] *** K4k_ has quit IRC (WeeChat 1.6)
[16:28] *** K4k has joined #archiveteam
[16:28] *** db48x has joined #archiveteam
[16:41] *** godane has quit IRC (Quit: Leaving.)
[16:41] *** godane has joined #archiveteam
[17:05] *** Somebody has quit IRC (Ping timeout: 370 seconds)
[17:07] *** kristian_ has quit IRC (Quit: Leaving)
[17:08] *** Deewiant has quit IRC (Ping timeout: 194 seconds)
[17:09] *** Fletcher has quit IRC (Ping timeout: 244 seconds)
[17:16] *** Fletcher has joined #archiveteam
[17:25] *** Deewiant has joined #archiveteam
[17:51] *** Stiletto has quit IRC ()
[17:59] *** nwf has joined #archiveteam
[18:00] *** Start has joined #archiveteam
[18:12] *** Start has quit IRC (Quit: Disconnected.)
[18:21] *** nwf has quit IRC (Read error: Operation timed out)
[18:24] *** nwf has joined #archiveteam
[18:50] *** bRick5772 has joined #archiveteam
[19:22] *** TJ has joined #archiveteam
[19:30] On archiveteam.org I haven't seen any mention/projects involving https://gamesrepublic.com/ website yet, which just announced in my email 2 days ago they are shutting down. So I'm here to just pop a quick heads-up about it in case nobody else has brought it up yet. It was primarily a website where you could sorta buy games from, I put it in the same category of websites that gog.com and humblebundle.com try to be. Here is a copy of the
[19:30] TJ: cut off at "here is a copy of the"
[19:31] Here is a copy of the email shutdown message I received: http://pastebin.com/piH0GSkN, bye!
[19:32] *** jrwr has quit IRC (Remote host closed the connection)
[19:39] *** TJ has left
[19:44] *** Start has joined #archiveteam
[20:24] *** bwn has quit IRC (Ping timeout: 244 seconds)
[20:25] *** Stiletto has joined #archiveteam
[20:35] *** bwn has joined #archiveteam
[20:37] *** Stiletto has quit IRC (Ping timeout: 362 seconds)
[20:38] (I've gone ahead and set off a raft of Gamesrepublic.com archivebot jobs - all should be handled. Also grabbed the website of the main site/studio that owned it.)
[20:38] *** Stiletto has joined #archiveteam
[20:51] *** maelstrom has joined #archiveteam
[20:53] *** tpw_rules has joined #archiveteam
[20:55] *** jfranusic has joined #archiveteam
[20:56] *** jfranusic has quit IRC (Client Quit)
[20:58] *** Start has quit IRC (Quit: Disconnected.)
[21:04] *** Start has joined #archiveteam
[21:12] *** Stilett0 has joined #archiveteam
[21:14] *** Stiletto has quit IRC (Ping timeout: 246 seconds)
[21:20] *** antomati_ is now known as antomatic
[21:24] *** Stilett0 has quit IRC (Read error: Connection reset by peer)
[21:30] *** DiscantX has joined #archiveteam
[21:39] *** bRick5772 has quit IRC (Quit: Leaving.)
[21:49] *** Stiletto has joined #archiveteam
[21:51] *** i336_ has joined #archiveteam
[21:54] *** DiscantX has quit IRC (Ping timeout: 492 seconds)
[22:05] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[22:13] *** nox_ has quit IRC (Ping timeout: 260 seconds)
[22:15] *** Famicoma1 has quit IRC (Ping timeout: 260 seconds)
[22:18] *** PepsiMax has joined #archiveteam
[22:18] *** Aqua has joined #archiveteam
[22:20] *** Aqua is now known as nox_
[22:22] *** Stiletto has quit IRC (Remote host closed the connection)
[22:22] *** Stiletto has joined #archiveteam
[22:22] *** Vix has joined #archiveteam
[22:24] *** nox_ has quit IRC (Read error: Connection reset by peer)
[22:25] *** nox_ has joined #archiveteam
[22:26] How should I go about finding archived Twitch vods from a specific user, which are not listed under http://archive.fart.website/archiveteam/twitchtv-index/html/ ?
[22:27] I already tried searching the wayback archive for his channel name and videos, but i only end up with VOD ID, not the direct link to the FLV
[22:27] *** Stiletto has quit IRC (Ping timeout: 260 seconds)
[22:27] I need the FLV, and can't think of any other way of finding it
[22:28] *** Stiletto has joined #archiveteam
[22:28] the vods I'm looking for are from may and june 2012
[22:28] I'll idle, maybe somebody will know how to help me..
[22:28] *** Start has quit IRC (Quit: Disconnected.)
[22:35] *** sep332 has quit IRC (Quit: konversation out)
[22:37] Was the user in the items list? A lot was not saved due to the massive size of VODs with little or no views.
[22:38] It's Lirik
[22:38] he's very popular
[22:39] After twitch went from FLV to HLS some of his vods are missing or are corrupted
[22:39] I'm trying to retrieve them
[22:40] some vods are completely black, like this one: https://www.twitch.tv/lirik/v/38461920
[22:41] other vods (which were muted because of copyrighted music) are blacked out for 30 minutes in the part where the song was playing
[22:41] and a lot of vods from 2012 are just missing
[22:47] I believe most channels did not have a complete grab done on them, only selected VODs with 100+ views. It was up to the members of those channels (and channel owner) to retrieve their content.
[22:50] *** RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
[22:51] *** RichardG has joined #archiveteam
[22:53] hmm
[22:54] Is there a way to scan web archive for "*.justin.tv/archives/*" and happen to find those files anyways?
[22:54] it doesn't let me use multiple asterisks though :P
[22:57] *** kristian_ has joined #archiveteam
[23:11] *** BlueMaxim has joined #archiveteam
[23:23] *** Smiley has quit IRC (Ping timeout: 250 seconds)
[23:24] *** Famicoma1 has joined #archiveteam
[23:27] *** Smiley has joined #archiveteam
[23:31] *** wp494_ has joined #archiveteam
[23:32] *** wp494 has quit IRC (Ping timeout: 245 seconds)
[23:33] I'm guessing there's no way unless I contact Twitch storage team
[23:33] *** wp494_ is now known as wp494
[23:51] *** Stiletto has quit IRC (Read error: Operation timed out)
[23:52] *** Stiletto has joined #archiveteam