[00:02] *** dashcloud has quit IRC (Read error: Operation timed out)
[00:06] *** dashcloud has joined #archiveteam-bs
[00:20] *** ralphdnak has quit IRC (Ping timeout: 633 seconds)
[00:21] *** ralphdnak has joined #archiveteam-bs
[00:31] *** kristian_ has joined #archiveteam-bs
[00:39] *** ralphdnak has quit IRC (Ping timeout: 244 seconds)
[01:10] Interesting article on video game preservation: https://web.stanford.edu/group/htgg/cgi-bin/drupal/?q=node/1211
[01:19] *** bauruine has joined #archiveteam-bs
[01:33] *** kristian_ has quit IRC (Leaving)
[01:36] *** Eloquence has joined #archiveteam-bs
[01:40] *** r3c0d3x_ has joined #archiveteam-bs
[01:41] Hey, can I speak to a staff member about something? (My internet connection has been acting up the past few days and has probably spammed this chat with joins/leaves.)
[01:44] r3c0d3x_: we noticed. :-)
[01:44] I'm not sure who's around right now, but someone probably will speak up eventually.
[01:58] K, thanks. Really sorry about all that, my ISP has been having problems the past few days (and they're still ongoing), but I'm currently on a separate, stable server now, so that should no longer be an issue.
[02:07] Eh, it happens.
[02:07] I can hardly complain, as I don't even run a bouncer, so I pop in and out a lot.
[02:22] *** Stilett0 has joined #archiveteam-bs
[02:22] *** Stiletto has quit IRC (Read error: Operation timed out)
[02:39] *** Stilett0 has quit IRC (Read error: Operation timed out)
[02:39] *** Stiletto has joined #archiveteam-bs
[02:54] i couldn't think of a better place for that, so if anyone has any suggestions..
[02:56] seems good
[02:57] that's a really interesting tool, a big part of what's needed if you ask me
[02:58] The list of currently supported sites didn't happen to include any of personal interest to me -- but the idea certainly seems good and useful.
[02:58] reading through their site now, this does look really interesting! nice find bwn.
[02:59] might contribute at some point
[02:59] Nemo bis found it and added it to the Quora wiki
[02:59] * JesseW is happily reading through http://wiki.erights.org/wiki/Walnut/Distributed_Computing right now
[03:00] jessew: i happen to have a quora account but zero answers, none of the others either, heh
[03:01] the extensible aspect though
[03:14] !ig 2lnjehj9rvargx2kpdcxcxzx5 ^https?://www\.drudgereportarchives\.com/data/.*_video-gunshots-shouts-allahu-akbar-french-magazine-shooting_823281\.html
[03:23] hm, I need to look at the extensible aspect more, I guess
[03:29] https://freeyourstuff.cc/plugins <- bwn, I presume you meant this page?
[03:30] *** dashcloud has quit IRC (Read error: Operation timed out)
[03:32] probably a good idea to take Erik up on this: https://freeyourstuff.cc/mirrors
[03:34] *** dashcloud has joined #archiveteam-bs
[03:36] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[03:38] *** BlueMaxim has joined #archiveteam-bs
[03:44] it might be interesting to write a generalized mediawiki plugin for freeyourstuff.cc
[03:58] sorry, yes, the plugins page is what i was referring to
[04:11] *** DopefishJ is now known as DFJustin
[04:16] *** VADemon has quit IRC (Read error: Connection reset by peer)
[04:27] *** Stiletto has quit IRC (Read error: Operation timed out)
[04:27] *** Stiletto has joined #archiveteam-bs
[04:51] *** Stiletto has quit IRC (Read error: Operation timed out)
[04:51] *** Stiletto has joined #archiveteam-bs
[04:56] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[04:56] *** BlueMaxim has joined #archiveteam-bs
[05:23] *** Stiletto has quit IRC (Read error: Operation timed out)
[05:24] *** Stiletto has joined #archiveteam-bs
[05:48] *** Stiletto has quit IRC (Read error: Operation timed out)
[05:48] *** Stiletto has joined #archiveteam-bs
[06:10] *** Stiletto has quit IRC (Read error: Operation timed out)
[06:10] *** Stiletto has joined #archiveteam-bs
[06:15] *** Honno has joined #archiveteam-bs
[06:55] *** Stiletto has quit IRC (Read error: Operation timed out)
[06:55] *** Stiletto has joined #archiveteam-bs
[06:56] *** JesseW has quit IRC (Read error: Operation timed out)
[07:00] *** Eloquence has quit IRC (Ping timeout: 244 seconds)
[07:03] *** ralphdnak has joined #archiveteam-bs
[07:09] *** PurpleSym sets mode: -b r3c0d3x!*@*
[07:11] r3c0d3x_: ^
[07:22] *** bwn has quit IRC (Ping timeout: 244 seconds)
[07:24] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[07:25] *** Aranje has quit IRC (Remote host closed the connection)
[07:30] *** bwn has joined #archiveteam-bs
[07:39] *** BlueMaxim has quit IRC (Quit: Leaving)
[07:45] *** bzc6p has joined #archiveteam-bs
[07:45] *** swebb sets mode: +o bzc6p
[08:01] *** xXx_ndidd has joined #archiveteam-bs
[08:03] *** ralphdnak has quit IRC (Ping timeout: 244 seconds)
[08:06] *** ndiddy has quit IRC (Read error: Operation timed out)
[08:15] *** bzc6p has left
[08:22] PurpleSym: cheers, just woke up
[08:37] *** Eloquence has joined #archiveteam-bs
[08:56] *** Stiletto has quit IRC (Read error: Operation timed out)
[08:56] *** Stiletto has joined #archiveteam-bs
[09:02] *** Eloquence has quit IRC (Read error: Operation timed out)
[09:19] *** closure has quit IRC (Ping timeout: 250 seconds)
[09:19] *** closure has joined #archiveteam-bs
[09:19] *** midas sets mode: +o closure
[09:41] *** Stiletto has quit IRC (Read error: Operation timed out)
[09:41] *** Stiletto has joined #archiveteam-bs
[09:43] *** ralphdnak has joined #archiveteam-bs
[10:38] *** ralphdnak has quit IRC (Ping timeout: 244 seconds)
[10:51] *** Stiletto has quit IRC (Read error: Operation timed out)
[10:51] *** Stiletto has joined #archiveteam-bs
[11:15] *** Stiletto has quit IRC (Read error: Operation timed out)
[11:15] *** Stiletto has joined #archiveteam-bs
[11:42] *** Stiletto has quit IRC (Read error: Operation timed out)
[11:42] *** Stiletto has joined #archiveteam-bs
[12:09] *** VADemon has joined #archiveteam-bs
[12:24] *** Honno has quit IRC (Read error: Operation timed out)
[12:42] PSA: Facebook forces users to download their new app "Moments" in order to NOT LOSE (auto-)synced photos
[12:42] https://twitter.com/aurevoiralexis/status/740728442254135296/photo/1
[13:10] *** Stiletto has quit IRC (Read error: Operation timed out)
[13:10] *** Stiletto has joined #archiveteam-bs
[13:45] *** Stiletto has quit IRC (Read error: Operation timed out)
[13:45] *** Stiletto has joined #archiveteam-bs
[15:02] PurpleSym: Thanks! Everything should be fixed now.
[15:02] *** r3c0d3x_ is now known as r3c0d3x
[15:07] *** Honno has joined #archiveteam-bs
[15:23] *** Honno has quit IRC (Read error: Operation timed out)
[15:42] *** Stiletto has quit IRC (Read error: Operation timed out)
[15:42] *** Stiletto has joined #archiveteam-bs
[15:59] if there are any projects/ideas/whatever that would benefit from unfiltered, super-high-speed connections for a few days, make a list: HOPE is this summer, and you'll have access to a great network for the weekend (July 22-24)
[16:02] *** GLaDOS has quit IRC (Quit: Oh crap, I died.)
[16:03] *** GLaDOS has joined #archiveteam-bs
[16:11] *** Rotab has quit IRC (Read error: Connection reset by peer)
[16:23] "To make a long story short, we managed to find the company that had purchased our valve manufacturer and it turns out they had exited the manufacturing business and they were now a magazine. However, they still had a warehouse full of the fucking valves, and they'd sell us one if we wanted it. And that was the day we ordered an expensive three way valve from a company that had no idea how it worked, or what it did."
[16:23] ( https://www.reddit.com/r/talesfromtechsupport/comments/4njv3r/our_operators_are_too_stupid_part_1/d44plu1 )
[16:31] *** JesseW has joined #archiveteam-bs
[16:32] joepie91: the company in that story seriously sounds like Roche Pharmaceuticals
[16:33] they are Very Big and they have a strong presence in Indiana, which is basically Nowhere, USA
[16:35] *** schbirid has joined #archiveteam-bs
[16:40] another g4tv.com video saved: https://archive.org/details/g4tv.com-video36368-flvhd
[17:14] *** fie has joined #archiveteam-bs
[18:03] https://publicpolicy.googleblog.com/2016/06/the-trans-pacific-partnership-step.html what.
[18:07] *** _desu___ has joined #archiveteam-bs
[18:07] *** HCross2 has joined #archiveteam-bs
[18:08] Hi HCross2!
[18:09] Hello
[18:09] Did you see my list of academictorrents I posted yesterday? Will that work for you, or would you like me to parse it further?
[18:10] Seems IRCCloud is on various sorts of fire. I had a look and ideally I want a set of torrent files
[18:10] Deluge can't take a set of magnets
[18:11] hey
[18:11] Can I help with a script to back them up to IA?
[18:11] arkiver: certainly!
[18:11] Just backing up the torrents to IA, let IA download them
[18:11] Yep, that's the basic plan.
[18:11] yeah, if we can just get the torrents, we can feed them into the IA and they will get them
[18:11] ok
[18:11] From the infohashes, you should be able to download the torrents like this, I think:
[18:12] yes
[18:12] http://academictorrents.com/download/403e6d6945a64dd1b9e185a6cd8d029274efccdc.torrent
[18:12] do we already have a list of hashes/torrents?
[18:13] I made a list of 296 infohashes
[18:13] http://termbin.com/j6f9
[18:13] ok
[18:14] It looks like that list is incomplete
[18:15] That's just a list of datasets -- the other items are papers, I think.
[18:16] I mean, see the last line of that list
[18:17] hm, yeah
[18:17] I'm not sure what happened there :-(
[18:17] I'll see about fixing that.
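A minimal Python sketch of turning the infohash list into .torrent download URLs, assuming the termbin list holds one infohash per line and that the URL pattern from the 18:12 example link holds for every entry (neither is confirmed in the log). Lines that are not 40-character hex strings are set aside instead of being turned into URLs, which should make a problem such as the odd last line mentioned at 18:16 easier to spot. The function name is illustrative.

```python
import re

# URL pattern copied from the example link in the log; assumed to apply to every infohash.
DOWNLOAD_URL = "http://academictorrents.com/download/{}.torrent"

# A BitTorrent v1 infohash is a SHA-1 digest written as 40 hex characters.
INFOHASH_RE = re.compile(r"^[0-9a-f]{40}$")


def torrent_urls(lines):
    """Map lines (assumed: one infohash per line) to .torrent URLs.

    Returns (urls, rejected): blank lines are skipped, and anything that is
    not a 40-character hex string goes into `rejected` for manual review.
    """
    urls, rejected = [], []
    for line in lines:
        h = line.strip().lower()
        if not h:
            continue
        if INFOHASH_RE.match(h):
            urls.append(DOWNLOAD_URL.format(h))
        else:
            rejected.append(line.rstrip("\n"))
    return urls, rejected


if __name__ == "__main__":
    sample = ["403e6d6945a64dd1b9e185a6cd8d029274efccdc\n", "\n", "not-a-hash\n"]
    good, bad = torrent_urls(sample)
    for u in good:
        print(u)
    print("rejected:", bad)
```

The resulting URLs could then be fetched with any HTTP client to get the set of torrent files requested at 18:10, which sidesteps Deluge not accepting a batch of magnet links.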
[18:17] But I can add some scraping of the site to the script.
[18:18] *** Aranje has joined #archiveteam-bs
[18:18] sure. My scraping was as simple as downloading http://academictorrents.com/browse.php?cat=6&sort_field=seeders&sort_dir=DESC&page=0 and running a regex on the result
[18:18] this was the regex: re.findall(r"""href="/details/([0-9a-z]+)">([^<]+).+?filelist=1">([0-9]+)<.+?([-0-9]+)<.+?>([0-9.]+[A-Z]+)<.+?center>([0-9,]+)<.+?dllist=1">([0-9+]+)<.+?""",txt, re.DOTALL)
[18:18] There are 15 pages of datasets, and 55 pages of papers.
[18:18] currently
[18:19] with 20 items on each page
[18:19] That will get you the infohashes, titles, sizes, file counts, "mirror" (i.e. seed) counts
[18:20] http://academictorrents.com/about.php#mirroring could be relevant.
[18:22] ok
[18:22] I'll try to have something in a bit
[18:22] PurpleSym: not particularly -- we *want* to do "blind mirroring of all data", so their per-collection lists don't help that much. :-)
[18:23] arkiver: I don't think it's particularly urgent, but it's a good thing to do.
[18:23] Yeah, right below that section are details on their API.
[18:23] That might be easier than screen scraping.
[18:27] PurpleSym: I looked into it, but the API, unlike the real interface, didn't seem to support paging, oddly.
[18:27] The examples suggest you can use &limit=9999
[18:27] And changing the limit seemed to require an API key -- which, no thanks, I'll just use what you are *already making available*
[18:28] I see.
[18:37] Anyway, curl -s -b 'uid=4510;pass=f2e3f605ea9062c5eb7390a3bd3f8eb9' 'http://academictorrents.com/apiv2/entries?limit=9999' | jq -r '.[] | [.infohash, .name, .size, .dateadded] | @csv'
[18:39] Nice!
[18:39] better you than me.
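A sketch of the scraping approach described at 18:18, written in Python because the regex was quoted as a Python re.findall call. It reuses that regex unchanged and builds the browse.php page URLs for one category. Only the infohash and title groups are named: the log lists what the regex yields (infohashes, titles, sizes, file counts, seed counts) without tying each remaining group to a field, so those are returned as raw strings. The assumption that pages count up from page=0 comes from the example URL; fetching is left out so the parsing can be checked offline, and the function names are illustrative.

```python
import re

# The regex quoted at 18:18, compiled once with the same DOTALL flag.
ROW_RE = re.compile(
    r"""href="/details/([0-9a-z]+)">([^<]+).+?filelist=1">([0-9]+)<.+?([-0-9]+)<.+?>([0-9.]+[A-Z]+)<.+?center>([0-9,]+)<.+?dllist=1">([0-9+]+)<.+?""",
    re.DOTALL,
)

# Browse URL from the log. cat=6 is the category used there (datasets, per the
# discussion); the category number for papers is not given in the log.
BROWSE_URL = ("http://academictorrents.com/browse.php?cat={cat}"
              "&sort_field=seeders&sort_dir=DESC&page={page}")


def browse_urls(cat, n_pages):
    """URLs for pages 0..n_pages-1 of one category (0-indexed, as in page=0)."""
    return [BROWSE_URL.format(cat=cat, page=p) for p in range(n_pages)]


def parse_page(html):
    """Return (infohash, title, other_groups) for each row the regex matches."""
    return [(m[0], m[1].strip(), m[2:]) for m in ROW_RE.findall(html)]
```

For the datasets category, browse_urls(6, 15) gives the 15 page URLs; at 20 items per page that is at most 300 rows, in line with the 296 infohashes mentioned at 18:13. Each page could be fetched with, for example, urllib.request.urlopen(url).read().decode("utf-8") before being handed to parse_page. The API route at 18:37 avoids the regex altogether and is probably less fragile if the page layout changes.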
[19:08] *** xioustic has joined #archiveteam-bs
[19:22] *** ndizzle has joined #archiveteam-bs
[19:26] *** JesseW has quit IRC (Read error: Operation timed out)
[19:26] *** xXx_ndidd has quit IRC (Ping timeout: 244 seconds)
[19:37] *** Eloquence has joined #archiveteam-bs
[19:39] *** dashcloud has quit IRC (Read error: Operation timed out)
[19:41] *** Start has quit IRC (Read error: Connection reset by peer)
[19:41] *** Start has joined #archiveteam-bs
[19:43] *** dashcloud has joined #archiveteam-bs
[19:44] *** tomwsmf-a has joined #archiveteam-bs
[20:01] *** Eloquence has quit IRC (Read error: Operation timed out)
[20:14] I asked SketchCow to create a collection
[20:36] *** Simpbrain has quit IRC (Read error: Operation timed out)
[20:37] *** Eloquence has joined #archiveteam-bs
[20:51] *** Stiletto has quit IRC (Read error: Operation timed out)
[20:51] *** Stiletto has joined #archiveteam-bs
[20:52] *** Simpbrain has joined #archiveteam-bs
[21:03] *** schbirid has quit IRC (Quit: Leaving)
[21:04] *** Simpbrain has quit IRC (Ping timeout: 633 seconds)
[21:05] *** RichardG has joined #archiveteam-bs
[21:07] *** Eloquence has quit IRC (Read error: Operation timed out)
[21:07] *** Simpbrain has joined #archiveteam-bs
[21:32] *** Simpbra1 has joined #archiveteam-bs
[21:33] *** Simpbrain has quit IRC (Ping timeout: 1208 seconds)
[21:38] *** kristian_ has joined #archiveteam-bs
[21:43] *** JesseW has joined #archiveteam-bs
[21:45] *** dashcloud has quit IRC (Read error: Operation timed out)
[21:48] *** dashcloud has joined #archiveteam-bs
[21:59] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[21:59] *** dashcloud has joined #archiveteam-bs
[22:01] *** signius has quit IRC (Remote host closed the connection)
[22:13] *** kristian_ has quit IRC (Leaving)
[22:14] *** Eloquence has joined #archiveteam-bs
[22:19] *** Stiletto has quit IRC (Read error: Operation timed out)
[22:19] *** Stiletto has joined #archiveteam-bs
[22:26] so apparently Savant's soundcloud was baleeted over a bunch of remixes
[22:26] https://www.facebook.com/zyonMGMT/videos/vb.649866465116005/682443285191656/?type=2&theater
[22:42] *** BlueMaxim has joined #archiveteam-bs
[22:51] *** dashcloud has quit IRC (Read error: Operation timed out)
[22:52] *** mutoso has quit IRC (Read error: Operation timed out)
[22:54] so i have uploaded up to 2015-04 with kotaku.com
[22:54] *** dashcloud has joined #archiveteam-bs
[23:31] looks like i got all of gawker.com up to 2015: https://archive.org/search.php?query=subject%3A%22gawker.com%22
[23:51] *** Honno has joined #archiveteam-bs
[23:52] *** Stiletto has quit IRC (Read error: Operation timed out)
[23:52] *** Stiletto has joined #archiveteam-bs
[23:53] so looks like i did a lifehacker.com sitemap grab last summer
[23:54] *** Eloquence has quit IRC (Read error: Operation timed out)
[23:54] i will have to do at least another 17 months of it so we are sure it's up to date
[23:57] good
[23:57] valleywag also seems important to try and get, if we haven't already
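The 23:53 and 23:54 lines describe extending an earlier lifehacker.com sitemap grab so that the capture is current. Below is a minimal Python sketch of selecting only the newer entries from a sitemap, assuming the site's sitemaps follow the sitemaps.org `<urlset>` format and include `<lastmod>` dates (the log does not say either way). It compares by month because progress in the log is tracked by month. The helper names are illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (assumed to be what the site uses).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def month_of(lastmod):
    """(year, month) from a W3C-datetime value like '2015-04-17T09:00:00Z' or '2015-04'.

    A bare year such as '2015' is treated as January of that year.
    """
    parts = lastmod[:7].split("-")
    year = int(parts[0])
    month = int(parts[1]) if len(parts) > 1 and parts[1] else 1
    return (year, month)


def urls_since(sitemap_xml, cutoff):
    """Return <loc> values from a <urlset> whose <lastmod> month is >= cutoff.

    `cutoff` is a (year, month) tuple. Entries without <lastmod> are kept,
    since the sitemap alone cannot tell whether they are already covered.
    """
    root = ET.fromstring(sitemap_xml)
    out = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        if not loc:
            continue
        if not lastmod or month_of(lastmod) >= cutoff:
            out.append(loc)
    return out
```

For example, continuing the kotaku.com batch that was uploaded up to 2015-04 would use a cutoff of (2015, 5). A sitemap index (`<sitemapindex>`) would need an extra step to fetch each child sitemap first; that case is not handled here.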