[00:20] so i should be up to 2008-09-20 with funny or die videos now
[00:22] btw i'm up to 790k items
[00:22] Where do you find the time to do so much archiving?
[00:24] i have autism and i'm unemployed
[00:24] Ah, that would definitely give you time
[00:25] that and most of it is scripting
[00:26] no different than what Jason does when uploading 1000s of items
[00:26] some stuff is more pickly like the funny or die videos
[00:27] but that's cause description metadata doesn't always upload
[00:27] *** DoomTay has joined #archiveteam-bs
[00:28] i have to upload the missing files without descriptions so they at least get uploaded
[00:28] keywords, title and date metadata is still there
[00:30] Thanks for all the work you do, you may be the most productive "unemployed" person ever
[00:32] ^ what he/she/they said. You do good work, and lots of it.
[00:32] Heh, there have been a LOT of unemployed people who have done lots of valuable work. You stand in a long tradition.
[00:33] ^
[01:09] *** nightpool has quit IRC (Read error: Operation timed out)
[01:29] JesseW: hrm. it's a very interesting concept, but it's failing to provide any good examples of real-world implementations or what they'd look like
[01:36] joepie91: what, WTFD?
[01:39] well, I came across it from https://github.com/hyperboria/docs/blob/master/achievements.md so that may provide something of an example
[01:46] *** RichardG has quit IRC (Ping timeout: 258 seconds)
[01:50] *** anjacks0n has joined #archiveteam-bs
[01:53] *** anjacks0n has quit IRC (Ping timeout: 190 seconds)
[01:54] i'm starting to download the july 2016 of kpfa
[01:55] got an easy one - emulation9.com - Archive.org grabs 403, browser sees 403, Google Cache sees it fine. at the very least could someone please find me a workaround to view it? web-based proxy or something? :)
[01:55] i tried a user-agent changer in Firefox but it didn't do the trick
[01:55] man I am rusty at these tricks
[01:58] Stiletto: I can't read the language it's written in. What particular URL are you trying to get at, that isn't in Google Cache?
[02:00] it's Japanese and IIRC they give error 403 to all non-Japan browsers. Google Cache sees it fine. so how can I view it normally and not in Google Cache? ;)
[02:00] Stiletto: moment
[02:00] thanks joepie91! :)
[02:01] I think I might have a Japan-based VPS...
[02:01] * joepie91 digs
[02:02] interestingly, it fails using Google Translate, also
[02:02] yes I do!
[02:03] cool, thanks joepie91
[02:03] btw, joepie91 -- was your comment about "it's a very interesting concept" in reference to the essay I posted about WTFM, or something else?
[02:03] the website isn't in danger or anything, it's been online since 1999
[02:04] -bash: less: command not found
[02:04] ... k...
[02:04] Stiletto: hm, I get a 403 from that VPS also
[02:04] "it's been online since 1999" -- that alone seems like a sign of danger
[02:04] perhaps it's just broken?
[02:04] and Google Cache happens to have a good copy
[02:04] JesseW: WTFM
[02:05] the google cache copy is from Aug 1, 2016 01:09:45 GMT
[02:05] well it's been dishing out 403 for years and years now. IIRC there was some drama in the emu-scene and the guy locked down the site somewhat
[02:05] i still have it in my bookmarks tho and tonight i got to wondering about workarounds
[02:05] We wouldn't happen to have anyone onboard who actually lives in Japan, would we?
[02:05] fwiw, this is maxmind geolocation for my server: {"as":"AS20473 Choopa, LLC","city":"Heiwajima","country":"Japan","countryCode":"JP","isp":"Choopa, LLC","lat":35.5833,"lon":139.7483,"org":"Choopa, LLC","query":"108.61.200.70","region":"13","regionName":"Tokyo","status":"success","timezone":"Asia/Tokyo","zip":"143-0006"}
[02:06] it might be locked to Japanese *ISPs*
[02:06] (Choopa is very much not one :P)
[02:06] and it's now 2:06 GMT, so I don't think it's broken.
[02:07] If so, this would be the first time I've seen a site restricted by ISP
[02:07] DoomTay: not for me :P
[02:07] locked to Japanese ISPs and Google, presumably
[02:07] DoomTay: Tunisian government actually did this during arab spring
[02:07] hrm
[02:07] hm
[02:07] may be Egyptian
[02:07] not Tunisian
[02:07] can you go from one page to another *via* Google Cache?
[02:07] can't recall which
[02:07] could have been both
[02:08] Stiletto: if you have a few $ and you consider it important enough, you could try some actually-Japanese VPS services
[02:08] see also https://www.exoticvps.com/country/japan
[02:09] some of those are not really Japanese though
[02:10] Looking at the google cache page, I'm not seeing many internal links -- Stiletto, can you provide some examples?
[02:12] *** nightpool has joined #archiveteam-bs
[02:13] all the internal links are on the left side between the two graphics linking to amazon
[02:13] ex. http://www.emulation9.com/emulators/a_mame.html
[02:17] come to think of it it would be nice to have a legit backup in the archive, they've got 17 years of emulator news headlines. That's pretty rare these days to find a news site that's been going that long. Looks like Google Cache is thorough, but you can't browse from page to page within Google Cache (tho I think there's a Firefox plugin that can do that or something)
[02:18] ex: http://webcache.googleusercontent.com/search?q=cache:25iDTYHPnpYJ:www.emulation9.com/archives/
[02:19] anyhow, it's always been an annoyance of mine :)
[02:19] sorry to bother you guys :)
[02:21] hm google cache does not have some of those old headlines
[02:26] Stiletto: try prefixing the URL with cache: in Google search
[02:26] and wait a few days
[02:26] chances are they will have it then
[02:26] hm ok
[02:26] maybe someone _should_ do a grab with a Japanese ISP for archive.org. I have no money for a VPS right now tho :-/
[02:26] (I'm fairly certain that Google obsessively collects URLs from all of their properties, and uses stuff like this to track demand for specific URLs)
[02:26] (agreed)
[03:22] *** ndiddy has quit IRC (Read error: Connection reset by peer)
[03:22] *** beardicus has quit IRC (Read error: Operation timed out)
[03:23] *** ndiddy has joined #archiveteam-bs
[03:23] *** Coderjoe has quit IRC (Read error: Operation timed out)
[03:24] *** remsen has quit IRC (Read error: Operation timed out)
[03:24] *** remsen has joined #archiveteam-bs
[03:26] *** beardicus has joined #archiveteam-bs
[03:28] *** wp494 has quit IRC (Read error: Connection reset by peer)
[03:33] *** Coderjoe has joined #archiveteam-bs
[04:04] Stiletto: try changing your useragent to the google cache UA
[04:05] *** RichardG has joined #archiveteam-bs
[04:28] *** robink has quit IRC (Ping timeout: 246 seconds)
[04:34] *** Aranje has joined #archiveteam-bs
[04:38] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[04:40] *** robink has joined #archiveteam-bs
[04:46] *** Sk1d has joined #archiveteam-bs
[04:46] Is it immoral to download and archive a movie that is available to rent for a limited time that you think might be at risk of not being streamable or available on a DVD later on?
[04:55] "We are not moral guardians."
[04:59] random survey request I got -- hard to tell if it's spam or not: https://paste2.org/M1PJ3GFa
[05:02] *** Asparagir has quit IRC (Asparagir)
[05:08] *** ndiddy has quit IRC (Quit: Leaving)
[05:13] JesseW: if you can't be a moral guardian then what's your opinion on it. :P
[05:13] I keep on getting emails from "The Hindu Daily Digest" and I don't remember signing up for it.
[05:14] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[05:30] *** DoomTay has quit IRC (Quit: Page closed)
[05:34] with pokemon go being the craze i have this for you guys: https://archive.org/details/Becoming_A_Master_-_The_Ultimate_Pokemon_Experience_1999_VHSRip
[05:34] i uploaded it a while back
[05:38] *** wp494 has joined #archiveteam-bs
[05:56] *** nightpool has quit IRC (Read error: Operation timed out)
[06:03] *** Aranje has quit IRC (Quit: Three sheets to the wind)
[06:05] *** nightpool has joined #archiveteam-bs
[06:12] *** nightpool has quit IRC (Read error: Operation timed out)
[06:27] *** Coderjoe has quit IRC (Read error: Operation timed out)
[06:28] *** REiN^ has joined #archiveteam-bs
[06:32] godane: "FBI Warning" xD
[06:35] *** schbirid has joined #archiveteam-bs
[06:36] *** VADemon has joined #archiveteam-bs
[06:41] *** kristian_ has joined #archiveteam-bs
[06:44] *** Coderjoe has joined #archiveteam-bs
[07:06] *** nightpool has joined #archiveteam-bs
[07:13] *** nightpool has quit IRC (Ping timeout: 244 seconds)
[07:20] over 2k items in my godanefunnyordie account
[07:27] *** kristian_ has quit IRC (Leaving)
[07:34] looks like bombjack scanned 3 more Boardwatch Magazines
[07:35] Where
[07:36] http://www.bombjack.org/commodore/magazines/boardwatch-magazine/boardwatch-magazine.htm
[07:37] Exciting
[07:38] i know
[07:38] i hope they also get the missing Byte Magazine issues from the 1990s scanned too
[07:41] also this guy is uploading Boot Magazine : https://archive.org/details/@ckeck
[07:42] he uploaded more on reddit: https://www.reddit.com/user/KnuckleSangwich
[07:42] i have them all even if he doesn't upload them all to IA
[07:46] https://archive.org/search.php?query=collection%3Aopensource+mediatype%3Atexts+magazine is always a fun search.
[08:02] *** Coderjoe has quit IRC (Read error: Operation timed out)
[08:03] *** Coderjoe has joined #archiveteam-bs
[08:09] *** nightpool has joined #archiveteam-bs
[08:13] *** nightpool has quit IRC (Ping timeout: 250 seconds)
[09:11] *** tomwsmf has quit IRC (Read error: Operation timed out)
[09:11] *** tomwsmf has joined #archiveteam-bs
[09:24] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[09:25] *** nightpool has joined #archiveteam-bs
[09:30] *** nightpool has quit IRC (Ping timeout: 250 seconds)
[09:31] *** BlueMaxim has joined #archiveteam-bs
[09:31] *** tomwsmf has quit IRC (Ping timeout: 258 seconds)
[09:48] *** REiN^ has quit IRC ()
[09:53] *** signius has quit IRC (Ping timeout: 260 seconds)
[10:05] *** signius has joined #archiveteam-bs
[10:14] *** REiN^ has joined #archiveteam-bs
[10:26] *** nightpool has joined #archiveteam-bs
[10:31] *** nightpool has quit IRC (Ping timeout: 260 seconds)
[11:26] *** BlueMaxim has quit IRC (Quit: Leaving)
[11:27] *** nightpool has joined #archiveteam-bs
[11:35] *** nightpool has quit IRC (Read error: Operation timed out)
[12:28] *** nightpool has joined #archiveteam-bs
[12:37] *** nightpool has quit IRC (Read error: Operation timed out)
[12:52] *** nightpool has joined #archiveteam-bs
[12:57] *** nightpool has quit IRC (Ping timeout: 244 seconds)
[13:48] *** Coderjoe has quit IRC (Read error: Operation timed out)
[13:53] *** RichardG has quit IRC (Ping timeout: 633 seconds)
[13:55] *** vitzli has joined #archiveteam-bs
[13:59] *** nightpool has joined #archiveteam-bs
[14:03] *** JSharp___ has quit IRC (Read error: Connection reset by peer)
[14:03] *** Ctrl-S___ has quit IRC (Write error: Connection reset by peer)
[14:04] *** Boltsie has quit IRC (Read error: Connection reset by peer)
[14:04] *** sigkell has quit IRC (Ping timeout: 260 seconds)
[14:04] *** antonizoo has quit IRC (Ping timeout: 260 seconds)
[14:04] *** HCross2 has quit IRC (Read error: Connection reset by peer)
[14:04] *** johtso has quit IRC (Read error: Connection reset by peer)
[14:04] *** nightpool has quit IRC (Read error: Operation timed out)
[14:06] *** Coderjoe has joined #archiveteam-bs
[14:09] *** antonizoo has joined #archiveteam-bs
[14:12] *** JSharp___ has joined #archiveteam-bs
[14:12] *** deathy has quit IRC (Connection closed)
[14:13] hook54321: I tried a few Googlebot useragents but they didn't work. Not sure if the google-cache useragent is known? this blog post from 2012 didn't seem to think so: http://www.coconutheadphones.com/google-crawling-behavior/
[14:13] *** Ctrl-S___ has joined #archiveteam-bs
[14:15] *** Boltsie has joined #archiveteam-bs
[14:16] *** sigkell has joined #archiveteam-bs
[14:36] *** HCross2 has joined #archiveteam-bs
[14:37] *** johtso has joined #archiveteam-bs
[14:40] *** VADemon has quit IRC (Quit: left4dead)
[14:47] *** RichardG has joined #archiveteam-bs
[14:47] *** Coderjoe has quit IRC (Read error: Operation timed out)
[14:49] *** metalcamp has joined #archiveteam-bs
[15:26] *** Coderjoe has joined #archiveteam-bs
[15:31] *** nightpool has joined #archiveteam-bs
[15:35] *** JesseW has joined #archiveteam-bs
[15:52] *** Coderjoe_ has joined #archiveteam-bs
[15:52] *** Coderjoe has quit IRC (Read error: Operation timed out)
[15:57] *** DoomTay has joined #archiveteam-bs
[16:06] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[16:39] *** anjacks0n has joined #archiveteam-bs
[16:47] *** RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
[16:51] *** nightpool has quit IRC (Read error: Connection reset by peer)
[16:51] *** metalcamp has quit IRC (Quit: Bye)
[16:57] *** sep332 has quit IRC (Read error: Connection reset by peer)
[17:08] *** nightpool has joined #archiveteam-bs
[17:19] *** ndiddy has joined #archiveteam-bs
[17:35] *** Honno has joined #archiveteam-bs
[17:47] *** sep332 has joined #archiveteam-bs
[17:49] *** Honno_ has joined #archiveteam-bs
[17:51] *** Honno__ has joined #archiveteam-bs
[17:59] *** Honno has quit IRC (Read error: Operation timed out)
[18:03] *** Honno_ has quit IRC (Read error: Operation timed out)
[18:37] *** whydomain has joined #archiveteam-bs
[18:39] *** vitzli has quit IRC (Leaving)
[18:50] *** Coderjoe_ has quit IRC (Read error: Operation timed out)
[19:01] If someone were to set up grab-site with a Facebook cookie and then just had it start archiving, could it potentially start sending friend requests to tons of people? I'm not planning on doing this, i just want to know if that or something similar could happen.
[19:02] not unless Facebook does friend requests via GETs
[19:02] which AFAIK they don't
[19:02] wpull/grab-site follows links, it won't click buttons unless you script it to do otherwise
[19:06] How can I tell if a website uses GETs?
[19:07] you can investigate HTTP traffic using the web inspector and from there get a gradually better picture of what request verbs are being sent
[19:07] all that said, I don't think Facebook would do something like that
[19:08] I'm not archiving anything on Facebook at the moment, it's a different site.
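(Editorial sketch, not part of the original log.) The GET question above can be partly answered without a browser: a link-following crawler requests `<a href>` targets with GET, while state-changing actions usually sit behind forms that declare `method="post"`. The Python below is a rough illustration using only the standard library; the `ActionScanner` and `scan` names and the sample markup are hypothetical, and it only sees static HTML, so requests issued by JavaScript still need the web inspector mentioned in the channel.

```python
from html.parser import HTMLParser


class ActionScanner(HTMLParser):
    """Collect plain links and form submission methods from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []  # hrefs a link-following crawler would fetch with GET
        self.forms = []  # (method, action) pairs, one per <form>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            # HTML forms default to GET when no method attribute is given.
            method = (attrs.get("method") or "get").upper()
            self.forms.append((method, attrs.get("action") or ""))


def scan(html):
    """Return (links, forms) found in the given HTML string."""
    scanner = ActionScanner()
    scanner.feed(html)
    return scanner.links, scanner.forms


# Hypothetical page: one ordinary link, one POST form, one GET form.
sample = (
    '<a href="/profile">profile</a>'
    '<form method="post" action="/add_friend"></form>'
    '<form action="/search"></form>'
)
links, forms = scan(sample)
print(links)
print(forms)
```

Any link or GET form whose URL looks like an action (delete, add, logout and so on) is the kind of thing worth excluding before starting a logged-in crawl.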
[19:13] *** sanquiAFK has quit IRC (Ping timeout: 260 seconds)
[19:14] *** DoomTay has quit IRC (Quit: Page closed)
[19:14] *** Coderjoe has joined #archiveteam-bs
[19:16] *** Honno__ has quit IRC (Read error: Operation timed out)
[19:21] *** Honno__ has joined #archiveteam-bs
[19:22] *** tomwsmf has joined #archiveteam-bs
[19:39] *** Sanqui has joined #archiveteam-bs
[19:43] looks like IA index is not updating
[19:50] anyways i'm uploading kpfa for 2016-07
[19:57] *** DoomTay has joined #archiveteam-bs
[20:05] *** Honno__ has quit IRC (Read error: Operation timed out)
[20:07] ok now index is updating
[20:25] *** schbirid has quit IRC (Quit: Leaving)
[20:39] *** Coderjoe has quit IRC (Ping timeout: 260 seconds)
[20:48] *** Coderjoe has joined #archiveteam-bs
[20:53] *** RichardG has joined #archiveteam-bs
[20:56] *** zhongfu has quit IRC (Remote host closed the connection)
[20:58] *** zhongfu has joined #archiveteam-bs
[21:00] *** DiscantX has joined #archiveteam-bs
[21:01] *** RichardG has quit IRC (Ping timeout: 258 seconds)
[21:02] *** RichardG has joined #archiveteam-bs
[21:07] *** RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
[21:45] [23:40] Just an FYI for Defcon goers.... they are using Biometrics for the room locks at Bally's and Paris. you can opt out.
[21:57] *** whydomain has quit IRC (http://www.kiwiirc.com/ - A hand crafted IRC client)
[21:59] *** REiN^ has quit IRC (Read error: Operation timed out)
[22:04] def congoers
[22:04] i kinda like the idea of not having to carry around your key
[22:05] i don't like the idea of giving fingerprints to the mob
[22:10] *** DoomTay has quit IRC (Quit: Page closed)
[22:48] *** DoomTay has joined #archiveteam-bs
[23:01] *** nightpool has quit IRC (Read error: Operation timed out)
[23:02] *** BlueMaxim has joined #archiveteam-bs
[23:30] *** DoomTay has quit IRC (Quit: Page closed)
[23:31] ha
[23:34] *** RichardG has joined #archiveteam-bs
[23:47] SketchCow: http://www.bombjack.org/commodore/disks/?C=M;O=D
[23:47] a lot of disks released in the last month it looks like