[03:42] http://www.geocities.com/spunk1111/xmas.htm
[04:21] hmm?
[04:22] someone brought back the geocities domain ?
[04:23] haha wtf
[04:24] BiggieJon, Famicoman, http://betabeat.com/2012/06/10-bizarre-geocities-pages-that-still-exist/
[04:25] HUH ??
[04:26] so yahoo has left certain select pages up ??
[04:26] yeah, idk, I'm confused
[04:26] lol
[04:27] had enough to drink already I'm not motivated enough to start digging too much
[04:27] for now I'm just slightly amused
[05:41] BiggieJon, joepie91, Famicoman: yahoo can't deactivate those sites without affecting their paid site hosting system
[05:41] lol
[05:42] it's some weird thing
[05:42] yeah the working sites correspond to people who have domain names hosted through yahoo
[05:42] we should really go through and scrape them all at some point
[05:48] we eh
[08:06] im just going to say it. Yahoo is a strange company.
[09:01] m1das: you have successfully completed your introductory Archive Team training!
[09:01] :P
[09:02] if Archive Team were a comic, Yahoo would be the recurring archvillain
[09:02] and Yahoo would be slightly megalomaniac, strange, and very unpredictable
[09:03] and for some reason when everything fails it would change its face.
[09:35] i found a way to grab all google research papers
[09:36] this is the url type: static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/ja/pubs/archive/99.pdf
[10:23] godane: do you know how I can create a collection on the internet archive for items I have uploaded?
[10:26] just upload them to archiveteam collection
[10:26] or texts collection
[10:26] SketchCow can move them to where they need to be
[10:42] godane: I'm currently uploading all the torrents from http://vodo.net/ to the IA as items
[10:42] those films are all legal
[10:43] and I wanted to know if I could get a subcollection named "VODO" with all the VODO films from the website that I've uploaded
[10:43] uploaded them as movies
[10:44] ?
[10:44] and put vodo as a keyword
[10:44] you already uploaded them?
[10:44] i mean you uploaded them as movies
[10:44] ah
[10:44] wait
[10:44] I'll give you an example item
[10:44] https://archive.org/details/VODO198HoldMeLikeYouUsedTo
[10:44] i never seen a torrent turn into a movie though
[10:45] https://archive.org/details/VODO188TalesFromOssian
[10:45] so i didn't know
[10:45] https://archive.org/details/VODO177DeliveredInBeta
[10:45] those are some examples
[10:45] yes
[10:45] I can upload the torrent to the IA
[10:45] and the IA downloads the torrent files
[10:45] and then the files are converted
[10:45] so I think that's the OK way to do it
[15:12] merry christmas from Australia
[15:14] arkiver: once you've uploaded a bunch you can e-mail info@archive.org to have them put into a collection and set as movie items
[15:14] or SketchCow can do it for you
[15:15] DFJustin: Ah, yes, I just sent SketchCow an email
[15:15] :)
[15:15] I'm also finding some more websites with only legal torrents
[15:15] I'm thinking about doing all the torrents from those websites too
[15:15] so they will be saved
[15:15] and since they are legal there should not be too big a problem with it I think
[15:16] nope that sounds great actually
[15:18] uploading torrent files is definitely the most convenient way to upload, I find a lot of sites put things in .rar archives and the like though so it's often necessary to download and extract the files before uploading
[15:18] in order for IA's streaming and such to work
[15:27] Merry Christmas from Hungary all!
[15:28] szervusz
[15:28] :)
[15:28] hello
[15:28] that's about the only word of hungarian I know :D
[15:28] I know this might be a bit early for all in the US
[15:28] Well, that's enough :) I think its meaning is self explanatory :)
[15:29] Merry Christmas norbert79!!
[15:29] :)
[15:30] DFJustin: ok, I will see how many torrents I can do... :)
[15:30] I'm going to start uploading some big torrent sites also (carrying only legal torrents)
[15:31] It just sucks that we can't do the "illegal" torrents... :(
[15:32] depends, some sites they have it set up to grab and just not make the items public
[15:39] some more microsoft research presentations are getting uploaded now
[15:39] do you have the links?
[15:39] :)
[15:39] also
[15:39] just FYI (so that people won't do the same thing) I'm downloading technet
[15:39] already got 350 GB
[15:40] technet.microsoft.com
[15:41] the forums ? or iso's or MS software ?
[15:41] i'm downloading archive.linuxgizmos.com
[15:41] it has all the old linuxdevices.com articles
[15:41] great godane! :) thank you
[15:41] BiggieJon: the whole websites
[15:42] including the forums and everything
[15:42] https://archive.org/details/msrvideo.vo.msecnd.net-pdf-grab-168001-to-178000
[15:42] arkiver: I don't think microsoft likes copies of their OS posted on other sites
[15:42] the zip is not uploaded yet
[15:42] will be soon
[15:43] BiggieJon: I see what you mean, no I'm not downloading with a 100$ per year account
[15:43] but with no account
[15:43] and technet.microsoft.com will go down...,
[15:43] all the free stuff is moving to a new site
[15:44] I'm one of those subscribers that is getting royally screwed
[15:45] BiggieJon: the biggest part of my download size of that website (of the 350 GB) is going to these kind of pages:
[15:45] the videos from those pages:
[15:45] http://content4.catalog.video.msn.com/e2/ds/a5c9f8a6-5618-41af-a05b-c184b92274bb.wmv
[15:45] http://content1.catalog.video.msn.com/e2/ds/alt-en-us/ALTENUS_TECHNET/ALTENUS_TECHNET_EDGE/fa09b864-af5c-435f-993c-ac75be8e3336.mp4
[15:45] http://download.microsoft.com/download/D/0/9/D092A40A-7A3B-4AC4-BBEF-A316F4EA8FED/HDI_ITPro_Technet_winvideo_Eron_Kelly_update_on_Office_365.zip
[15:45] and so on
[15:47] 419 GB now
[15:47] ahh, ok, that's public
[15:47] I would avoid any licensed software or even demo versions
[15:47] will probably be around 7-10 TB total
[15:47] yes
[15:48] I'm only downloading the things people without an account can also view on the website for free
[15:51] I have a small question
[15:52] when I'm uploading torrents to the archive from legal torrent websites for archival
[15:52] shall I add the description from the website too?
[15:52] or not?
[15:52] since adding a description too costs a lot more time when doing thousands of torrents
[16:03] guess I should probably figure out what to do about my windows licenses . . .
[16:03] thinking now might be a great time to switch to linux :)
[16:06] I'd be on Linux right now if it had decent gaming support
[16:07] I kinda need to keep at least 1 windows box for some work stuff
[16:07] i just wrapped some gifts
[16:08] BlueMax: hopefully steam will help with that
[16:08] yeah totally
[16:09] BlueMax: Linux has fine "gaming support", games and proprietary graphics drivers just don't have decent Linux support :)
[16:09] arkiver: I want someone to review what method you're using to archive sites.
[16:09] ah
[16:09] haha
[16:10] SketchCow: I'm using heritrix
[16:10] I'm then creating a torrent from a pack
[16:10] then uploading that torrent to the IA
[16:10] Why.
[16:10] the IA downloads the pack through that torrent
[16:10] Why are you doing that.
[16:10] well
[16:10] if I'm uploading 100GB and my internet fails...
[16:10] everything needed to develop/run/etc. games on Linux, is basically there
[16:10] 100GB gone then
[16:10] Why are you not using ArchiveBot.
[16:10] ah
[16:11] it can't do 400+ GB websites right?
[16:11] archivebot doesn't have enough HDD space for a 100GB upload to my knowledge
[16:11] it's too big
[16:11] yes
[16:11] https://catalogd.archive.org/log/279810495
[16:11] These aren't 400gb websites.
[16:11] These are endless small websites.
[16:11] ah yes
[16:11] hehe
[16:12] Look, don't hehe me.
[16:12] those were just some small websites
[16:12] wanted to see if it would dferive good
[16:12] derive*
[16:12] I think you are fundamentally making mistakes here.
[16:12] I want to get to the bottom of it, because you're uploading 100s of gigs and I am not convinced the wayback machine knows what to do with it.
[16:13] I'd like yipdw or others to verify the work
[16:13] ah yes
[16:13] Otherwise, you are doing the exact nightmare scenario archivebot is meant to fix.
[16:13] The EXACT one.
[16:14] with the nightmare scenario you mean archiving small websites and taking a lot of time for them right?
[16:14] No.
[16:14] i'm grabbing a website on my own too
[16:14] I mean your stuff never ending up in the wayback machine.
[16:14] but it's a warc.gz file
[16:14] SketchCow: It does
[16:14] godane: 1. You do them right, and 2. You also tend to either curate or use archivebot.
[16:15] wait i'll give an example of the other pack
[16:15] I want the work checked. I don't want your opinion on this.
[16:16] Ok, how can I help then?
[16:16] I told you my exact archiving way
[16:16] Stop downloading, stop uploading.
[16:16] And wait.
[16:17] Wat for when it is verified that it works?
[16:17] wait*
[16:17] Yes.
[16:17] Find someone to help verify if you are action oriented.
[16:18] so I just stop all the other uploads I'm doing now?
[16:18] I'd prefer it, yes.
[16:19] Stopped.
[16:20] i thought the vodo archive would be ok
[16:21] I'm not totally sure what you mean. You mean this: I'm uploading my files, but you are not sure if they end up good in the wayback machine or not? So you want me to stop the uploads for now and let the warc.gz files be checked by someone who knows how to check them. When they are checked and verified to be working correctly, I can resume what I'm doing?
[16:21] I don't like what you're doing, fundamentally, but more pressing is I think you've uploaded potentially a terabyte of bad WARCs.
[16:22] I want to be proven wrong, hence I am calling for someone to help check the work.
[16:22] And why do you think the warc's are bad?
[16:22] Meanwhile I am setting type to web, and initiating a re-derive to see what the archive does.
[16:22] Because they're not showing in wayback.
[16:22] Or identified.
[16:22] they do show up
[16:22] the first pack showed up after a few minutes
[16:22] I just see it takes some time for them to show up once they are moved to the archiveteam section.
[16:23] and I think heritrix will make good warc's... Heritrix is created by the IA as far as I know
[16:25] but ok, I will just cancel everything and start doing something else if you don't like what I'm doing
[16:25] that's ok then
[16:25] https://catalogd.archive.org/log/279810495 finished. I want someone to check this work.
[16:33] now this is funny
[16:33] ?
[16:33] so theblaze blurred some faces of some kids here: http://www.theblaze.com/wp-content/uploads/2013/12/oshkosh-man-at-walmart.jpeg
[16:34] there are some weird issues around releasing images of minors
[16:34] their front page has an image without blurred faces: http://www.theblaze.com/wp-content/uploads/2013/12/sign-641x375.jpg
[21:08] what the hell is #rathole
[21:08] i've been invited twice