#archiveteam-bs 2013-12-24,Tue

Time Nickname Message
03:42 🔗 Famicoman http://www.geocities.com/spunk1111/xmas.htm
04:21 🔗 joepie91 hmm?
04:22 🔗 BiggieJon someone brought back the geocities domain ?
04:23 🔗 joepie91 haha wtf
04:24 🔗 joepie91 BiggieJon, Famicoman, http://betabeat.com/2012/06/10-bizarre-geocities-pages-that-still-exist/
04:25 🔗 BiggieJon HUH ??
04:26 🔗 BiggieJon so yahoo has left certain select pages up ??
04:26 🔗 joepie91 yeah, idk, I'm confused
04:26 🔗 joepie91 lol
04:27 🔗 BiggieJon had enough to drink already, I'm not motivated enough to start digging too much
04:27 🔗 BiggieJon for now I'm just slightly amused
05:41 🔗 xmc BiggieJon, joepie91, Famicoman: yahoo can't deactivate those sites without affecting their paid site hosting system
05:41 🔗 joepie91 lol
05:42 🔗 xmc it's some weird thing
05:42 🔗 DFJustin yeah the working sites correspond to people who have domain names hosted through yahoo
05:42 🔗 DFJustin we should really go through and scrape them all at some point
05:48 🔗 yipdw we eh
08:06 🔗 m1das im just going to say it. Yahoo is a strange company.
09:01 🔗 joepie91 m1das: you have successfully completed your introductory Archive Team training!
09:01 🔗 joepie91 :P
09:02 🔗 joepie91 if Archive Team were a comic, Yahoo would be the recurring archvillain
09:02 🔗 joepie91 and Yahoo would be slightly megalomaniac, strange, and very unpredictable
09:03 🔗 m1das and for some reason when everything fails it would change its face.
09:35 🔗 godane i found a way to grab all google research papers
09:36 🔗 godane this is the url type: static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/ja/pubs/archive/99.pdf
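The URL pattern godane describes can be enumerated with a short script. This is only a sketch: the `http://` scheme, the function name `candidate_urls`, and the ID range are assumptions for illustration, since the log gives just the one example path ending in `99.pdf` and does not say which IDs exist.

```python
# Base path copied from the message above; the scheme is an assumption.
BASE = (
    "http://static.googleusercontent.com/external_content/untrusted_dlcp/"
    "research.google.com/ja/pubs/archive/"
)


def candidate_urls(first_id, last_id):
    """Return one candidate PDF URL per numeric ID in [first_id, last_id]."""
    return [f"{BASE}{n}.pdf" for n in range(first_id, last_id + 1)]


if __name__ == "__main__":
    # Hypothetical range; many IDs may not correspond to a real paper.
    for url in candidate_urls(97, 99):
        print(url)
```

The printed list could be saved to a file and handed to a downloader, for example `wget -i urls.txt`.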
10:23 🔗 arkiver godane: do you know how I can create a collection on the internet archive for items I have uploaded?
10:26 🔗 godane just upload them to archiveteam collection
10:26 🔗 godane or texts collection
10:26 🔗 godane SketchCow can move them to where they need to be
10:42 🔗 arkiver godane: I'm currently uploading all the torrents from http://vodo.net/ to the IA as items
10:42 🔗 arkiver those films are all legal
10:43 🔗 arkiver and I wanted to know if I could get a subcollection named "VODO" with all the VODO films from the website that I've uploaded
10:43 🔗 godane uploaded them as movies
10:44 🔗 arkiver ?
10:44 🔗 godane and put vodo as a keyword
10:44 🔗 arkiver you already uploaded them?
10:44 🔗 godane i mean you uploaded them as movies
10:44 🔗 arkiver ah
10:44 🔗 arkiver wait
10:44 🔗 arkiver I'll give you an example item
10:44 🔗 arkiver https://archive.org/details/VODO198HoldMeLikeYouUsedTo
10:44 🔗 godane i never seen a torrent turn into a movie though
10:45 🔗 arkiver https://archive.org/details/VODO188TalesFromOssian
10:45 🔗 godane so i didn't know
10:45 🔗 arkiver https://archive.org/details/VODO177DeliveredInBeta
10:45 🔗 arkiver those are some examples
10:45 🔗 arkiver yes
10:45 🔗 arkiver I can upload the torrent to the IA
10:45 🔗 arkiver and the IA downloads the torrent files
10:45 🔗 arkiver and then the files are converted
10:45 🔗 arkiver so I think that's the OK way to do it
15:12 🔗 BlueMax merry christmas from Australia
15:14 🔗 DFJustin arkiver: once you've uploaded a bunch you can e-mail info@archive.org to have them put into a collection and set as movie items
15:14 🔗 DFJustin or SketchCow can do it for you
15:15 🔗 arkiver DFJustin: Ah, yes, I just sent SketchCow an email
15:15 🔗 arkiver :)
15:15 🔗 arkiver I'm also finding some more websites with only legal torrents
15:15 🔗 arkiver I'm thinking about doing all the torrents from those websites too
15:15 🔗 arkiver so they will be saved
15:15 🔗 arkiver and since they are legal there should not be too big a problem with it I think
15:16 🔗 DFJustin nope that sounds great actually
15:18 🔗 DFJustin uploading torrent files is definitely the most convenient way to upload, I find a lot of sites put things in .rar archives and the like though so it's often necessary to download and extract the files before uploading
15:18 🔗 DFJustin in order for IA's streaming and such to work
15:27 🔗 norbert79 Merry Christmas from Hungary all!
15:28 🔗 DFJustin szervusz
15:28 🔗 norbert79 :)
15:28 🔗 norbert79 hello
15:28 🔗 DFJustin that's about the only word of hungarian I know :D
15:28 🔗 norbert79 I know this might be a bit early for all in the US
15:28 🔗 norbert79 Well, that's enough :) I think its meaning is self explanatory :)
15:29 🔗 arkiver Merry Christmas norbert79!!
15:29 🔗 arkiver :)
15:30 🔗 arkiver DFJustin: ok, I will see how many torrents I can do... :)
15:30 🔗 arkiver I'm going to start uploading some big torrent sites also (carrying only legal torrents)
15:31 🔗 arkiver It just sucks that we can't do the "illegal" torrents... :(
15:32 🔗 DFJustin depends, some sites they have it set up to grab and just not make the items public
15:39 🔗 godane some more microsoft research presentations are getting uploaded now
15:39 🔗 arkiver do you have the links?
15:39 🔗 arkiver :)
15:39 🔗 arkiver also
15:39 🔗 arkiver just FYI (so that people won't do the same thing) I'm downloading technet
15:39 🔗 arkiver already got 350 GB
15:40 🔗 arkiver technet.microsoft.com
15:41 🔗 BiggieJon the forums ? or iso's or MS software ?
15:41 🔗 godane i'm downloading archive.linuxgizmos.com
15:41 🔗 godane it has all the old linuxdevices.com articles
15:41 🔗 arkiver great godane! :) thank you
15:41 🔗 arkiver BiggieJon: the whole websites
15:42 🔗 arkiver including the forums and everything
15:42 🔗 godane https://archive.org/details/msrvideo.vo.msecnd.net-pdf-grab-168001-to-178000
15:42 🔗 BiggieJon arkiver: I don't think microsoft likes copies of their OS posted on other sites
15:42 🔗 godane the zip is not uploaded yet
15:42 🔗 godane will be soon
15:43 🔗 arkiver BiggieJon: I see what you mean, no I'm not downloading with a 100$ per year account
15:43 🔗 arkiver but with no account
15:43 🔗 arkiver and technet.microsoft.com will go down...,
15:43 🔗 BiggieJon all the free stuff is moving to a new site
15:44 🔗 BiggieJon I'm one of those subscribers that is getting royally screwed
15:45 🔗 arkiver BiggieJon: the biggest part of my download size of that website (of the 350 GB) is going to these kind of pages:
15:45 🔗 arkiver the videos from those pages:
15:45 🔗 arkiver http://content4.catalog.video.msn.com/e2/ds/a5c9f8a6-5618-41af-a05b-c184b92274bb.wmv
15:45 🔗 arkiver http://content1.catalog.video.msn.com/e2/ds/alt-en-us/ALTENUS_TECHNET/ALTENUS_TECHNET_EDGE/fa09b864-af5c-435f-993c-ac75be8e3336.mp4
15:45 🔗 arkiver http://download.microsoft.com/download/D/0/9/D092A40A-7A3B-4AC4-BBEF-A316F4EA8FED/HDI_ITPro_Technet_winvideo_Eron_Kelly_update_on_Office_365.zip
15:45 🔗 arkiver and so on
15:47 🔗 arkiver 419 GB now
15:47 🔗 BiggieJon ahh, ok, thats public
15:47 🔗 BiggieJon I would avoid any licensed software or even demo versions
15:47 🔗 arkiver will probably be around 7-10 TB total
15:47 🔗 arkiver yes
15:48 🔗 arkiver I'm only downloading the things people without an account can also view on the website for free
15:51 🔗 arkiver I have a small question
15:52 🔗 arkiver when I'm uploading torrents to the archive from legal torrent websites for archival
15:52 🔗 arkiver shall I add the description from the website too?
15:52 🔗 arkiver or not?
15:52 🔗 arkiver since adding a description too costs a lot more time when doing thousands of torrents
16:03 🔗 BiggieJon guess I should probably figure out what to do about my windows licenses . . .
16:03 🔗 BiggieJon thinking now might be a great time to switch to linux :)
16:06 🔗 BlueMax I'd be on Linux right now if it had decent gaming support
16:07 🔗 BiggieJon I kinda need to keep at least 1 windows box for some work stuff
16:07 🔗 godane i just wrapped some gifts
16:08 🔗 BiggieJon BlueMax: hopefully steam will help with that
16:08 🔗 BlueMax yeah totally
16:09 🔗 joepie91 BlueMax: Linux has fine "gaming support", games and proprietary graphics drivers just don't have decent Linux support :)
16:09 🔗 SketchCow arkiver: I want someone to review what method you're using to archive sites.
16:09 🔗 arkiver ah
16:09 🔗 arkiver haha
16:10 🔗 arkiver SketchCow: I'm using heritrix
16:10 🔗 arkiver I'm then creating a torrent from a pack
16:10 🔗 arkiver then uploading that torrent to the IA
16:10 🔗 SketchCow Why.
16:10 🔗 arkiver the IA downloads the pack through that torrent
16:10 🔗 SketchCow Why are you doing that.
16:10 🔗 arkiver well
16:10 🔗 arkiver if I'm uploading 100GB and my internet fails...
16:10 🔗 joepie91 everything needed to develop/run/etc. games on Linux, is basically there
16:10 🔗 arkiver 100GB gone then
16:10 🔗 SketchCow Why are you not using ArchiveBot.
16:10 🔗 arkiver ah
16:11 🔗 arkiver it can't do 400+ GB websites right?
16:11 🔗 BlueMax archivebot doesn't have enough HDD space for a 100GB upload to my knowledge
16:11 🔗 godane its too big
16:11 🔗 arkiver yes
16:11 🔗 SketchCow https://catalogd.archive.org/log/279810495
16:11 🔗 SketchCow These aren't 400gb websites.
16:11 🔗 SketchCow These are endless small websites.
16:11 🔗 arkiver ah yes
16:11 🔗 arkiver hehe
16:12 🔗 SketchCow Look, don't hehe me.
16:12 🔗 arkiver those were just some small websites
16:12 🔗 arkiver wanted to see if it would dferive good
16:12 🔗 arkiver derive*
16:12 🔗 SketchCow I think you are fundamentally making mistakes here.
16:12 🔗 SketchCow I want to get to the bottom of it, because you're uploading 100s of gigs and I am not convinced the wayback machine knows what to do with it.
16:13 🔗 SketchCow I'd like yipdw or others to verify the work
16:13 🔗 arkiver ah yes
16:13 🔗 SketchCow Otherwise, you are doing the exact nightmare scenario archivebot is meant to fix.
16:13 🔗 SketchCow The EXACT one.
16:14 🔗 arkiver with the nightmare scenario you mean archiving small websites and taking a lot of time for them right?
16:14 🔗 SketchCow No.
16:14 🔗 godane i'm grabbing a website on my own too
16:14 🔗 SketchCow I mean your stuff never ending up in the wayback machine.
16:14 🔗 godane but its a warc.gz file
16:14 🔗 arkiver SketchCow: It does
16:14 🔗 SketchCow godane: 1. You do them right, and 2. You also tend to either curate or use archivebot.
16:15 🔗 arkiver wait i'll give an example of the other pack
16:15 🔗 SketchCow I want the work checked. I don't want your opinion on this.
16:16 🔗 arkiver Ok, how can I help then?
16:16 🔗 arkiver I told you my exact archiving way
16:16 🔗 SketchCow Stop downloading, stop uploading.
16:16 🔗 SketchCow And wait.
16:17 🔗 arkiver Wat for when it is verified that it works?
16:17 🔗 arkiver wait*
16:17 🔗 SketchCow Yes.
16:17 🔗 SketchCow Find someone to help verify if you are action oriented.
16:18 🔗 arkiver so I just stop all the other uploads I'm doing now?
16:18 🔗 SketchCow I'd prefer it, yes.
16:19 🔗 arkiver Stopped.
16:20 🔗 godane i thought the vodo archive would be ok
16:21 🔗 arkiver I'm not totally sure, what you mean. You mean this: I'm uploading my files, but you are not sure if they end up good in the wayback machine or not? So you want me to stop the uploads for now and let the warc.gz files be checked by someone who knows how to check them. When they are checked and verified to be working correctly, I can resume what I'm doing?
16:21 🔗 SketchCow I don't like what you're doing, fundamentally, but more pressing is I think you've uploaded potentially a terabyte of bad WARCs.
16:22 🔗 SketchCow I want to be proven wrong, hence I am calling for someone to help check the work.
16:22 🔗 arkiver And why do you think the warc's are bad?
16:22 🔗 SketchCow Meanwhile I am setting type to web, and initiating a re-derive to see what the archive does.
16:22 🔗 SketchCow Because they're not showing in wayback.
16:22 🔗 SketchCow Or identified.
16:22 🔗 arkiver they do show up
16:22 🔗 arkiver the first pack showed up after a few minutes
16:22 🔗 arkiver I just see it takes some time for them to show up once they are moved to the archiveteam section.
16:23 🔗 arkiver and I think heritrix will make good warc's... Heritrix is created by the IA as far as I know
16:25 🔗 arkiver but ok, I will just cancel everything and start doing something else if you don't like what I'm doing
16:25 🔗 arkiver that's ok then
16:25 🔗 SketchCow https://catalogd.archive.org/log/279810495 finished. I want someone to check this work.
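One rough way to start checking a `warc.gz` file locally is to walk its records and count them by `WARC-Type`. The sketch below is not the method anyone in the channel used, and it says nothing about whether the Wayback Machine will index the file; it only confirms that the file parses as a sequence of WARC records. The function names are hypothetical, and it assumes each record carries a `Content-Length` header and fits in memory.

```python
import gzip
import io


def count_warc_records(stream):
    """Count records by WARC-Type in an uncompressed WARC byte stream."""
    counts = {}
    while True:
        line = stream.readline()
        if not line:
            break  # end of stream
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line[:40]!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header block
            name, _, value = line.partition(b":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers[b"content-length"])
        body = stream.read(length)  # simple, but holds one record in memory
        if len(body) != length:
            raise ValueError("truncated record body")
        rtype = headers.get(b"warc-type", b"unknown").decode("ascii", "replace")
        counts[rtype] = counts.get(rtype, 0) + 1
    return counts


def count_warc_gz(path):
    """Open a gzip-compressed WARC file and count its records by type."""
    with gzip.open(path, "rb") as fh:
        return count_warc_records(fh)
```

A file that raises `ValueError` here, or that contains no `response` records at all, would be worth a closer look before uploading more of the same.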
16:33 🔗 godane now this is funny
16:33 🔗 arkiver ?
16:33 🔗 godane so theblaze blurred some faces of some kids here: http://www.theblaze.com/wp-content/uploads/2013/12/oshkosh-man-at-walmart.jpeg
16:34 🔗 DFJustin there are some weird issues around releasing images of minors
16:34 🔗 godane their front page has an image without blurred faces: http://www.theblaze.com/wp-content/uploads/2013/12/sign-641x375.jpg
21:08 🔗 arkhive what the hell is #rathole
21:08 🔗 arkhive i've been invited twice