[00:04] *** Coderjoe has quit IRC (Read error: Connection reset by peer)
[00:33] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[00:39] *** Coderjoe has joined #archiveteam
[00:40] *** BartoCH has joined #archiveteam
[00:47] *** j08nY has quit IRC (Quit: Leaving)
[00:53] *** BlueMaxim has joined #archiveteam
[01:44] Fixing up bitsavers
[01:46] *** ris has quit IRC ()
[01:48] (Making it more automatically sort things into subgroups)
[01:49] For a group that fucking hates me, I sure get them a lot of traffic
[02:10] *** GLaDOS has quit IRC (Read error: Operation timed out)
[02:10] *** GLaDOS has joined #archiveteam
[02:18] i'm up to 725k items
[02:30] *** VADemon has quit IRC (Quit: left4dead)
[02:39] *** GLaDOS has quit IRC (Quit: Oh crap, I died.)
[02:40] *** GLaDOS has joined #archiveteam
[04:32] *** WinterFox has joined #archiveteam
[04:58] *** tomwsmf-a has quit IRC (Ping timeout: 258 seconds)
[04:58] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[05:05] *** Sk1d has joined #archiveteam
[06:03] *** Aranje has quit IRC (Ping timeout: 260 seconds)
[07:34] *** DoomTay has quit IRC (Quit: Page closed)
[08:51] *** RedType has joined #archiveteam
[10:47] *** ris has joined #archiveteam
[10:49] *** j08nY has joined #archiveteam
[11:48] *** ris has quit IRC ()
[12:49] *** VADemon has joined #archiveteam
[12:50] *** ris has joined #archiveteam
[13:07] *** BlueMaxim has quit IRC (Quit: Leaving)
[13:16] *** WinterFox has quit IRC (Remote host closed the connection)
[13:25] *** kcaj has quit IRC (Quit: ZNC - 1.6.0 - http://znc.in)
[13:27] *** kcaj has joined #archiveteam
[13:42] *** REiN^ has joined #archiveteam
[13:51] *** ris has quit IRC ()
[13:52] Does anyone have an idea for a nice new webarchiving project?
[13:52] NewsBuddy is now running pretty well. VideoBot needs support for some more websites.
[13:54] NewsBuddy could always do with more grabbers
[13:55] Yahoo Answers.
[13:58] hahaha
[13:59] Arkiver, are you stumbling around, bloodied, yelling "WHO'S NEXT"
[13:59] I'm happy to give you some examples of things people think we're fucked about
[14:00] arkiver, what about a thing where someone can feed it a hashtag and it will go away and archive all the tweets with that hashtag?
[14:01] It wouldn't surprise me if yahoo answers does suddenly disappear
[14:01] eh.. too much noise, too few actual usage. How about some ISP Hosting. http://archiveteam.org/index.php?title=ISP_Hosting
[14:02] as I mentioned before, there is a chanche, that the arcor webhosting is going away by end of the year. Google shows 800k hits.
[14:02] *chance
[14:43] SketchCow: Yes!
[14:43] What are the examples?
[14:43] HCross: ok, that will be in VideoBot
[14:44] PurpleSym: didn't you already save yahoo answers?
[14:46] The entirety of YouTube.
[14:48] Nope.
[14:52] arkiver: YouTube metabase. Crawl all public/hidden videos by IDs and save their metadata. Public ones should be done like a regular webcrawler, hidden ones are perfect for the current URLTeam infrastructure
[14:53] There're more videos deleted every day than we notice, often with 0 traces left after deletion
[14:54] Interesting idea
[14:54] SketchCow, what do you think of that?
[14:58] all adorable.
[14:58] We need an export function for Facebook.
[14:59] We need the ability to save social media streams for anything, as a part, and push it into Archivebot.
[14:59] We need to be able to expand Archivebot as easy as we can.
[14:59] And IA.BAK needs a windows-runnable client
[14:59] You are able to download all your data from Facebook
[15:06] No.
[15:07] No you really can't.
[15:11] Is adorable good in Jason-speak? :p
[15:16] I always like seeing people roll up their sleeves.
[15:16] Strong, firm youth, sweating away in the midday sun
[15:16] * SketchCow relaxes on the porch with a mint julep
[15:20] Oh right... :S
[15:21] I spent last week clearing out a huge backlog of the most pressing things, and pushing godane's contributions into the right subcollections.
[15:21] This week, more of same, especially documentaries and floppy disks.
[15:51] SketchCow: export function for facebook as in WARC file, or more user friendly zipping up everything
[15:51] ?
[15:51] For coursera we'll need people who have a fast connection with FOS
[16:08] Do we need a European FOS?
[16:19] *** Tomcat_ has joined #archiveteam
[16:25] *** ndizzle has joined #archiveteam
[16:28] I'm pretty sure what SketchCow is implying for facebook is that there needs to be a tool that lets you grab a given facebook page easily, completely, and is always updated (pretty much like a youtube-dl for facebook pages)
[16:29] *** xXx_ndidd has quit IRC (Ping timeout: 244 seconds)
[16:31] *** ris has joined #archiveteam
[16:34] PurpleSym: not sure, if someone here has good speed to FOS and lots of bandwidth, that should be enough
[16:34] if not we need an alternative
[16:38] There might be
[16:38] (For a facebook)
[16:48] *** ndiddy has joined #archiveteam
[16:51] *** ndizzle has quit IRC (Ping timeout: 244 seconds)
[17:36] *** schbirid has joined #archiveteam
[18:11] *** nertzy has joined #archiveteam
[18:26] *** DoomTay has joined #archiveteam
[18:38] *** xXx_ndidd has joined #archiveteam
[18:41] *** DoomTay has quit IRC (Quit: Page closed)
[18:41] *** ndiddy has quit IRC (Ping timeout: 244 seconds)
[19:17] Testing forums grabs for coursera now
[19:17] If that works correctly we're ready to start the project
[19:17] it should also all be playable in the wayback machine
[19:17] at least it is webarchiveplayer
[19:25] Last week was "clear out godane inbox"
[19:25] This week is "clear out FOS of everything not a backup, maybe clear those backups too"
[19:25] With 6.5 terabytes of goodness, there's a lot to clear out.
[19:34] SketchCow: do WARCs that go into megawarcs need special names?
[19:34] As in, is it enough to have them in the dir of the project on FOS?
[19:35] Or is their name also checked before adding them to the megawarc
[19:35] and it looks like something is wrong with the reporting of archivebot items in http://fos.textfiles.com/ARCHIVETEAM/
[19:37] Yeah, it's weird, huh
[19:37] I will re-look at it to see why it reports so much.
[19:38] I don't know the intricacies of warcs in megawarc sadly.
[19:38] Needs a test.
[19:42] One slight change of whatever: People really need to stop handing me entire runs of tv and cable series through the FTP sites.
[19:43] These are just insane things to do through me. Do them directly or something
[19:53] *** Kenshin has quit IRC (Remote host closed the connection)
[19:58] *** Rondom has quit IRC (Remote host closed the connection)
[19:58] *** Rondom has joined #archiveteam
[19:59] nasa docs are up to 1964
[20:02] NewsGrabber has now uploaded around 85 TB to Internet Archive
[20:02] The number of views is growing very fast.
[20:07] *** Tomcat_ has quit IRC (Remote host closed the connection)
[20:11] *** schbirid has quit IRC (Quit: Leaving)
[20:16] Some things get pretty amazing hits.
[20:16] All the hiphop mixtapes are getting millions of hits
[20:18] Millions!
[20:30] *** DriverDan has joined #archiveteam
[20:32] *** DriverDan has quit IRC (Client Quit)
[20:38] *** fie__ has quit IRC (Remote host closed the connection)
[20:38] *** fie__ has joined #archiveteam
[21:00] *** Kenshin has joined #archiveteam
[21:53] *** tomwsmf-a has joined #archiveteam
[21:59] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[22:06] *** BartoCH has joined #archiveteam
[22:50] *** WinterFox has joined #archiveteam
[23:04] *** j08nY has quit IRC (Quit: Leaving)
[23:20] *** WinterFox has quit IRC (Read error: Operation timed out)
[23:32] *** jrwr has joined #archiveteam