[00:25] *** mistym has quit IRC (Remote host closed the connection)
[00:28] *** oldcad has quit IRC (Quit: Leaving.)
[00:45] *** dashcloud has quit IRC (Read error: Operation timed out)
[00:48] *** dashcloud has joined #archiveteam-bs
[00:55] *** mistym has joined #archiveteam-bs
[01:16] *** dashcloud has quit IRC (Read error: Operation timed out)
[01:19] *** dashcloud has joined #archiveteam-bs
[01:43] I just mounted a revolution at work to force IA to move to Slack
[01:43] It... is going on interestingly.
[01:43] Anyone need me for anything?
[01:43] I know this whole thing with Sourceforge
[01:43] What is Slack? (Besides the goal of all right-thinking SubGenius)
[01:44] Check out slack.com, they walk you through it
[01:44] *** schbirid2 has quit IRC (Read error: Operation timed out)
[01:44] ah, an IM app.
[01:44] ish
[01:45] hm, might be worth poking my work about it, too. We're currently (not particularly happily) using Google Hangouts.
[01:49] How's the archiving with Slack?
[01:57] SketchCow: i'm uploading more tagesschau 20 clock evening news from 1989
[01:57] *** schbirid2 has joined #archiveteam-bs
[01:58] also 1989 set will be complete as it can be
[01:58] they only started in September 1989
[02:03] Great
[02:06] *** primus104 has quit IRC (Leaving.)
[02:28] *** JesseW1 has quit IRC (Quit: Leaving.)
[02:50] *** dashcloud has quit IRC (Read error: Operation timed out)
[02:51] slack is ... hm.
i used to be cranky-old-man about it and then i used it
[02:52] they replace irc servers and webchat, but let you use your irc client against it
[02:52] and their revenue model is "it is free, if you have a retention policy and so want to delete history, then insert money"
[02:52] which is pretty well targeted, i think
[02:53] not sure how much that will cover their expenses, but i wish them the best
[03:02] *** dashcloud has joined #archiveteam-bs
[03:09] *** bzc6p_ has joined #archiveteam-bs
[03:10] *** Start has quit IRC (Read error: Connection reset by peer)
[03:11] *** Start has joined #archiveteam-bs
[03:11] *** mistym has quit IRC (Remote host closed the connection)
[03:15] *** bzc6p has quit IRC (Ping timeout: 600 seconds)
[03:28] *** mistym has joined #archiveteam-bs
[03:40] *** JesseW has joined #archiveteam-bs
[03:42] i found an article that used the apple logo from the wiki and even credited us: http://www.hallels.com/articles/9738/20141016/apple-event-16-october-latest-news-updates-what-expect-os.htm
[03:44] *** mistym has quit IRC (Remote host closed the connection)
[03:45] *** mistym has joined #archiveteam-bs
[04:04] *** JesseW has quit IRC (Quit: Leaving.)
[04:13] *** vitzli has joined #archiveteam-bs
[04:26] *** aaaaaaaaa has quit IRC (Leaving)
[04:42] *** mistym has quit IRC (Remote host closed the connection)
[04:43] *** mistym has joined #archiveteam-bs
[04:51] 0/r
[04:52] Start: that's kinda weird
[04:52] i didn't know we were hosting apple's press kit :P
[04:52] *** dashcloud has quit IRC (Read error: Operation timed out)
[05:01] *** dashcloud has joined #archiveteam-bs
[06:00] *** RichardG has quit IRC (Ping timeout: 370 seconds)
[06:13] haha
[06:36] *** RichardG has joined #archiveteam-bs
[06:38] *** mistym has quit IRC (Remote host closed the connection)
[07:05] *** bzc6p_ is now known as bzc6p
[07:18] *** JesseW has joined #archiveteam-bs
[07:32] python guys: is it reasonable for a 10 threaded program that fetches from a JSON api and inserts the data into a DB to use 700 megs of ram?
[07:39] *** mistym has joined #archiveteam-bs
[07:43] *** Boltsie has quit IRC (Ping timeout: 506 seconds)
[07:48] *** mistym has quit IRC (Read error: Operation timed out)
[07:57] *** JesseW has quit IRC (Leaving.)
[07:59] *** Fusl has quit IRC (Read error: Operation timed out)
[08:04] *** primus104 has joined #archiveteam-bs
[08:33] *** Fusl has joined #archiveteam-bs
[08:42] *** Fusl has quit IRC (Read error: Operation timed out)
[08:49] Ctrl-S: unfortunately it can be, what db layer and python interpreter are you using? (sqlite3, pymysql, mysqldb library running on standard cpython, pypy?)
[08:49] um
[08:49] python 2.7 sqlalchemy postgres
[08:53] that's really large for something like that, never used sqlalchemy but I wouldn't expect 700 megs
[08:55] maybe partly multithreading messing up?
this line in the docs stood out: "The Session object is entirely designed to be used in a non-concurrent fashion, which in terms of multithreading means “only in one thread at a time”.", but I haven't used sqlalchemy much
[08:58] I believe i'm keeping each session local to one thread
[08:58] other than that, if you really wanna nail it down, you can get the current memory size with something like this: http://stackoverflow.com/questions/938733/total-memory-used-by-python-process#answer-7669482
[08:58] thanks
[08:59] I've had luck in the past with getting memory size, running 5k queries, getting memory size again, etc
[08:59] changing what you do between those two grabs of the memory size can sometimes help nail down leaks and stuff like that
[09:00] I think my problem was SQLAlchemy's unit of work thingy
[09:00] Since i added a call to its flush function to force it to push data to the DB, it seems to be behaving
[09:02] ah, that's good
[09:02] sitting at under 70 now
[09:03] *** primus104 has quit IRC (Leaving.)
[09:03] whoo, much better
[09:03] now i just have to see if it's like this in a day's time
[09:04] databases are fun, had an application a while ago that would just go down a few times a week, stop talking to the database at all and actually segfault cpython sometimes
[09:04] half of my optimisation seems to be based on hunches
[09:04] only ever happened in production though
[09:05] turned out the db layer we were using didn't understand threading at all, and was munching all over itself >_>
[09:05] I still need to add WARC output to my code
[09:05] also i just realised that my API keys would be included in the WARC file
[09:06] since they're URL parameters
[09:06] \o/
[09:06] winner winner chicken dinner
[09:07] storytime xmc?
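(Note on the memory discussion above: the "grab the memory size, run a batch, grab it again" approach can be sketched with the standard library alone. This is only an illustration under assumptions. `PendingBuffer` and `measure_growth` are hypothetical names, the buffer is a toy stand-in for anything that holds rows until flushed and is not SQLAlchemy's Session, and `tracemalloc` is a Python 3 module swapped in for whatever the linked answer uses.)

```python
import tracemalloc


class PendingBuffer:
    """Hypothetical stand-in for a layer that holds rows until flushed."""

    def __init__(self, flush_every=None):
        self.pending = []
        self.flush_every = flush_every
        self.written = 0

    def add(self, row):
        self.pending.append(row)
        if self.flush_every and len(self.pending) >= self.flush_every:
            self.flush()

    def flush(self):
        # A real program would push the pending rows to the database here.
        self.written += len(self.pending)
        self.pending.clear()


def measure_growth(buffer, batches=5, batch_size=5000):
    """Return traced memory (bytes) sampled after each batch of inserts."""
    tracemalloc.start()
    samples = []
    try:
        for batch in range(batches):
            for i in range(batch_size):
                buffer.add({"id": batch * batch_size + i, "body": "x" * 100})
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
    finally:
        tracemalloc.stop()
    return samples


leaky = measure_growth(PendingBuffer())
flushed = measure_growth(PendingBuffer(flush_every=1000))
print("no flush:  ", leaky)
print("with flush:", flushed)
```

(If the samples keep climbing from batch to batch, something is holding on to the rows; changing one thing between runs, such as adding the periodic flush, shows whether that thing is the culprit.)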
[09:11] bedtime
[10:01] *** Muad-Dib has quit IRC (Ping timeout: 252 seconds)
[10:15] *** vitzli has quit IRC (Quit: Leaving)
[10:37] *** vitzli has joined #archiveteam-bs
[10:43] *** mistym has joined #archiveteam-bs
[10:44] *** primus104 has joined #archiveteam-bs
[10:49] *** mistym has quit IRC (Read error: Operation timed out)
[11:44] *** mistym has joined #archiveteam-bs
[11:45] *** Muad-Dib has joined #archiveteam-bs
[11:52] *** mistym has quit IRC (Read error: Operation timed out)
[12:08] *** primus104 has quit IRC (Leaving.)
[12:38] *** Fusl has joined #archiveteam-bs
[13:30] i'm looking at mirroring metro.co.uk
[13:45] *** mistym has joined #archiveteam-bs
[13:46] i'm up to feb 1990 of tagesschau 20:00 evening news
[13:53] *** Boppen has quit IRC (Ping timeout: 198 seconds)
[13:54] *** mistym has quit IRC (Ping timeout: 512 seconds)
[14:31] *** mistym has joined #archiveteam-bs
[14:39] *** mistym has quit IRC (Remote host closed the connection)
[14:55] *** mistym has joined #archiveteam-bs
[14:59] *** BlueMaxim has quit IRC (Quit: Leaving)
[15:14] *** primus104 has joined #archiveteam-bs
[15:19] *** Start has quit IRC (Read error: Connection reset by peer)
[15:20] *** Start has joined #archiveteam-bs
[15:22] *** primus104 has quit IRC (Leaving.)
[15:32] *** vitzli has quit IRC (Quit: Leaving)
[15:36] *** bzc6p_ has joined #archiveteam-bs
[15:41] *** zenguy_pc has quit IRC (Read error: Connection reset by peer)
[15:42] *** bzc6p has quit IRC (Ping timeout: 600 seconds)
[15:47] *** mistym has quit IRC (Remote host closed the connection)
[15:57] *** zenguy_pc has joined #archiveteam-bs
[15:59] *** JesseW has joined #archiveteam-bs
[16:15] *** Boppen has joined #archiveteam-bs
[16:21] *** JesseW has quit IRC (Quit: Leaving.)
[16:32] *** mistym has joined #archiveteam-bs
[17:02] *** aaaaaaaaa has joined #archiveteam-bs
[17:15] *** primus104 has joined #archiveteam-bs
[17:25] i'm uploading march 1990 of tagesschau 20:00 evening news
[17:30] godane: awesome, where did you get them?
[17:31] godane: you have misspelled the language: https://archive.org/details/tagesschau-20-clock-evening-news-1990-02-27
[17:31] "germen" needs to be "german"
[17:33] Hey so i have a question on how to do something
[17:33] There is this site that is constantly updated, and on occasion things will get bumped back a page
[17:34] *** diacope has quit IRC (Ping timeout: 252 seconds)
[17:34] I want to archive it, but how can I be sure that a post won't have moved across pages since i archived the page?
[17:34] http://www.tagesschau.de/inland/tsvorzwanzigjahren126~_origin-08770167-e3c2-45f7-a0b4-001139c0bbce.html
[17:42] *** diacope has joined #archiveteam-bs
[17:56] *** mistym has quit IRC (Read error: Connection reset by peer)
[17:56] *** mistym_ has joined #archiveteam-bs
[18:27] *** bzc6p_ is now known as bzc6p
[18:27] Nertsy: archive it from front to rear
[18:27] That is, from newer posts to older. Doing so, worst case is you archive a post twice.
[18:29] bzc6p, makes sense... Unfortunately it just went down
[18:56] *** logchfoo1 starts logging #archiveteam-bs at Fri Jun 26 18:56:35 2015
[18:56] *** logchfoo1 has joined #archiveteam-bs
[19:19] *** goekesmi has quit IRC (Remote host closed the connection)
[19:23] *** goekesmi has joined #archiveteam-bs
[19:44] *** mistym has quit IRC (Remote host closed the connection)
[20:11] *** RichardG has quit IRC (Remote host closed the connection)
[20:12] *** RichardG has joined #archiveteam-bs
[20:15] *** mistym has joined #archiveteam-bs
[20:27] is that comic at the bottom already known https://github.com/gilesbowkett/rewind ?
:)
[20:38] *** mistym_ has joined #archiveteam-bs
[20:38] *** mistym has quit IRC (Read error: Connection reset by peer)
[20:45] *** godane has quit IRC (Ping timeout: 370 seconds)
[21:14] *** godane has joined #archiveteam-bs
[21:35] .title https://www.techdirt.com/articles/20150625/10561131460/canada-saves-public-public-domain-extends-copyright-sound-recordings-another-20-years.shtml
[21:35] Start: Canada Saves Public From Public Domain, Extends Copyright On Sound Recordings Another 20 Years | Techdirt
[21:37] "Now songs such as Buffy Sainte-Marie’s "Universal Soldier" -- released 50 years ago this August -- are no longer in danger of entering the public domain."
[21:37] "no longer in danger"
[21:37] are you fucking kidding me
[21:45] ^
[21:47] *** RichardG has quit IRC (Ping timeout: 252 seconds)
[21:48] Millions could have died.
[21:49] *** Asparagir has joined #archiveteam-bs
[21:51] Start: sarcasm
[21:51] *** RichardG has joined #archiveteam-bs
[21:51] oh wow, it is not
[21:59] Won't somebody please think of the children!?
[22:01] http://i.imgur.com/yfbjCeP.jpeg
[22:05] I have an unfunny story about a girlfriend, a floppy disk and corrupted images which did exactly that..
[22:11] something something https://xkcd.com/598/
[22:12] rofl
[22:17] *** Asparagir has quit IRC (Quit: Leaving)
[22:28] SketchCow: i'm grabbing Gamespot podcast called The HotSpot
[23:11] I THINK we have it, double check
[23:23] *** BlueMaxim has joined #archiveteam-bs
[23:41] SketchCow: i just looked and we don't have it: https://archive.org/search.php?query=The%20HotSpot%20gamespot
[23:42] maybe it's in some web archive or tarball somewhere but there is no collection for
[23:42] *for it
[23:43] *** dashcloud has quit IRC (Read error: Operation timed out)
[23:44] *** dashcloud has joined #archiveteam-bs
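(Note on the 18:27 advice to archive a constantly updated site "from front to rear": a toy simulation can show why the worst case is a duplicate and not a missed post. Everything here is an assumption made for illustration. `Feed`, `crawl_newest_first` and `PAGE_SIZE` are hypothetical names, and the feed is an in-memory list, not the site being discussed.)

```python
PAGE_SIZE = 3


class Feed:
    """Hypothetical site: newest post first, new posts push older ones back."""

    def __init__(self, count):
        self.posts = list(range(count, 0, -1))  # post ids, newest first
        self.next_id = count + 1

    def publish(self):
        # A new post lands on page 1 and shifts every older post by one slot.
        self.posts.insert(0, self.next_id)
        self.next_id += 1

    def page(self, number):
        start = (number - 1) * PAGE_SIZE
        return self.posts[start:start + PAGE_SIZE]


def crawl_newest_first(feed):
    """Walk page 1, 2, 3... and de-duplicate by post id."""
    seen = set()
    duplicates = 0
    number = 1
    while True:
        posts = feed.page(number)
        if not posts:
            break
        for post_id in posts:
            if post_id in seen:
                duplicates += 1
            seen.add(post_id)
        feed.publish()  # simulate a new post arriving between page fetches
        number += 1
    return seen, duplicates


feed = Feed(10)
original = set(feed.posts)
seen, duplicates = crawl_newest_first(feed)
print("missed:", sorted(original - seen))
print("duplicates:", duplicates)
```

(Posts only ever move toward later pages, so a crawl moving in the same direction re-reads a shifted post instead of skipping it. Posts published during the crawl are not captured in this sketch and would need a later pass over page 1.)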