[00:03] *** pizzaiolo has quit IRC (pizzaiolo)
[00:09] i'm starting to upload more funny or die videos
[00:13] That reminds me of something
[00:13] Memes have become a huge part of internet culture, but there is no team dedicated to it
[00:13] I think 4chan is quite dedicated to memes.
[00:13] ;-)
[00:14] Sorry, the hacker called 4chan*
[00:14] Wait, who is this Four chan?
[00:18] On a serious note, what would that team do? Maybe we should make a better effort at archiving communities like 4chan and Reddit and archive meme databases like KYM, but other than that, I don't see how a specific team for that is needed. And the 4chan/Reddit thing would obviously cover much more than just memes. A general "social media" team would probably make more sense.
[00:19] A team dedicated to all of social media instead of memes would be much more efficient
[00:21] The problem with social media is that they're fucking massive. Jason said the other day that IA is not prepared to archive all of Twitter (now that the Library of Congress stopped doing so), and that's just one network of many (though a large one, obviously).
[00:24] *** BlueMaxim has joined #archiveteam-bs
[00:50] How big is twitter's main feed?
[00:55] Looks like text-only is maybe a gig a day?
[00:55] No way.
[00:55] 100 gigs, bad math
[00:56] Yeah, that seems closer, but that'd be just the actual texts, no metadata etc.
[00:56] Maybe 2-3x with metadata, yeah
[00:56] That's reasonably hefty
[00:57] github is growing at similar rates
[00:57] around the TB/day OOM
[00:58] I'll only be able to do a one-off pass
[00:58] Yeah, grabbing newly published tweets is probably feasible. The historical tweets, however...
[00:58] And then there's images and videos.
[00:58] No idea how many of those there are.
[01:04] are you guys having storage problems?
[01:04] I have a fancy unlimited school google drive account to abuse
[01:05] I've already thrown multiple terabytes at it
[01:06] Yeah, that's a brilliant solution if you don't care about how long your data will be around.
[01:14] If I wanted my stuff to be around as long as someone else wanted to keep it up, I'd keep it where it is
[01:14] Please do keep archiving to it, but not what I want
[01:16] Sure, it's better than not doing anything at all.
[01:20] *** dashcloud has quit IRC (Ping timeout: 492 seconds)
[01:23] *** dashcloud has joined #archiveteam-bs
[01:46] *** Arctic has joined #archiveteam-bs
[01:46] join #archiveteam-ot
[02:34] *** Stilett0 has joined #archiveteam-bs
[03:38] *** ld1 has quit IRC (Quit: ld1)
[03:41] *** ld1 has joined #archiveteam-bs
[04:25] *** fireglow has quit IRC (Quit: Gnothi seauton; Veritas vos liberabit)
[04:25] *** Pixi has quit IRC (Quit: Pixi)
[04:25] *** Pixi has joined #archiveteam-bs
[04:30] *** fireglow has joined #archiveteam-bs
[04:54] *** qw3rty117 has joined #archiveteam-bs
[04:56] Could do what regular archivists do and sample or otherwise do pointed, strategic grabs, like core samples. All of Twitter, but only two randomly selected days each month, maybe 8TB. Still enables good research afterwards, even if it's only representative and not comprehensive.
[04:58] For GitHub, maybe one effort to grab all the repos updated on a given day, another to find and grab the oldest and most unmaintained repos as one-time grabs.
[04:58] SketchCow: i'm starting to capture the tapes you sent me last week
[04:58] first one is the IBM tape
[05:00] *** qw3rty116 has quit IRC (Read error: Operation timed out)
[05:24] Hello.
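A rough sanity check of the sizes discussed between 00:50 and 04:56, as a Python back-of-the-envelope sketch. The tweets-per-day and bytes-per-tweet figures are assumptions for illustration, not measurements, and the gap between the resulting text+metadata total and the "maybe 8TB" core-sample figure would be the images and videos nobody sized here.

```python
import random

# Back-of-the-envelope check of the numbers above. All inputs are assumed
# round figures for illustration, not measurements.
TWEETS_PER_DAY = 500_000_000      # assumed order of magnitude
TEXT_BYTES_PER_TWEET = 200        # assumed average text size per tweet
METADATA_MULTIPLIER = 3           # "maybe 2-3x with metadata"

text_per_day = TWEETS_PER_DAY * TEXT_BYTES_PER_TWEET   # ~100 GB/day
with_metadata = text_per_day * METADATA_MULTIPLIER      # ~300 GB/day
print(f"text only:     {text_per_day / 1e9:.0f} GB/day")
print(f"with metadata: {with_metadata / 1e9:.0f} GB/day")

# "Core sample" idea from 04:56: archive two randomly chosen days per month.
sample_days = sorted(random.sample(range(1, 31), k=2))
monthly_sample = 2 * with_metadata
print(f"sample days: {sample_days}")
print(f"sampled text+metadata: ~{monthly_sample / 1e12:.1f} TB/month (media not included)")
```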
[05:31] *** Arctic has quit IRC (Quit: Page closed)
[05:54] Hello, goodbye
[06:23] *** robink has joined #archiveteam-bs
[06:28] *** Mateon1 has quit IRC (Ping timeout: 255 seconds)
[06:28] *** Mateon1 has joined #archiveteam-bs
[07:36] *** Mateon1 has quit IRC (Read error: Connection reset by peer)
[07:36] *** Mateon1 has joined #archiveteam-bs
[07:44] *** schbirid has joined #archiveteam-bs
[08:31] *** zino has quit IRC (Read error: Operation timed out)
[08:36] *** pizzaiolo has joined #archiveteam-bs
[08:40] so one of the tapes had an Unsolved Mysteries episode but was taped over to put mtv music videos and 20 years of monty python
[08:41] i only have a problem cause there are parts of unsolved mysteries but no complete segments
[09:37] *** zino has joined #archiveteam-bs
[09:56] *** Stilett0 has quit IRC (Read error: Operation timed out)
[10:07] *** Stilett0 has joined #archiveteam-bs
[10:07] *** Stiletto has joined #archiveteam-bs
[10:45] *** Darkstar has quit IRC (Ping timeout: 246 seconds)
[10:51] *** atrocity has quit IRC (Read error: Operation timed out)
[11:02] *** Darkstar has joined #archiveteam-bs
[11:39] *** Stilett0 has quit IRC ()
[11:42] *** BlueMaxim has quit IRC (Leaving)
[12:21] *** schbirid has quit IRC (Quit: Leaving)
[13:11] *** pizzaiolo has quit IRC (Read error: Operation timed out)
[13:14] *** pizzaiolo has joined #archiveteam-bs
[13:56] Ivan: look @ github, grab-site.
[13:57] added description of installation in centos7
[14:07] *** RichardG_ has joined #archiveteam-bs
[14:09] *** atrocity has joined #archiveteam-bs
[14:09] *** RichardG has quit IRC (Ping timeout: 264 seconds)
[14:12] *** RichardG_ is now known as RichardG
[14:44] *** schbirid has joined #archiveteam-bs
[15:14] *** ranavalon has quit IRC (Read error: Connection reset by peer)
[15:16] *** RichardG_ has joined #archiveteam-bs
[15:16] *** ranavalon has joined #archiveteam-bs
[15:17] *** ranavalon has quit IRC (Remote host closed the connection)
[15:17] *** RichardG has quit IRC (Read error: Connection reset by peer)
[15:17] *** ranavalon has joined #archiveteam-bs
[16:35] *** Arctic has joined #archiveteam-bs
[16:45] *** Arctic has quit IRC (Quit: Page closed)
[16:46] SketchCow: so one of the new files is going to be called MTV Music Videos Taped Over Unsolved Mysteries 198x.mpg or something
[16:47] that's cause there are clips of Unsolved Mysteries that are still on the tape but nothing worth saving as a separate file
[16:49] Sounds right
[17:02] i was sort of sad when i noticed mtv music videos was taped over an unsolved mysteries episode
[17:03] i was at least hoping to get a complete block of it
[17:24] so the tape is now edited
[17:25] i couldn't figure out the station it aired on though
[17:26] also there are 2 clips that i just kept in
[17:26] one is at the is a SNL clip
[17:26] *at the end
[17:27] one at the beginning is from a fundraising drive on a pbs-like channel
[17:32] SketchCow: anyways don't upload any of my captures for at least 24 hours
[17:49] *** Odd0002 has quit IRC (Quit: ZNC - http://znc.in)
[17:52] *** Odd0002 has joined #archiveteam-bs
[17:56] *** RichardG_ has quit IRC (Read error: Connection reset by peer)
[17:56] *** RichardG has joined #archiveteam-bs
[18:53] can some1 tell me how to convert my code into functions, classes and methods?
i want it to be clean and easily customizable
[18:54] Python -> https://pastebin.com/raw/tXpx9nA7
[18:55] i want to call class.findurl class.gatherdomain class.verbose (which only does print(variable))
[18:56] the problem is: for lines in file do gatherdomain, check with ignorefile
[20:28] SketchCow: i'm hoping after i'm done with this box i can do a box of short-length tapes
[20:28] like the T-30 or T-60 tapes
[20:29] i say that cause i could go through a ton of those and get them digitized
[20:29] also there will be less editing needed for those tapes too
[20:54] *** RichardG_ has joined #archiveteam-bs
[20:54] *** RichardG has quit IRC (Read error: Connection reset by peer)
[20:54] Uzerus: what does it do?
[20:55] and how many items/lines do you expect donefile to have?
[20:57] for ignoring i would suggest you read the ignore file once, convert it to a set and then simply do something like "if logfiledomain in ignorefiledomains"
[20:57] same for donefile unless that is going to be _massive_
[20:57] (or if you want to be able to change things external to the script while it runs)
[20:58] i would open the gzipped file and the resultsfile at once, see eg http://legacy.python.org/dev/peps/pep-0008/#maximum-line-length as an example
[20:59] i would not start with classes for something small like this, unless you are in the OOP mindset already and find it easy to port
[21:00] *** RichardG_ has quit IRC (Read error: Connection reset by peer)
[21:01] *** BlueMaxim has joined #archiveteam-bs
[21:02] *** RichardG has joined #archiveteam-bs
[21:09] ignorefile is going to be massive, for now it's not, but it will be 1k+ lines
[21:10] schbirid: also i planned to make ignore work as some kind of regex, but not now
[21:10] first i must have a clear working module, to avoid trash in my script (even now i sometimes don't know where i am)
[21:11] 1k+ is nothing, 100k would not be much
[21:11] if it's working, i'll publish it on github and i'll work on features
[21:12] i have something in mind, but it's a little... complicated for me
[21:15] eg, i want classes that do class.finddomain(file, line) class.ignores class.idk
[21:16] every class should be for 1 kind of job, one operation, so if i want to change something in that, it will be simpler, cos just the input and output need to stay the same
[21:16] if they're the same, i don't need to change anything else
[21:20] am i wrong with that ^ ? i am a little confused, it's my 1st program ever, first time i am learning classes, practical use of functions etc
[21:27] *** Darkstar has quit IRC (Ping timeout: 246 seconds)
[21:31] So I finally got a floppy drive, and it writes at an amazing 8.9 kB/s.
[21:54] *** Darkstar has joined #archiveteam-bs
[21:58] *** pizzaiolo has quit IRC (Remote host closed the connection)
[22:00] *** pizzaiolo has joined #archiveteam-bs
[23:01] so i'm doing the 'art bell audio only' tape
[23:01] yes, someone recorded art bell on a vhs tape from a radio somehow :P
[23:02] that wasn't hard to do and was the most cost-effective way to set a timer to record a 5-hour radio show
[23:03] you just don't hook up the yellow socket
[23:04] *** jacketcha has quit IRC (Read error: Connection reset by peer)
[23:20] *** pizzaiolo has quit IRC (pizzaiolo)
[23:35] *** jacketcha has joined #archiveteam-bs
[23:57] *** ReimuHaku has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
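For reference, a minimal sketch of the approach schbirid suggests at 20:57-20:58: load the ignore file into a set once, then stream the gzipped log and the results file together. The file names and the find_domain() helper are hypothetical placeholders, not taken from the pastebin.

```python
import gzip

def find_domain(line):
    """Placeholder: extract the domain from one log line (assumed format)."""
    return line.split()[0]

def load_ignored(path):
    """Read the ignore file once and return a set for O(1) membership checks."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def filter_log(log_path, ignore_path, out_path):
    ignored = load_ignored(ignore_path)
    # Open the gzipped input and the results file together, as suggested at 20:58.
    with gzip.open(log_path, "rt", encoding="utf-8") as log, \
         open(out_path, "w", encoding="utf-8") as out:
        for line in log:
            domain = find_domain(line)
            if domain not in ignored:
                out.write(domain + "\n")

if __name__ == "__main__":
    filter_log("access.log.gz", "ignore.txt", "results.txt")
```

This also addresses the one-job-per-unit question from 21:15-21:16 without classes: each step lives in its own small function, so as long as inputs and outputs stay stable, any one of them can be changed or swapped for a regex-based version later without touching the rest.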