#archiveteam-bs 2018-01-22,Mon


Time Nickname Message
00:03 🔗 pizzaiolo has quit IRC (pizzaiolo)
00:09 🔗 godane i'm starting to upload more funny or die videos
00:13 🔗 jacketcha That reminds me of something
00:13 🔗 jacketcha Memes have become a huge part of internet culture, but there is no team dedicated to it
00:13 🔗 JAA I think 4chan is quite dedicated to memes.
00:13 🔗 JAA ;-)
00:14 🔗 JAA Sorry, the hacker called 4chan*
00:14 🔗 jacketcha Wait, who is this Four chan?
00:18 🔗 JAA On a serious note, what would that team do? Maybe we should make a better effort at archiving communities like 4chan and Reddit and archive meme databases like KYM, but other than that, I don't see how a specific team for that is needed. And the 4chan/Reddit thing would obviously cover much more than just memes. A general "social media" team would probably make more sense.
00:19 🔗 jacketcha A team dedicated to all of social media instead of memes would be much more efficient
00:21 🔗 JAA The problem with social media is that they're fucking massive. Jason said the other day that IA is not prepared to archive all of Twitter (now that the Library of Congress stopped doing so), and that's just one network of many (though a large one, obviously).
00:24 🔗 BlueMaxim has joined #archiveteam-bs
00:50 🔗 kisspunch How big is twitter's main feed?
00:55 🔗 kisspunch Looks like text-only is maybe a gig a day?
00:55 🔗 JAA No way.
00:55 🔗 kisspunch 100 gigs, bad math
00:56 🔗 JAA Yeah, that seems closer, but that'd be just the actual texts, no metadata etc.
00:56 🔗 kisspunch Maybe 2-3x with metadata, yeah
00:56 🔗 kisspunch That's reasonably hefty
00:57 🔗 kisspunch github is growing at similar rates
00:57 🔗 kisspunch around the TB/day OOM
00:58 🔗 kisspunch I'll only be able to do a one-off pass
00:58 🔗 JAA Yeah, grabbing newly published tweets is probably feasible. The historical tweets, however...
00:58 🔗 JAA And then there's images and videos.
00:58 🔗 JAA No idea how many of those there are.
01:04 🔗 jacketcha are you guys having storage problems?
01:04 🔗 jacketcha I have a fancy unlimited school google drive account to abuse
01:05 🔗 jacketcha I've already thrown multiple terabytes at it
01:06 🔗 JAA Yeah, that's a brilliant solution if you don't care about how long your data will be around.
01:14 🔗 kisspunch If I wanted my stuff to be around as long as someone else wanted to keep it up, I'd keep it where it is
01:14 🔗 kisspunch Please do keep archiving to it, but it's not what I want
01:16 🔗 JAA Sure, it's better than not doing anything at all.
01:20 🔗 dashcloud has quit IRC (Ping timeout: 492 seconds)
01:23 🔗 dashcloud has joined #archiveteam-bs
01:46 🔗 Arctic has joined #archiveteam-bs
01:46 🔗 Arctic join #archiveteam-ot
02:34 🔗 Stilett0 has joined #archiveteam-bs
03:38 🔗 ld1 has quit IRC (Quit: ld1)
03:41 🔗 ld1 has joined #archiveteam-bs
04:25 🔗 fireglow has quit IRC (Quit: Gnothi seauton; Veritas vos liberabit)
04:25 🔗 Pixi has quit IRC (Quit: Pixi)
04:25 🔗 Pixi has joined #archiveteam-bs
04:30 🔗 fireglow has joined #archiveteam-bs
04:54 🔗 qw3rty117 has joined #archiveteam-bs
04:56 🔗 Vito` Could do what regular archivists do and sample or otherwise do pointed, strategic grabs, like core samples. All of Twitter, but only two randomly selected days each month, maybe 8TB. Still enables good research afterwards, even if it's only representative and not comprehensive.
04:58 🔗 Vito` For GitHub, maybe one effort to grab all the repos updated on a given day, another to find and grab the oldest and most unmaintained repos as one-time grabs.
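A rough check of the Twitter sampling estimate above, using only the figures quoted in the channel: about 100 GB/day of tweet text (kisspunch, 00:55), 2-3x that with metadata (00:56), and two sampled days per month (Vito`, 04:56). The 12-month window is an assumption, since the period behind "maybe 8TB" was not stated, and the function name `sampled_tb` is made up for this sketch. Images and videos are not counted.

```python
# Figures quoted in the channel; the 12-month window below is an assumption.
TEXT_GB_PER_DAY = 100        # kisspunch, 00:55: text-only estimate
METADATA_FACTORS = (2, 3)    # kisspunch, 00:56: "2-3x with metadata"
SAMPLED_DAYS_PER_MONTH = 2   # Vito`, 04:56: two randomly selected days each month


def sampled_tb(months, factor):
    """Return the sampled volume in decimal TB (1 TB = 1000 GB)."""
    days = months * SAMPLED_DAYS_PER_MONTH
    return days * TEXT_GB_PER_DAY * factor / 1000


for factor in METADATA_FACTORS:
    print(f"{factor}x metadata, 12 months: {sampled_tb(12, factor):.1f} TB")
```

Under these assumptions one year of samples comes to roughly 4.8 to 7.2 TB, the same order of magnitude as the 8TB guess.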
04:58 🔗 godane SketchCow: i'm starting to capture the tapes you sent me last week
04:58 🔗 godane first one is the IBM tape
05:00 🔗 qw3rty116 has quit IRC (Read error: Operation timed out)
05:24 🔗 Arctic Hello.
05:31 🔗 Arctic has quit IRC (Quit: Page closed)
05:54 🔗 BlueMaxim Hello, goodbye
06:23 🔗 robink has joined #archiveteam-bs
06:28 🔗 Mateon1 has quit IRC (Ping timeout: 255 seconds)
06:28 🔗 Mateon1 has joined #archiveteam-bs
07:36 🔗 Mateon1 has quit IRC (Read error: Connection reset by peer)
07:36 🔗 Mateon1 has joined #archiveteam-bs
07:44 🔗 schbirid has joined #archiveteam-bs
08:31 🔗 zino has quit IRC (Read error: Operation timed out)
08:36 🔗 pizzaiolo has joined #archiveteam-bs
08:40 🔗 godane so one of the tapes had an Unsolved Mysteries episode but was taped over with mtv music videos and 20 years of monty python
08:41 🔗 godane i only have a problem cause there are parts of unsolved mysteries but no complete segments
09:37 🔗 zino has joined #archiveteam-bs
09:56 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
10:07 🔗 Stilett0 has joined #archiveteam-bs
10:07 🔗 Stiletto has joined #archiveteam-bs
10:45 🔗 Darkstar has quit IRC (Ping timeout: 246 seconds)
10:51 🔗 atrocity has quit IRC (Read error: Operation timed out)
11:02 🔗 Darkstar has joined #archiveteam-bs
11:39 🔗 Stilett0 has quit IRC ()
11:42 🔗 BlueMaxim has quit IRC (Leaving)
12:21 🔗 schbirid has quit IRC (Quit: Leaving)
13:11 🔗 pizzaiolo has quit IRC (Read error: Operation timed out)
13:14 🔗 pizzaiolo has joined #archiveteam-bs
13:56 🔗 Uzerus Ivan: look @ github, grab-site.
13:57 🔗 Uzerus added description of installation in centos7
14:07 🔗 RichardG_ has joined #archiveteam-bs
14:09 🔗 atrocity has joined #archiveteam-bs
14:09 🔗 RichardG has quit IRC (Ping timeout: 264 seconds)
14:12 🔗 RichardG_ is now known as RichardG
14:44 🔗 schbirid has joined #archiveteam-bs
15:14 🔗 ranavalon has quit IRC (Read error: Connection reset by peer)
15:16 🔗 RichardG_ has joined #archiveteam-bs
15:16 🔗 ranavalon has joined #archiveteam-bs
15:17 🔗 ranavalon has quit IRC (Remote host closed the connection)
15:17 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
15:17 🔗 ranavalon has joined #archiveteam-bs
16:35 🔗 Arctic has joined #archiveteam-bs
16:45 🔗 Arctic has quit IRC (Quit: Page closed)
16:46 🔗 godane SketchCow: so one of the new files is going to be called MTV Music Videos taped Over Unsolved Mysteries 198x.mpg or something
16:47 🔗 godane that's cause there are clips of Unsolved Mysteries that are still on the tape but nothing worth saving as a separate file
16:49 🔗 SketchCow Sounds right
17:02 🔗 godane i was sort of sad when i noticed mtv music videos were taped over an unsolved mysteries episode
17:03 🔗 godane i was at least hoping to get a complete block of it
17:24 🔗 godane so tape is now edited
17:25 🔗 godane i couldn't figure out the station it aired on though
17:26 🔗 godane also there are 2 clips that i just kept in
17:26 🔗 godane one at the end is an SNL clip
17:27 🔗 godane one at the beginning is from a fundraising drive on a pbs like channel
17:32 🔗 godane SketchCow: anyways don't upload any of my captures for at least 24 hours
17:49 🔗 Odd0002 has quit IRC (Quit: ZNC - http://znc.in)
17:52 🔗 Odd0002 has joined #archiveteam-bs
17:56 🔗 RichardG_ has quit IRC (Read error: Connection reset by peer)
17:56 🔗 RichardG has joined #archiveteam-bs
18:53 🔗 Uzerus can someone tell me how to convert my code into functions, classes and methods? i want it to be clean and easily customizable
18:54 🔗 Uzerus Python -> https://pastebin.com/raw/tXpx9nA7
18:55 🔗 Uzerus i want to call class.findurl class.gatherdomain class.verbose (which only does print(variable))
18:56 🔗 Uzerus the problem is for lines in file do: gatherdomain, check with ignorefile
20:28 🔗 godane SketchCow: i'm hoping after i'm done with this box i can get a box of short length tapes
20:28 🔗 godane like the T-30 or T-60 tapes
20:29 🔗 godane i say that cause i could go through a ton of those and get them digitized
20:29 🔗 godane also there will be less editing needed for those tapes too
20:54 🔗 RichardG_ has joined #archiveteam-bs
20:54 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
20:54 🔗 schbirid Uzerus: what does it do?
20:55 🔗 schbirid and how many items/lines do you expect donefile to have?
20:57 🔗 schbirid for ignoring i would suggest you read the ignore file once, convert it to a set and then simply do something like "if logfiledomain in ignorefiledomains"
20:57 🔗 schbirid same for donefile unless that is going to be _massive_
20:57 🔗 schbirid (or if you want to be able to change things external of the script while it runs)
20:58 🔗 schbirid i would open the gzipped file and the resultsfile at once, see eg http://legacy.python.org/dev/peps/pep-0008/#maximum-line-length as example
20:59 🔗 schbirid i would not start with classes with something small like this, unless you are in the OOP mindset already and find it easy to port
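schbirid's suggestions above can be sketched roughly as follows, with plain functions rather than classes. The pastebin script is not reproduced in this log, so everything here is an assumption: the function names (`load_domains`, `domain_of`, `filter_log`), the one-domain-per-line ignore file, and a log format where URLs appear inline in each line. It reads the ignore file once into a set, then opens the gzipped log and the results file in a single `with` statement, as in the PEP 8 example linked at 20:58.

```python
import gzip
import re
from urllib.parse import urlsplit

# Assumption: URLs appear inline in the log lines and contain no whitespace.
URL_RE = re.compile(r"https?://\S+")


def load_domains(path):
    """Read one domain per line into a set, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def domain_of(url):
    """Return the lowercased host of a URL, or '' if it cannot be parsed."""
    try:
        return urlsplit(url).hostname or ""
    except ValueError:
        return ""


def filter_log(log_gz_path, results_path, ignored):
    """Write URLs whose domain is not in `ignored`; return how many were kept."""
    kept = 0
    # Open the gzipped input and the output together in one with statement.
    with gzip.open(log_gz_path, "rt", encoding="utf-8") as log, \
            open(results_path, "w", encoding="utf-8") as results:
        for line in log:
            for url in URL_RE.findall(line):
                if domain_of(url) in ignored:  # average O(1) set lookup
                    continue
                results.write(url + "\n")
                kept += 1
    return kept
```

With hypothetical file names, usage would look like `filter_log("crawl.log.gz", "results.txt", load_domains("ignore.txt"))`. The same `load_domains` call would work for the donefile as long as it fits in memory.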
21:00 🔗 RichardG_ has quit IRC (Read error: Connection reset by peer)
21:01 🔗 BlueMaxim has joined #archiveteam-bs
21:02 🔗 RichardG has joined #archiveteam-bs
21:09 🔗 Uzerus ignorefile is going to be massive, for now it's not, but it will be 1k+ lines
21:10 🔗 Uzerus schbirid: also i planned to make ignore as some kind of regex, but not now,
21:10 🔗 Uzerus first i must have clear working module, to avoid trash in my script (even now i sometimes don't know where i am)
21:11 🔗 schbirid 1k+ is nothing, 100k would not be much
21:11 🔗 Uzerus if it works, i'll publish it on github and i'll work on features
21:12 🔗 Uzerus i have something in mind, but it's a little... complicated for me
21:15 🔗 Uzerus eg, i want classes that do class.finddomain(file, line) class.ignores class.idk
21:16 🔗 Uzerus every class should be for 1 kind of job, one operation, so if i want to change something in that, it will be simpler, cos just input and output needs to be the same
21:16 🔗 Uzerus if it's the same, i don't need to change anything else
21:20 🔗 Uzerus am i wrong about that ^ ? i am a little confused, it's my 1st program ever, first time i am learning classes, practical use of functions etc
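The idea Uzerus describes (each piece does one job, and can be changed freely as long as its input and output stay the same) does not strictly need classes. A minimal sketch, assuming the ignore check takes a domain string and returns a bool; all names here are hypothetical and not taken from the pastebin script:

```python
import re


def make_set_matcher(domains):
    """Ignore check backed by exact matches against a set of domains."""
    lowered = {d.lower() for d in domains}
    return lambda domain: domain.lower() in lowered


def make_regex_matcher(patterns):
    """Ignore check backed by regular expressions, same input and output."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return lambda domain: any(c.search(domain) for c in compiled)


def keep_domains(domains, is_ignored):
    """Return the domains for which is_ignored(domain) is false."""
    return [d for d in domains if not is_ignored(d)]
```

Because both matchers accept a domain and return a bool, `keep_domains` does not change when the set-based check is later swapped for the regex-based one mentioned at 21:10.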
21:27 🔗 Darkstar has quit IRC (Ping timeout: 246 seconds)
21:31 🔗 JAA So I finally got a floppy drive, and it writes at an amazing 8.9 kB/s.
21:54 🔗 Darkstar has joined #archiveteam-bs
21:58 🔗 pizzaiolo has quit IRC (Remote host closed the connection)
22:00 🔗 pizzaiolo has joined #archiveteam-bs
23:01 🔗 godane so i'm doing the 'art bell audio only' tape
23:01 🔗 godane yes someone recorded art bell on a vhs tape from a radio somehow :P
23:02 🔗 xarph that wasn't hard to do and was the most cost effective way to set a timer to record a 5 hour radio show
23:03 🔗 xarph you just don't hook up the yellow socket
23:04 🔗 jacketcha has quit IRC (Read error: Connection reset by peer)
23:20 🔗 pizzaiolo has quit IRC (pizzaiolo)
23:35 🔗 jacketcha has joined #archiveteam-bs
23:57 🔗 ReimuHaku has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
