#archiveteam-ot 2019-08-07,Wed

Time Nickname Message
00:00 🔗 DogsRNice has joined #archiveteam-ot
00:08 🔗 sHATNER has quit IRC (hub.efnet.us irc.efnet.nl)
00:08 🔗 Tenebrae has quit IRC (hub.efnet.us irc.efnet.nl)
00:08 🔗 Fusl has quit IRC (hub.efnet.us irc.efnet.nl)
00:08 🔗 Dallas has quit IRC (hub.efnet.us irc.efnet.nl)
00:08 🔗 jodizzle has quit IRC (hub.efnet.us irc.efnet.nl)
00:08 🔗 MrRadar2 has quit IRC (hub.efnet.us irc.efnet.nl)
00:08 🔗 BnAboyZ has quit IRC (hub.efnet.us irc.efnet.nl)
00:08 🔗 nyaomi has quit IRC (hub.efnet.us irc.efnet.nl)
00:09 🔗 yano has quit IRC (Read error: Operation timed out)
00:10 🔗 schbirid has quit IRC (Read error: Operation timed out)
00:13 🔗 sHATNER has joined #archiveteam-ot
00:13 🔗 Tenebrae has joined #archiveteam-ot
00:13 🔗 Fusl has joined #archiveteam-ot
00:13 🔗 Dallas has joined #archiveteam-ot
00:13 🔗 jodizzle has joined #archiveteam-ot
00:13 🔗 MrRadar2 has joined #archiveteam-ot
00:13 🔗 BnAboyZ has joined #archiveteam-ot
00:13 🔗 nyaomi has joined #archiveteam-ot
00:13 🔗 irc.efnet.nl sets mode: +oo Fusl MrRadar2
00:14 🔗 Fusl_ sets mode: +o Fusl
00:15 🔗 yano has joined #archiveteam-ot
00:18 🔗 schbirid has joined #archiveteam-ot
00:18 🔗 Fusl sets mode: +o kiska1
00:18 🔗 Fusl sets mode: +o kiskabak
00:18 🔗 Fusl sets mode: +o kiska
00:18 🔗 Fusl sets mode: +o JAA
00:18 🔗 Fusl sets mode: +o svchfoo3
00:18 🔗 Fusl sets mode: +o jrwr
00:18 🔗 Fusl sets mode: +o astrid
00:18 🔗 Fusl sets mode: +o dxrt
00:18 🔗 Fusl sets mode: +o chfoo
00:18 🔗 Fusl sets mode: +o svchfoo1
00:18 🔗 Fusl sets mode: +o ivan_
00:19 🔗 Fusl sets mode: +o dxrt_
00:19 🔗 Fusl sets mode: +o AlsoJAA
00:19 🔗 Fusl sets mode: +o arkiver
00:19 🔗 Fusl sets mode: +o Kenshin
00:19 🔗 Fusl sets mode: +o SketchCow
00:19 🔗 Fusl sets mode: +o Fusl_
00:19 🔗 Fusl sets mode: +o hook54321
00:19 🔗 Fusl sets mode: +o Kaz
00:19 🔗 Fusl sets mode: +o HCross
00:24 🔗 Dragnog2 has quit IRC (Quit: Connection closed for inactivity)
02:18 🔗 qw3rty111 has joined #archiveteam-ot
02:24 🔗 qw3rty119 has quit IRC (Read error: Operation timed out)
02:59 🔗 Flashfire has quit IRC (Remote host closed the connection)
02:59 🔗 kiska has quit IRC (Remote host closed the connection)
02:59 🔗 Flashfire has joined #archiveteam-ot
03:00 🔗 kiska has joined #archiveteam-ot
03:00 🔗 Fusl sets mode: +o kiska
03:00 🔗 Fusl_ sets mode: +o kiska
03:29 🔗 qw3rty112 has joined #archiveteam-ot
03:35 🔗 qw3rty111 has quit IRC (Read error: Operation timed out)
03:37 🔗 killsushi has quit IRC (Quit: Leaving)
03:51 🔗 odemg has quit IRC (Read error: Operation timed out)
04:06 🔗 odemg has joined #archiveteam-ot
04:07 🔗 DogsRNice has quit IRC (Read error: Connection reset by peer)
05:48 🔗 godane has quit IRC (Ping timeout: 506 seconds)
05:54 🔗 Dragnog2 has joined #archiveteam-ot
06:01 🔗 killsushi has joined #archiveteam-ot
07:08 🔗 schbirid has quit IRC (Read error: Operation timed out)
08:20 🔗 bluefoo has quit IRC (Read error: Operation timed out)
08:41 🔗 godane has joined #archiveteam-ot
09:24 🔗 Dragnog2 has quit IRC (Quit: Connection closed for inactivity)
09:45 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
11:24 🔗 bluefoo has joined #archiveteam-ot
13:02 🔗 VerifiedJ has joined #archiveteam-ot
13:05 🔗 bluefoo has quit IRC (Ping timeout: 615 seconds)
15:45 🔗 Dragnog2 has joined #archiveteam-ot
16:27 🔗 schbirid has joined #archiveteam-ot
17:36 🔗 Dj-Wawa has quit IRC (Quit: Connection closed for inactivity)
17:41 🔗 systwi_ is now known as systwi
17:49 🔗 bluefoo has joined #archiveteam-ot
18:18 🔗 Dj-Wawa has joined #archiveteam-ot
18:24 🔗 yano https://www.phoronix.com/scan.php?page=news_item&px=CVE-2019-1125-SWAPGS
18:28 🔗 nataraj has joined #archiveteam-ot
18:38 🔗 systwi Is there anywhere I can find a list of network protocols? I'm trying to find ones like http:// https:// ftp:// etc. I'm writing a script & need to check if any of those have a . in them
18:39 🔗 systwi And before you mention it I did look it up earlier
18:40 🔗 systwi I was just having a difficult time finding a list that explicitly stated if a protocol can or cannot have a "." in the name
18:43 🔗 Fusl systwi: not really. you can probably find a list of common URL protocols, but since technically any arbitrary URL protocol can be registered and the database extended, there can always be others
18:43 🔗 Fusl https://www.mediawiki.org/wiki/Manual:$wgUrlProtocols is a good list
18:49 🔗 nataraj has quit IRC (Read error: Operation timed out)
18:55 🔗 coderobe systwi: according to the spec it can be any ascii string https://url.spec.whatwg.org/#url-representation
18:56 🔗 coderobe it cannot be null but it can be an empty string too
19:00 🔗 h3ndr1k Git can work with ssh+git://
19:01 🔗 h3ndr1k Better use a proper lib to parse the url and escape all chars you can't handle
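On the question of whether a scheme name may contain a ".": RFC 3986 (section 3.1) defines a scheme as a letter followed by any mix of letters, digits, "+", "-" and ".", so a dot is permitted. A minimal Python sketch of that syntax check follows; the function name is hypothetical, and the check says nothing about whether a scheme is actually registered or supported by any program.

```python
import re

# RFC 3986 section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(name):
    """Return True if `name` is syntactically valid as a URI scheme."""
    return SCHEME_RE.fullmatch(name) is not None
```

Under this grammar both "ssh+git" and "akjh.sf.dsfa" are valid scheme names, while an empty string or a name starting with a digit is not.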
19:05 🔗 deevious has quit IRC (Ping timeout: 252 seconds)
19:19 🔗 Stiletto has quit IRC (Read error: Connection reset by peer)
19:28 🔗 Stiletto has joined #archiveteam-ot
19:28 🔗 systwi Ugh, well that's going to be difficult. See, in my script I need it to accept urls like: https://www.example.com/ , but also urls like www.example.com
19:29 🔗 systwi And to filter out something that's junk, i.e. akjh.sf.dsfa://www.example.com/
19:31 🔗 h3ndr1k Technically that is not junk. But if you can't handle some schemes you should just ignore them. And when there is no scheme you have to define a default one for your program
19:32 🔗 h3ndr1k urls are hard to handle universally
19:32 🔗 systwi There might be a *nix program I can pass the URL into but I don't know of any
19:33 🔗 h3ndr1k I remember a video of a talk where 3 different python libraries extracted 3 different hosts from the same specially crafted url
19:34 🔗 systwi I guess I can just do that, blacklist all protocols that I know of that I don't want to use, and then for unrecognized ones tack on "https://" to the beginning and if it doesn't work, then report to the user that the URL is bad
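A rough Python sketch of this kind of fallback, written as a variation rather than the exact plan above: it uses an allowlist of accepted schemes instead of a blacklist, prepends a default scheme when none is present, and rejects everything else. The names `normalize_url` and `ALLOWED_SCHEMES` and the choice of allowed schemes are assumptions for illustration.

```python
import re

# Matches a leading "scheme://" using the RFC 3986 scheme characters.
SCHEME_PREFIX = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*)://")

# Hypothetical allowlist: the schemes this script is willing to handle.
ALLOWED_SCHEMES = {"http", "https"}


def normalize_url(raw, default_scheme="https"):
    """Return a URL with a scheme the script accepts, or None if rejected."""
    raw = raw.strip()
    if not raw:
        return None
    match = SCHEME_PREFIX.match(raw)
    if match is None:
        # No "scheme://" prefix, e.g. "www.example.com": apply the default.
        return f"{default_scheme}://{raw}"
    if match.group(1).lower() in ALLOWED_SCHEMES:
        return raw
    # Syntactically fine but not a scheme this script handles.
    return None
```

With these assumptions, "www.example.com" becomes "https://www.example.com", "https://www.example.com/" is returned unchanged, and "akjh.sf.dsfa://www.example.com/" is rejected.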
19:34 🔗 h3ndr1k I don't quite understand your problem. A url parsing lib should be all you need.
19:35 🔗 systwi I'm sorry I don't understand that. This is a shell script btw
19:37 🔗 h3ndr1k Uh ok. Kind of hard to do that with shell. Maybe a small python script which parses a url and returns all parts (scheme, host, etc) line separated?
19:37 🔗 h3ndr1k Whatever, you will figure something out.
19:37 🔗 systwi Sounds like a good idea except for the fact I know nothing about python :-/ But yeah I will keep looking around, thanks
19:50 🔗 h3ndr1k systwi: This might help you: https://pastebin.com/N7xiCErW
19:51 🔗 h3ndr1k That's how you would parse a url in python. You can then print whatever properties you want and use them in your shell script
19:55 🔗 VerifiedJ has quit IRC (Quit: Leaving)
20:04 🔗 systwi Thanks h3ndr1k I'll give that a try
20:05 🔗 h3ndr1k np
20:13 🔗 systwi So, I'm probably doing it wrong, but I saved that pastebin content into a .py file and ran it in my Terminal like so: $ ./url_test.py
20:14 🔗 systwi That gave an exit code of 1 which would make sense since there is no URL
20:14 🔗 systwi Trying: $ ./url_test.py irc://example.com
20:14 🔗 systwi Outputted some text to the screen giving an exit code of 0.
20:15 🔗 systwi The exact same result happens with http://example.com
20:15 🔗 systwi :-/
20:24 🔗 h3ndr1k Yeah it is just a template. At the moment a constant string is passed to the parser instead of the argument
20:25 🔗 h3ndr1k You just have to replace 'ab...://......' with url
20:26 🔗 h3ndr1k systwi: ^ so that you have urlparse(url) in there
20:27 🔗 systwi Ok thanks let me try that. Is there a way I can pass argument 1 into the (url) place? e.g. in bash it would be "$1"
20:28 🔗 h3ndr1k sys.argv[1] is exactly $1
20:28 🔗 h3ndr1k And the line url=sys.argv[1] essentially assigns $1 to $url (In shell terms :) )
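The pastebin script itself is not reproduced in this log, so the following is only a guess at its general shape: a sketch that takes the URL from `sys.argv[1]`, parses it with `urllib.parse.urlparse`, and prints one part per line so a shell script can read them. The function names and the choice of parts (scheme, hostname, port) are assumptions.

```python
import sys
from urllib.parse import urlparse


def url_parts(url):
    """Return scheme, hostname and port of `url` as a list of strings."""
    parsed = urlparse(url)
    # parsed.port is None when the URL has no explicit port.
    port = "" if parsed.port is None else str(parsed.port)
    return [parsed.scheme, parsed.hostname or "", port]


def main(argv):
    """Print the parts of argv[1], one per line; return a shell exit code."""
    if len(argv) < 2:
        print("usage: url_test.py URL", file=sys.stderr)
        return 1
    url = argv[1]  # the equivalent of url="$1" in a shell script
    for part in url_parts(url):
        print(part)
    return 0
```

To use this as a standalone script, the file would end with `sys.exit(main(sys.argv))`. Note that reading `parsed.port` raises `ValueError` when the text after the colon is not a number, which this sketch does not catch.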
20:29 🔗 systwi Ohhh okay cool let me try that
20:30 🔗 systwi Thanks for the shell terms btw makes more sense to me :)
20:31 🔗 h3ndr1k :)
20:48 🔗 systwi It looks like it's working well so far, however I tried filling in a junk input ( $ ./url_test.py htt://shttp:example.com/ ) and it kinda killed python:
20:49 🔗 systwi $ ./url.py htt://shttp:example.com/
20:49 🔗 systwi htt
20:49 🔗 systwi shttp
20:49 🔗 systwi Traceback (most recent call last):
20:49 🔗 systwi File "./url.py", line 16, in <module>
20:49 🔗 systwi print(parsed.port)
20:49 🔗 systwi File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/parse.py", line 169, in port
20:49 🔗 systwi port = int(port, 10)
20:49 🔗 systwi ValueError: invalid literal for int() with base 10: 'example.com'
20:49 🔗 systwi That's okay, though, I can just check the user input beforehand to make sure a bad url like that isn't passed to url_test.py
21:04 🔗 h3ndr1k systwi: We could wrap the parsing in a try/except block and exit with 1 in case of such an error. But I'm about to go to bed now. I may find something tomorrow.
21:05 🔗 h3ndr1k The error says that the port (the part after the colon) is not a number, btw.
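A minimal sketch of the error handling suggested here, assuming the script only needs scheme, hostname and port. In Python the construct is `try`/`except`; `urlparse` and the `.port` property both signal malformed input with `ValueError`. The function name `parse_or_none` is hypothetical.

```python
from urllib.parse import urlparse


def parse_or_none(url):
    """Return (scheme, hostname, port), or None if the URL cannot be parsed."""
    try:
        parsed = urlparse(url)
        # Accessing .port raises ValueError for a non-numeric or
        # out-of-range port, as in "htt://shttp:example.com/".
        return (parsed.scheme, parsed.hostname, parsed.port)
    except ValueError:
        return None
```

A calling script could then exit with status 1 whenever `parse_or_none` returns `None`, instead of printing a traceback.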
21:05 🔗 systwi No worries, see you tomorrow, thanks!
21:19 🔗 yano i found this in my irc logs from a few years ago; it seems to be a collection of over a billion reddit comments from 2007 to 2015 according to the readme, magnet:?xt=urn:btih:7690f71ea949b868080401c749e878f98de34d3d&dn=reddit_data
21:20 🔗 systwi yano: What format?
21:20 🔗 yano "Each file is compressed with bzip2 compression. When uncompressed, each file
21:20 🔗 yano is a series of JSON blocks delimited by new lines (\n). The name of each file
21:20 🔗 yano follows the format RC_yyyy-mm.bz2 where yyyy is the year and mm is the month.
21:20 🔗 yano RC stands for "reddit comments.""
21:21 🔗 yano this is the readme file, https://0bin.net/paste/qzH7ebO+eKGBGwKn#6m7gsYDKUNcX0pbKg56Sfh4qLGWfuC2IEEBz26HnSva
21:22 🔗 yano it's about 150 GB
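Given the format quoted from the readme (bzip2-compressed files of newline-delimited JSON), one way to stream a file without decompressing it to disk first could look like the sketch below. The function name is hypothetical, and nothing is assumed about which fields each JSON object contains.

```python
import bz2
import json


def iter_comments(path):
    """Yield one parsed JSON object per non-empty line of a .bz2 file."""
    # "rt" opens the compressed stream in text mode, decompressing lazily.
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Because this is a generator, only one line is held in memory at a time, which matters for an archive of this size.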
21:23 🔗 Jens has quit IRC (Remote host closed the connection)
21:24 🔗 Jens has joined #archiveteam-ot
23:54 🔗 Dragnog2 has quit IRC (Quit: Connection closed for inactivity)
23:57 🔗 Flashfire has quit IRC (Remote host closed the connection)
23:57 🔗 kiska has quit IRC (Remote host closed the connection)
23:57 🔗 Flashfire has joined #archiveteam-ot
23:58 🔗 kiska has joined #archiveteam-ot
23:58 🔗 Fusl sets mode: +o kiska
23:58 🔗 Fusl_ sets mode: +o kiska
